What Is a Headless Browser?

By the Scrappey Research Team

A headless browser is a real web browser — Chrome, Firefox, or WebKit — that runs without a visible window, driven entirely by code instead of by a person clicking. It still loads pages, runs JavaScript, applies CSS, and fires events exactly like the browser on your screen — it just doesn't draw anything for a human to look at. Instead, it exposes what it's doing through an automation API your program can call. Scrapers, automated tests, and screenshot services all rely on headless browsers to work with sites that need JavaScript to build their content.

| Aspect | Details |
| --- | --- |
| Common implementations | Headless Chrome, Headless Firefox, WebKit (Safari engine) |
| Automation libraries | Playwright, Puppeteer, Selenium |
| Primary use cases | Scraping JS-heavy sites, automated testing, PDF/screenshot generation |
| Tradeoff vs. HTTP | 5–50x slower and more memory-hungry, but renders pages a request library can't |

How headless browsers work

A headless browser is the same program as the normal one, started differently. Chromium has a `--headless` flag and Firefox has `-headless`. When it starts, the browser opens a control channel that speaks a debugging protocol: the Chrome DevTools Protocol (CDP) for Chromium, or WebDriver BiDi, the newer standard that works across browsers. The channel is typically a WebSocket on a local port (some libraries use a pipe instead), like a phone line waiting for instructions. Your automation library connects to it and sends commands: go to this URL, wait until this element appears, click this button, run this JavaScript, hand back the HTML. The browser carries them out using its full rendering pipeline: the same network stack, the same JavaScript engine (V8 in Chrome, SpiderMonkey in Firefox), and the same DOM (the in-memory tree of the page). The only thing missing is the window on screen.
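
To make the "commands over a control channel" idea concrete, here is a minimal sketch of what two such commands look like as CDP messages. It assumes a Chromium-based browser speaking CDP; the helper name `cdp_command` is hypothetical, and the sketch only builds the JSON text rather than opening a connection. `Page.navigate` and `Runtime.evaluate` are real CDP methods.

```python
import itertools
import json

# Each CDP command carries a unique id so the browser's reply can be matched to it.
_ids = itertools.count(1)


def cdp_command(method, params=None):
    """Serialize one CDP command as the JSON text sent over the control channel.

    Hypothetical helper for illustration; automation libraries do this internally.
    """
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


# "Go to this URL" expressed as a CDP command.
navigate = cdp_command("Page.navigate", {"url": "https://example.com/"})

# "Run this JavaScript and hand back the HTML" expressed as a CDP command.
evaluate = cdp_command(
    "Runtime.evaluate",
    {"expression": "document.documentElement.outerHTML", "returnByValue": True},
)

print(navigate)
print(evaluate)
```

Libraries such as Playwright and Puppeteer wrap this exchange, so you rarely write these messages by hand; seeing the shape mainly helps when reading protocol logs.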

Why scrapers need headless browsers

Most modern sites no longer send you a finished HTML page. They send a near-empty shell plus a JavaScript bundle that then fetches the real data from internal APIs and builds the page in your browser — this is how React, Vue, Next.js, and Angular all work. So a plain HTTP request hands you the empty shell, not the content you actually want. A headless browser runs that JavaScript, waits for the API calls to come back, and gives you the finished DOM. It also handles the other moving parts of a real visit — cookies, localStorage (small data the site saves in the browser), redirects, form submissions, and WebSocket connections (live two-way links) — all things a simple request library either can't do or can't fake convincingly.
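
One rough way to tell whether a plain HTTP response is only the empty shell is to count how much visible text it contains. The sketch below is a heuristic under stated assumptions, not a definitive test: the names `VisibleTextCounter` and `looks_like_js_shell`, the 50-character threshold, and both sample documents are invented for illustration.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts text characters outside non-visible elements and counts <script> tags."""

    SKIP = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.visible_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see on the page.
        if not self._skip_depth:
            self.visible_chars += len(data.strip())


def looks_like_js_shell(html, min_visible_chars=50):
    """Heuristic: scripts are present but almost no visible text is."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars


# Invented sample responses: a client-rendered shell and a server-rendered page.
SHELL = (
    '<html><head><title>App</title></head><body>'
    '<div id="root"></div><script src="/static/bundle.js"></script>'
    '</body></html>'
)
RENDERED = (
    '<html><body><h1>Quarterly report</h1>'
    '<p>Revenue grew in every region this quarter, led by new subscriptions.</p>'
    '</body></html>'
)

print(looks_like_js_shell(SHELL))
print(looks_like_js_shell(RENDERED))
```

If a response looks like the shell, that is a hint the content is built client-side and a request library alone will not see it.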

Headless browser detection

Anti-bot vendors actively hunt for headless browsers, because the default headless mode gives itself away. Headless Chrome's defaults leak several tells: the `navigator.webdriver` property reads `true` (a flag set whenever a browser is being automated), the User-Agent string contains "HeadlessChrome", the window reports no outer size, browser plugins are missing, and the WebGL renderer (the graphics-card name the browser reports) comes back generic instead of real-looking. Some libraries (puppeteer-extra-plugin-stealth, playwright-stealth) adjust these default values so an automated browser presents a configuration closer to a normal one. Detection vendors respond by tracking new signals, so the contest keeps evolving. Chrome's newer `--headless=new` mode closes most of the old differences and is commonly used today. For some sites, a real headful browser with display virtualization (a fake on-screen display so the browser behaves as if a monitor is attached) produces more consistent behavior, which is what many scraping APIs run behind the scenes.
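
Seen from the detection side, the tells listed above amount to a handful of checks on values the page can read. The following is a simplified sketch, not how any particular vendor works: the `BrowserSignals` fields, the `headless_tells` function, the "generic renderer" hints, and both sample profiles are assumptions made for illustration.

```python
from dataclasses import dataclass

# Illustrative assumption: substrings that suggest a software (non-GPU) renderer.
GENERIC_RENDERER_HINTS = ("swiftshader", "llvmpipe")


@dataclass
class BrowserSignals:
    """Values a page script could collect and report (hypothetical schema)."""
    webdriver: bool
    user_agent: str
    outer_width: int
    outer_height: int
    plugin_count: int
    webgl_renderer: str


def headless_tells(s):
    """Return a list naming each default-headless tell present in the signals."""
    tells = []
    if s.webdriver:
        tells.append("navigator.webdriver is true")
    if "HeadlessChrome" in s.user_agent:
        tells.append("User-Agent contains HeadlessChrome")
    if s.outer_width == 0 or s.outer_height == 0:
        tells.append("window reports no size")
    if s.plugin_count == 0:
        tells.append("no browser plugins")
    if any(hint in s.webgl_renderer.lower() for hint in GENERIC_RENDERER_HINTS):
        tells.append("generic WebGL renderer")
    return tells


# Invented sample profiles for illustration.
default_headless = BrowserSignals(
    webdriver=True,
    user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36",
    outer_width=0,
    outer_height=0,
    plugin_count=0,
    webgl_renderer="Google SwiftShader",
)
regular_desktop = BrowserSignals(
    webdriver=False,
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    outer_width=1920,
    outer_height=1040,
    plugin_count=5,
    webgl_renderer="ANGLE (NVIDIA, NVIDIA GeForce RTX 3060 Direct3D11 vs_5_0 ps_5_0, D3D11)",
)

print(headless_tells(default_headless))
print(headless_tells(regular_desktop))
```

Real detection systems combine many more signals and weigh them statistically; the point here is only that each tell in the paragraph above is a cheap, observable check.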

When not to use a headless browser

Headless browsers are expensive to run — hundreds of MB of RAM each, and seconds per page load. If the target site has an obvious internal API endpoint (open DevTools, go to the Network tab, and look for XHR/fetch calls that return JSON), just call that directly with a plain HTTP request. If the site renders its pages on the server and ships complete HTML, a request library plus an HTML parser will be 10–50x faster. Save the headless browser for when the page truly needs JavaScript to render, when you need to interact with elements like buttons or forms, or when the site fingerprints you before it will hand over anything useful.
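
The guidance above can be written down as a small decision function. This is a sketch of one reasonable ordering, not a rule from any library: the `Fetcher` enum, the `choose_fetcher` name, its parameters, and the precedence between the checks are all assumptions for illustration.

```python
from enum import Enum


class Fetcher(Enum):
    JSON_API = "call the internal JSON endpoint directly"
    HTTP_PARSER = "plain HTTP request plus an HTML parser"
    HEADLESS = "headless browser"


def choose_fetcher(*, has_json_api, server_rendered, needs_interaction, fingerprints_first):
    """Pick the cheapest approach that can still get the content.

    Assumed precedence: interaction or up-front fingerprinting forces a
    browser; otherwise prefer the JSON API, then server-rendered HTML.
    """
    if needs_interaction or fingerprints_first:
        return Fetcher.HEADLESS
    if has_json_api:
        return Fetcher.JSON_API
    if server_rendered:
        return Fetcher.HTTP_PARSER
    # Nothing cheaper applies: the page needs JavaScript to render.
    return Fetcher.HEADLESS


choice = choose_fetcher(
    has_json_api=False,
    server_rendered=True,
    needs_interaction=False,
    fingerprints_first=False,
)
print(choice.value)
```

In practice you answer these four questions once per target site, by inspecting the Network tab and the raw HTML response, before writing any scraper code.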

Code example

```python
from playwright.sync_api import sync_playwright

# Launch Chromium with no visible UI, render a JS-heavy page,
# then read the DOM after scripts have run.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://example.com/dashboard')
    # '.loaded' is a placeholder: wait for a selector that only appears
    # once the page's JavaScript has finished rendering.
    page.wait_for_selector('.loaded')
    html = page.content()  # serialized DOM after rendering
    browser.close()
```

Related terms

What Is Web Scraping?

Web scraping is the automated extraction of structured data from websites. Instead of a person copying and pasting, a program (a "scraper") …

What Is a Web Scraping API?

A web scraping API is a hosted HTTP service that visits a web page for you and hands back the result — rendered HTML, JSON, or already-parse…

What Is Browser Fingerprinting?

Browser fingerprinting is a technique that identifies and tracks a visitor by combining dozens of small, observable characteristics of their…

What Is Anti-Bot Detection?

Anti-bot detection is the set of techniques websites use to tell automated traffic apart from real human visitors — and then block, challeng…

What Is a Computer Use Agent?

A Computer Use Agent (CUA) is an AI agent that acts like a person at a keyboard: it logs into a portal as the user, clicks through the scree…

What Is the Chrome DevTools Protocol (CDP)?

The Chrome DevTools Protocol (CDP) is the low-level interface for instrumenting and controlling Chromium-based browsers. Low-level means it …

What Is JavaScript Rendering?

JavaScript rendering is the process of executing a page's JavaScript in a real browser engine so that content built on the client side appea…

Playwright vs Puppeteer

Playwright and Puppeteer are both Node-based browser automation libraries that drive a real browser over the Chrome DevTools Protocol (CDP),…

Playwright vs Selenium Compared

Playwright and Selenium are both browser-automation libraries that drive real browsers for testing and scraping, but they differ in architec…

Scrapy vs Playwright: When to Use Each

Scrapy and Playwright solve different halves of web scraping: Scrapy is an asynchronous crawl framework that fetches and parses HTML over pl…

How to Scrape JavaScript-Heavy Websites

JavaScript-heavy websites build their content in the browser after the first response, so a plain HTTP request returns an almost-empty HTML …

Frequently asked questions

Is headless Chrome the same as regular Chrome?

Yes: same program, same engine, just without the visible window. The newer `--headless=new` mode runs the same browser code as regular (headful) Chrome rather than a separate headless implementation, so it is nearly indistinguishable from headful Chrome on most fingerprinting checks.

Playwright vs. Puppeteer vs. Selenium — which one?

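Playwright is the modern default: it drives Chromium, Firefox, and WebKit, has official bindings for JavaScript/TypeScript, Python, Java, and .NET, and offers a strong developer experience. Puppeteer is a JavaScript library focused on Chrome (Firefox support was added more recently) and is tightly integrated with the DevTools Protocol. Selenium is the older standard, still dominant in enterprise QA teams. For a new scraping project, pick Playwright.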

Can headless browsers solve CAPTCHAs?

Not by themselves — a headless browser can display the challenge, but it can't recognize images or read the invisible bot scores a CAPTCHA uses to judge you. On sites you are permitted to access, the durable approach is to reduce how often a challenge appears in the first place — a coherent fingerprint and quality residential IPs — rather than relying on the headless browser to clear it.

Do headless browsers respect robots.txt?

No. robots.txt is a polite convention aimed at crawlers, not browsers, so a headless browser will fetch any URL you point it at. Honoring robots.txt is up to you to build into the code that drives the browser.
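
A minimal sketch of building that check into your own code, using Python's standard-library `urllib.robotparser`. The `allowed` helper, the `example-bot` user agent, and the sample rules are hypothetical; in a real scraper you would fetch the site's actual robots.txt and call the check before each navigation.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("https://example.com/public/page", "https://example.com/private/report"):
    if allowed(ROBOTS_TXT, "example-bot", url):
        print("ok to visit:", url)  # this is where the browser navigation would go
    else:
        print("skipping:", url)
```

Keeping the check in a single function makes it easy to call from whatever code drives the browser.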

Last updated: 2026-05-31