What Is a Headless Browser?

A headless browser is a real web browser — Chrome, Firefox, or WebKit — that runs without a visible graphical interface, controlled entirely through code. It loads pages, runs JavaScript, applies CSS, and fires events exactly like a normal browser, but instead of drawing pixels to a screen it exposes its state through an automation API. Scrapers, automated tests, and screenshot services all use headless browsers to interact with sites that depend on JavaScript to render content.

Quick facts

Common implementations: Headless Chrome, Headless Firefox, WebKit (Safari engine)
Automation libraries: Playwright, Puppeteer, Selenium
Primary use cases: Scraping JS-heavy sites, automated testing, PDF/screenshot generation
Tradeoff vs. HTTP: 5–50x slower and more memory-hungry, but renders pages a request library can't

How headless browsers work

Under the hood, a headless browser is the same binary as the regular browser: Chromium ships a `--headless` flag and Firefox ships `-headless`. The browser exposes a debugging protocol (Chrome DevTools Protocol for Chromium, WebDriver BiDi as the emerging cross-browser standard), typically over a local port or a pipe. Your automation library connects to it and sends commands: navigate to a URL, wait for a selector, click this button, evaluate this JavaScript, return the HTML. The browser executes them inside its full rendering pipeline, with the same network stack, the same JavaScript engine (V8 in Chromium, SpiderMonkey in Firefox), and the same DOM. The only thing missing is the window.
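
To make the command flow above concrete, here is a minimal sketch of the launch arguments and the JSON messages a client might send over the Chrome DevTools Protocol. `--remote-debugging-port`, `Page.navigate`, and `Runtime.evaluate` are real Chrome/CDP names; the helper names, the binary name `chromium`, port 9222, and the example URL are assumptions for illustration. The snippet only builds messages and does not open a connection, which is the part a library such as Playwright or Puppeteer would normally handle.

```python
import itertools
import json


def chrome_launch_args(binary="chromium", port=9222):
    """Command line for a headless Chromium exposing CDP on a local port.

    The binary name and port are assumptions; adjust them for your system.
    """
    return [binary, "--headless", f"--remote-debugging-port={port}"]


class CdpMessageBuilder:
    """Builds CDP command messages: JSON objects with id, method, and params."""

    def __init__(self):
        # Each command carries a unique id so replies can be matched to it.
        self._ids = itertools.count(1)

    def command(self, method, **params):
        return json.dumps(
            {"id": next(self._ids), "method": method, "params": params}
        )


builder = CdpMessageBuilder()

# "Navigate to a URL"
navigate = builder.command("Page.navigate", url="https://example.com/dashboard")

# "Evaluate this JavaScript, return the HTML"
read_html = builder.command(
    "Runtime.evaluate",
    expression="document.documentElement.outerHTML",
    returnByValue=True,
)
```

In practice these strings would be written to the browser's debugging WebSocket, and the browser would answer each one with a JSON reply carrying the same `id`.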

Why scrapers need headless browsers

Many modern sites no longer ship a complete HTML document. They ship a thin shell and a JavaScript bundle that fetches data from internal APIs and renders the content client-side. Single-page apps built with React, Vue, or Angular commonly work this way, and frameworks such as Next.js can too, although they often pre-render on the server. A plain HTTP request to a client-rendered site gets you the shell, not the content. A headless browser executes the JavaScript, waits for the API calls to resolve, and gives you the final DOM. Headless browsers also handle the other moving parts of a real session: cookies, localStorage, redirects, form submissions, and WebSocket connections, all things a request library either doesn't do or doesn't do convincingly.
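
One way to tell whether a plain HTTP response is only the shell is to compare how much visible text it carries against whether it loads scripts. The sketch below is a rough heuristic using only the standard library; the function name `looks_like_js_shell` and the 200-character threshold are assumptions, not an established rule, and real pages will need tuning.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script and style bodies are not visible content, so skip them.
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_like_js_shell(html, min_visible_chars=200):
    """Heuristic: the page loads scripts but carries almost no visible text."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars
```

If the check returns true for a response fetched with a request library, that is a hint (not proof) that the content is rendered client-side and a headless browser or the underlying API is needed.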

Headless browser detection

Anti-bot vendors actively look for headless browsers. The legacy Chrome headless mode leaks signals: the `navigator.webdriver` property is `true`, the User-Agent contains "HeadlessChrome", the outer window reports no dimensions, plugins are missing, and the WebGL renderer is generic. Stealth libraries (puppeteer-extra-plugin-stealth, playwright-stealth) patch the obvious ones, bot-detection vendors find new ones, and the cat-and-mouse continues. Chrome's newer `--headless=new` mode closes most of the legacy leaks and is what serious scrapers use today. For very protected sites, even "new" headless is detectable, and you need a real headful browser with display virtualization, which is what scraping APIs run internally.
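
The signals listed above can be expressed as a simple checklist. This is a sketch of the idea, not how any particular vendor works: the dictionary keys are hypothetical names for values you might collect in the page (for example `navigator.webdriver`, `navigator.userAgent`, `window.outerWidth`, `navigator.plugins.length`), and treating "SwiftShader" as the marker of a generic WebGL renderer is an assumption.

```python
def headless_signals(fp):
    """Return the names of headless giveaways found in a fingerprint dict.

    Expected keys (all hypothetical, all required): webdriver, user_agent,
    outer_width, outer_height, plugin_count, webgl_renderer.
    """
    found = []
    if fp["webdriver"] is True:
        found.append("navigator.webdriver")
    if "HeadlessChrome" in fp["user_agent"]:
        found.append("user-agent")
    if fp["outer_width"] == 0 or fp["outer_height"] == 0:
        found.append("window-dimensions")
    if fp["plugin_count"] == 0:
        found.append("plugins")
    # Assumption: a software renderer name stands in for "generic WebGL".
    if "swiftshader" in fp["webgl_renderer"].lower():
        found.append("webgl-renderer")
    return found


leaky = {
    "webdriver": True,
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/120.0.0.0",
    "outer_width": 0,
    "outer_height": 0,
    "plugin_count": 0,
    "webgl_renderer": "Google SwiftShader",
}
print(headless_signals(leaky))
```

Running the same check against your own scraper's fingerprint is a quick way to see which of the obvious leaks are still exposed before a detection vendor does.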

When not to use a headless browser

Headless browsers are expensive — hundreds of MB of RAM per instance, seconds per page load. If the target site has an obvious internal API endpoint (open DevTools, Network tab, look for XHR/fetch calls returning JSON), call that directly with a plain HTTP request. If the site renders content server-side and ships complete HTML, a request library and an HTML parser will be 10–50x faster. Reach for the headless browser when the page genuinely needs JavaScript to render, when you need to interact with elements, or when the site fingerprints you before serving useful data.
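
When the Network tab does reveal a JSON endpoint, calling it directly can be as small as the sketch below, which uses only the standard library. The function name `fetch_json` and the endpoint URL in the usage comment are hypothetical; a real internal API may also require cookies, tokens, or specific headers that you would copy from the request seen in DevTools.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10.0):
    """Fetch a URL with a plain HTTP request and decode the body as JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; the endpoint below is an illustration, not a real API:
# items = fetch_json("https://example.com/api/dashboard/items")
```

No browser process is started here, which is where the speed and memory savings over a headless browser come from.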

Code example

python
from playwright.sync_api import sync_playwright

# Launch Chromium with no visible UI, render a JS-heavy page,
# then read the DOM after scripts have run.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://example.com/dashboard')
    page.wait_for_selector('.loaded')
    html = page.content()
    browser.close()

Frequently asked questions

Is headless Chrome the same as regular Chrome?

Same binary, same engine, just without the visible window. The newer `--headless=new` mode in Chrome shares the regular browser's code rather than using a separate headless implementation, which makes it nearly indistinguishable from headful on most fingerprinting checks.

Playwright vs. Puppeteer vs. Selenium — which one?

Playwright is the modern default: multi-browser, multi-language, and the best developer experience of the three. Puppeteer is built around Chrome and tightly integrated with it, with Firefox support added more recently. Selenium is the legacy standard, still dominant in enterprise QA. For new scraping projects, pick Playwright.

Can headless browsers solve CAPTCHAs?

Not on their own. They render the challenge, but they cannot solve image puzzles or influence the invisible risk scores that some CAPTCHAs compute in the background. You combine a headless browser with a CAPTCHA solver (or a scraping API that integrates both) to get past the challenge.

Do headless browsers respect robots.txt?

No — robots.txt is a convention for crawlers, not browsers. A headless browser will fetch whatever URL you tell it to. Respecting robots.txt is your responsibility to enforce in the code that drives the browser.
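
A minimal sketch of enforcing this in the driving code, using Python's standard `urllib.robotparser`. The helper name `build_robots_checker`, the user-agent string `example-scraper`, and the sample rules are assumptions for illustration; in a real crawler you would fetch the site's actual robots.txt first.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent):
    """Return a function that says whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Sample rules, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(ROBOTS_TXT, "example-scraper")

for url in ("https://example.com/public/page", "https://example.com/private/report"):
    if allowed(url):
        print("would fetch", url)  # this is where page.goto(url) would go
    else:
        print("skipping", url)
```

Calling the check before every navigation keeps the policy in one place instead of relying on the browser, which has no notion of robots.txt.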

Last updated: 2026-05-28