What Is Playwright?

Playwright is a cross-browser automation framework from Microsoft that drives Chromium, Firefox, and WebKit through a single API. Released in 2020 as a Puppeteer successor, it added auto-waiting, parallel browser contexts, and first-class support for Python, .NET, and Java alongside Node.js. In scraping it is the default browser-automation choice when JavaScript execution is required — but it ships with default fingerprints that anti-bot vendors detect immediately, so production scrapers run a patched variant (Camoufox, PatchRight, CloakBrowser) rather than vanilla Playwright.

Quick facts

Vendor: Microsoft (open-source, Apache 2.0)
Languages: Python, Node.js / TypeScript, .NET, Java
Browsers: Chromium, Firefox, WebKit (via patched binaries it ships)
Protocol: Chrome DevTools Protocol (CDP) for Chromium; bidirectional WebSocket
Default detection: Block-grade on Akamai, Kasada, Cloudflare BM out of the box

Where Playwright fits in scraping

Playwright is the right tool when the data is rendered client-side and an HTTP client can't reach it: single-page apps that fetch via XHR after first paint, infinite-scroll lists, OAuth login flows, anything that requires real DOM events. It costs roughly 200 MB of RAM per browser context, far more than an HTTP client such as curl_cffi, so use it only when the lighter approach doesn't work.
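One way to apply the "lighter approach first" rule is to fetch the page with a plain HTTP client and check whether the text you need is already in the visible HTML. The sketch below uses only the standard library; the function name needs_browser and the idea of probing for a known piece of expected text are illustrative assumptions, not part of Playwright or any scraping library.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def needs_browser(raw_html: str, expected_text: str) -> bool:
    """Return True when expected_text is absent from the server-rendered text.

    Hypothetical heuristic: if the value only appears after JavaScript runs,
    the raw HTML will not contain it as visible text.
    """
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()  # flush any buffered text
    return expected_text not in " ".join(parser.chunks)
```

If needs_browser returns False for a sample of pages, the cheaper HTTP path is probably enough; if it returns True, that suggests the content is rendered client-side and a browser is needed.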

The Python API is the most common in scraping. async_playwright integrates cleanly with asyncio, and scrapy-playwright wraps it as a Scrapy download handler for crawls that need browser rendering only on specific pages. The Node.js version is the original and tends to get new features first, but the Python API is stable and close enough to parity for scraping work.

Why default Playwright gets blocked

Vanilla Playwright is detected on multiple surfaces simultaneously:

  • navigator.webdriver === true: the most-checked flag, set by Playwright and Selenium alike.
  • CDP connection signal: anti-bot scripts look for side effects of an attached DevTools session, such as Runtime.evaluate timing artifacts. (The window.cdc_ properties often cited alongside this are a ChromeDriver artifact and apply to Selenium rather than Playwright.)
  • Headless mode tells: missing chrome.runtime, missing plugins, languages array of length 1, no permissions API.
  • Function.toString() inspection: any stealth plugin that patches methods at the JS level fails this check (see the toString inspection entry).
  • In headless mode, the default User-Agent includes "HeadlessChrome" unless explicitly overridden.
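To see some of these signals from the page's side, you can evaluate a small probe script in a Playwright page and inspect the result. This is a minimal sketch: PROBE_JS, detection_tells, and probe are hypothetical names, the probe covers only the cheap JavaScript-visible checks in the list above (not CDP timing or toString inspection), and real anti-bot scripts are far more thorough.

```python
# JavaScript run inside the page; gathers a few of the cheap signals.
PROBE_JS = """() => ({
    webdriver: navigator.webdriver === true,
    userAgent: navigator.userAgent,
    pluginCount: navigator.plugins.length,
    languageCount: navigator.languages.length,
    hasChromeRuntime: typeof window.chrome === 'object'
        && window.chrome !== null && 'runtime' in window.chrome,
})"""


def detection_tells(signals: dict) -> list:
    """Turn the probe result into a list of human-readable tells."""
    tells = []
    if signals.get("webdriver"):
        tells.append("navigator.webdriver is true")
    if "HeadlessChrome" in signals.get("userAgent", ""):
        tells.append("User-Agent contains HeadlessChrome")
    if signals.get("pluginCount", 0) == 0:
        tells.append("no plugins")
    if signals.get("languageCount", 0) <= 1:
        tells.append("languages array has at most one entry")
    if not signals.get("hasChromeRuntime", False):
        tells.append("chrome.runtime is missing")
    return tells


async def probe(url: str) -> list:
    """Launch default (headless) Chromium and report which tells fire."""
    # Imported lazily so detection_tells can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            return detection_tells(await page.evaluate(PROBE_JS))
        finally:
            await browser.close()
```

Running asyncio.run(probe(url)) against a page you control, first with default settings and then with headless=False and a User-Agent override, is a quick way to check which of the cheap signals a given configuration actually removes.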

Setting headless: false and overriding the User-Agent removes the cheap detections but the CDP signal and toString inspection still fire. Production stealth requires a patched fork rather than runtime configuration.

Playwright vs Puppeteer vs Selenium

Picking between the three:

  • Playwright: multi-browser, multi-language, modern auto-wait API. Default choice for new scrapers in Python or Node. Quickest of the three to learn.
  • Puppeteer: Node-only, Chromium-first. Smaller API surface, mature ecosystem, slightly faster startup. Pick if you're Node-only and don't need Firefox/WebKit.
  • Selenium: widest browser support (Safari, Edge, even mobile WebDriver), oldest API. Pick if you need Safari testing or have an existing Selenium codebase.

All three are equally easy to detect on a default install. Patched variants exist on both sides (Camoufox and PatchRight for Playwright; undetected-chromedriver and SeleniumBase UC for Selenium), so the stealth ecosystem is the practical tiebreaker.

Code example

python
# Async Playwright with a residential proxy and User-Agent override
import asyncio

from playwright.async_api import async_playwright

async def scrape(url: str, proxy_url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=False,                          # don't advertise headless
            proxy={"server": proxy_url},
        )
        try:
            ctx = await browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                           "AppleWebKit/537.36 (KHTML, like Gecko) "
                           "Chrome/131.0.0.0 Safari/537.36",
                locale="en-US",
                viewport={"width": 1920, "height": 1080},
            )
            page = await ctx.new_page()
            await page.goto(url, wait_until="domcontentloaded")
            await page.wait_for_timeout(2000)         # crude fixed pause to let XHRs settle
            return await page.content()
        finally:
            await browser.close()                     # close even if navigation fails

# Usage (placeholder URLs): asyncio.run(scrape("https://example.com", "http://proxy-host:8000"))
# This passes simple sites but loses against Akamai/Kasada; use Camoufox or PatchRight instead.

Frequently asked questions

Is Playwright better than Puppeteer for scraping?

For Node-only Chromium scraping they're interchangeable — pick by team familiarity. Playwright wins if you need Python or Firefox/WebKit. Both lose to anti-bot on default settings and both have patched variants that fix it.

Can Playwright bypass Cloudflare on its own?

Free-tier Cloudflare and Bot Fight Mode, yes, with a residential proxy. Cloudflare Bot Management Enterprise, no — the JA4 + CDP signals are flagged. Switch to Camoufox or use a managed API.

Why use scrapy-playwright instead of just Playwright?

When the crawl is bigger than ~1000 URLs and you want Scrapy's queue, retries, deduplication, and item pipelines, but only some pages need a browser. scrapy-playwright lets you mark specific requests as needing a browser; others go through the cheap HTTP path.
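The per-request switch can be sketched as follows. The two settings and the "playwright" meta key follow the scrapy-playwright README; the path-prefix rule, BROWSER_PATH_PREFIXES, and request_meta are hypothetical and stand in for whatever logic decides which pages need rendering.

```python
from urllib.parse import urlsplit

# settings.py fragment: route downloads through scrapy-playwright's handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Hypothetical rule: only these path prefixes are assumed to need JavaScript.
BROWSER_PATH_PREFIXES = ("/app/", "/search")


def request_meta(url: str) -> dict:
    """Return Scrapy request meta that opts a URL into browser rendering."""
    path = urlsplit(url).path
    if path.startswith(BROWSER_PATH_PREFIXES):
        return {"playwright": True}   # rendered by Playwright
    return {}                         # plain HTTP download
```

In a spider this would be used as scrapy.Request(url, meta=request_meta(url)), so requests without the flag stay on the cheap HTTP path.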

Last updated: 2026-05-27