Where Playwright fits in scraping
Playwright is the right tool when the data is rendered client-side (built by JavaScript in the browser), so a plain HTTP client never sees it: single-page apps that fetch data via XHR or fetch (background requests) after the first paint, infinite-scroll lists, OAuth login flows, and anything that requires real DOM events such as clicks. The trade-off is weight: it uses roughly 200 MB of RAM per browser context, far more than a lightweight HTTP client like curl_cffi, so reach for it only when the lighter approach doesn't work.
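One way to apply the "lighter approach first" rule is to fetch the page with a plain HTTP client and check whether the data you expect is already in the raw HTML. The sketch below uses only the standard library; fetch_static, needs_browser, and the marker string are hypothetical names chosen for illustration, and a substring check is a deliberately crude heuristic.

```python
from urllib.request import Request, urlopen


def fetch_static(url: str, timeout: float = 10.0) -> str:
    """Fetch raw HTML with a plain HTTP client; no JavaScript is executed."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def needs_browser(static_html: str, expected_marker: str) -> bool:
    """Return True when the expected marker is absent from the unrendered HTML.

    An absent marker suggests the content is built client-side, so a real
    browser (for example Playwright) is probably required.
    """
    return expected_marker not in static_html
```

For example, needs_browser(fetch_static(url), 'class="price"') would be False for a server-rendered product page and True for an empty single-page-app shell.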
The Python API is the most common in scraping. async_playwright integrates cleanly with asyncio (Python's async system), and scrapy-playwright plugs it into Scrapy as a download handler, so a crawl uses a real browser only on the specific pages that need one. The Node.js version is the original and slightly ahead on features, but the Python bindings are stable and cover the same core API.
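As a rough sketch of the per-page routing, scrapy-playwright is enabled through Scrapy's DOWNLOAD_HANDLERS setting and then switched on per request with the "playwright" meta key. The settings below follow the project's README; request_meta is a hypothetical helper added here for illustration.

```python
# Fragment of a Scrapy settings module; assumes scrapy-playwright is installed.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def request_meta(needs_js: bool) -> dict:
    """Build Request.meta so that only JS-dependent pages go through the browser.

    Hypothetical helper: requests without the "playwright" key are handled by
    Scrapy's regular HTTP downloader.
    """
    return {"playwright": True} if needs_js else {}
```

In a spider this would be used as scrapy.Request(url, meta=request_meta(needs_js=True)) for the pages that need rendering, and with needs_js=False everywhere else.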
Why default Playwright gets blocked
Vanilla (unmodified) Playwright is detected on multiple surfaces at once - each one a separate giveaway:
- navigator.webdriver === true: the most-checked flag. It openly announces that the browser is being automated, and Playwright and Selenium both set it.
- CDP connection signal: the Chrome DevTools Protocol channel Playwright uses to control Chrome leaves traces; anti-bot scripts probe for window.cdc_-prefixed properties (originally a ChromeDriver artifact) and Runtime.evaluate timing artifacts.
- Headless mode tells: running without a visible window leaves gaps a real browser wouldn't have, such as missing chrome.runtime, missing plugins, a languages array of length 1, and no permissions API.
- Function.toString() inspection: a site can ask a browser function to print its own source; any stealth plugin that patches methods at the JS level fails this check (see the toString inspection entry).
- Default User-Agent: in headless mode, Playwright's default User-Agent includes "HeadlessChrome" unless explicitly overridden, which flags the request instantly.
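To make the list concrete, here is a minimal sketch of how a few of these signals could be collected and scored. PROBE_JS is a JavaScript snippet of the kind an anti-bot script might run (it could also be passed to Playwright's page.evaluate to inspect your own browser); find_tells and the snapshot keys are hypothetical names, and real detection scripts are considerably more elaborate.

```python
# JavaScript probe covering a few of the signals listed above.
# It is only a string here; nothing in this module executes it.
PROBE_JS = """() => ({
  webdriver: navigator.webdriver === true,
  userAgent: navigator.userAgent,
  pluginCount: navigator.plugins.length,
  languageCount: navigator.languages.length,
  hasChromeRuntime: !!(window.chrome && window.chrome.runtime),
})"""


def find_tells(snapshot: dict) -> list[str]:
    """Return the automation tells present in a probe result.

    `snapshot` is assumed to have the shape produced by PROBE_JS.
    """
    tells = []
    if snapshot.get("webdriver"):
        tells.append("navigator.webdriver is true")
    if "HeadlessChrome" in snapshot.get("userAgent", ""):
        tells.append("User-Agent contains HeadlessChrome")
    if snapshot.get("pluginCount", 0) == 0:
        tells.append("no plugins")
    if snapshot.get("languageCount", 0) <= 1:
        tells.append("languages array has at most one entry")
    if not snapshot.get("hasChromeRuntime", False):
        tells.append("chrome.runtime is missing")
    return tells
```

Each tell is weak on its own; the point of the list is that a default install trips several of them at once.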
Setting headless: false and overriding the User-Agent removes the cheapest signals, but the CDP signal and toString inspection still fire. Presenting a consistent fingerprint in production generally requires a patched fork rather than runtime configuration.
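With the Python API, those two cheap mitigations might look like the sketch below. The User-Agent string is a placeholder assumption (in practice it should match the actual browser build), read_basic_signals is a hypothetical helper, and the Playwright import is deferred into the function so the option dictionaries can be inspected without Playwright installed.

```python
# Placeholder desktop User-Agent for illustration only (assumption).
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

LAUNCH_OPTIONS = {"headless": False}  # run with a visible window
CONTEXT_OPTIONS = {"user_agent": USER_AGENT, "locale": "en-US"}


async def read_basic_signals(url: str) -> dict:
    """Open `url` in a headed browser and report what a page script would see."""
    # Deferred import: requires `pip install playwright` and `playwright install`.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(**LAUNCH_OPTIONS)
        context = await browser.new_context(**CONTEXT_OPTIONS)
        page = await context.new_page()
        await page.goto(url)
        signals = await page.evaluate(
            "() => ({ userAgent: navigator.userAgent,"
            " webdriver: navigator.webdriver })"
        )
        await browser.close()
        return signals
```

Running asyncio.run(read_basic_signals("https://example.com")) shows which of the page-visible signals remain after these two changes; signals that live in the control channel itself are not affected by them.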
Playwright vs Puppeteer vs Selenium
Picking between the three:
- Playwright: multi-browser, multi-language, modern auto-wait API. Default choice for new scrapers in Python or Node. Gentlest learning curve.
- Puppeteer: Node-only, Chromium-focused. Smaller API surface, mature ecosystem, slightly faster startup. Pick it if you work only in Node and don't need Firefox/WebKit.
- Selenium: widest browser support (Safari, Edge, even mobile through WebDriver), oldest API. Pick it if you need Safari coverage or have an existing Selenium codebase. Most detectable of the three.
All three are easy to detect on a default install. Patched variants exist for Playwright (Camoufox, Patchright) and for Selenium (undetected-chromedriver, SeleniumBase UC), so the stealth ecosystem is the practical tiebreaker.
