How Browser Automation Engines Are Benchmarked

By the Scrappey Research Team

A browser-automation-engine benchmark drives several automation stacks through the same set of targets and records, side by side, how often each one reaches real page content, how much memory and CPU it burns, and how its fingerprint scores. Instead of arguing which engine is "most human", a benchmark runs every engine - stock Playwright, patched builds like Patchright, Firefox-based Camoufox, Selenium/SeleniumBase, CDP drivers, and anti-detect browsers - against the same checklist and reports the numbers. The open-source techinz/browsers-benchmark suite (Python 3.8+, MIT) is a good reference implementation, and the tables below are taken from its published example run.

What it scores	Detection-pass rate, memory + CPU, reCAPTCHA v3 score, CreepJS trust, IP/WebRTC leak
Engines covered	23 configs: Playwright, Patchright, Camoufox, Selenium/SeleniumBase, NoDriver/ZenDriver, anti-detect browsers
Targets probed	A spread of modern protection stacks plus major retail, search, and social sites
Biggest confound	IP reputation - a flagged IP sinks even a perfect fingerprint, so a clean proxy per engine is required
Source	github.com/techinz/browsers-benchmark (Python, MIT) - pluggable targets + engines

The four families of metrics

A meaningful benchmark separates capability from cost and measures both on the same run. The metrics group into four families:

Detection-pass rate - the share of targets where the engine reached the real page instead of an interstitial or verification screen. Targets span a range of modern protection stacks, so the rate is a coarse measure of how "real" the session looks end to end.
Resource cost - peak memory (MB) and CPU (%) per engine. This is where the differences are largest: a lightweight anti-detect profile can run at ~120 MB while a full headful Chromium or Firefox session uses 1,000-1,400 MB. At fleet scale this decides how many sessions fit on a box.
Fingerprint scores - a reCAPTCHA v3 score (0 = bot, 1 = human) read from a public scorer, and a CreepJS trust/bot reading. These probe how coherent the browser fingerprint looks to a real detector rather than to a single boolean check.
Network hygiene - whether the IP seen by the site is the proxy IP (good) or the real IP (bad), and whether WebRTC leaks a different address. A stealthy fingerprint over a leaking connection is still caught.
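
One way to picture the four families is as a single record per engine per run. The sketch below is illustrative only: the EngineRun class, its field names, and the sample values are assumptions made for this example, not the suite's actual schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EngineRun:
    """One engine's results across the four metric families (hypothetical schema)."""
    name: str
    # Detection-pass rate input: target name -> did the engine reach real content?
    target_passed: Dict[str, bool] = field(default_factory=dict)
    # Resource cost
    peak_memory_mb: float = 0.0
    cpu_percent: float = 0.0
    # Fingerprint score; None when the scorer returned nothing
    recaptcha_score: Optional[float] = None
    # Network hygiene
    site_sees_proxy_ip: Optional[bool] = None    # True is good
    webrtc_leaks_real_ip: Optional[bool] = None  # True is bad

    @property
    def pass_rate(self) -> float:
        """Share of targets, in percent, where the engine reached real content."""
        if not self.target_passed:
            return 0.0
        return 100.0 * sum(self.target_passed.values()) / len(self.target_passed)


# Made-up values, purely to show the shape of a record.
run = EngineRun(
    name="example_engine",
    target_passed={"target_a": True, "target_b": True, "target_c": False, "target_d": True},
    peak_memory_mb=950.0,
    cpu_percent=30.0,
    recaptcha_score=0.9,
    site_sees_proxy_ip=True,
    webrtc_leaks_real_ip=False,
)
print(f"{run.name}: {run.pass_rate:.1f}% pass rate, {run.peak_memory_mb:.0f} MB peak")
```

Keeping capability and cost in one record makes it harder to quote a pass rate without the memory and CPU figure that came with it.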

Sample results from one benchmark run

These tables are from the suite's published example run (full data, charts, and screenshots are in the repo under results/example). Read them as broad tiers, not a precise leaderboard - a single run is noisy, and each engine used a different clean proxy. The repo labels its first column "bypass rate"; in practice it is an access rate: the share of target sites where the engine reached real content.

Detection-pass rate (higher is better):

Engine	Pass rate (%)
patchright	100.0
cloakbrowser	90.0
camoufox_headless	90.0
nodriver-chrome	80.0
adspower	80.0
seleniumbase-cdp-chrome	80.0
adspower_headless	70.0
tf-playwright-stealth-firefox	70.0
tf-playwright-stealth-firefox_headless	70.0
zendriver-chrome	70.0
cloakbrowser_headless	60.0
tf-playwright-stealth-chromium	60.0
playwright-chrome	60.0
playwright-firefox_headless	60.0
playwright-firefox	60.0
zendriver-chrome_headless	60.0
tf-playwright-stealth-chromium_headless	50.0
selenium-chrome (no proxy)	50.0
playwright-chrome_headless	40.0
nodriver-chrome_headless	40.0
patchright_headless	40.0
camoufox	30.0
selenium-chrome_headless (no proxy)	30.0

Resource cost - peak memory and CPU per engine (lower is better). Note how the lightest options (the anti-detect profiles) and the best-scoring options sit at opposite ends, so the "winner" depends on whether you optimise for pass rate or for sessions-per-server:

Engine	Memory (MB)	CPU (%)
adspower_headless	123	4.9
adspower	130	7.0
playwright-chrome_headless	517	7.4
tf-playwright-stealth-chromium_headless	522	8.2
zendriver-chrome_headless	818	21.5
tf-playwright-stealth-chromium	913	28.6
cloakbrowser_headless	936	27.2
selenium-chrome_headless (no proxy)	948	10.1
zendriver-chrome	1009	38.8
selenium-chrome (no proxy)	1034	29.8
tf-playwright-stealth-firefox_headless	1034	44.7
playwright-firefox_headless	1068	70.3
cloakbrowser	1082	54.4
playwright-chrome	1103	28.5
seleniumbase-cdp-chrome	1157	51.9
playwright-firefox	1161	75.5
camoufox	1181	53.3
tf-playwright-stealth-firefox	1261	67.7
nodriver-chrome_headless	1275	24.4
patchright_headless	1277	16.6
patchright	1314	53.3
camoufox_headless	1318	88.6
nodriver-chrome	1389	47.4
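
To make the sessions-per-server point concrete, here is a rough back-of-the-envelope sketch. The 64 GB server and the 4 GB reserve for the OS are assumptions chosen for illustration, and sessions_per_box is a hypothetical helper. The per-session figures are the peak-memory values from the table above. Memory is only one ceiling: CPU may bind first for the heavier engines.

```python
def sessions_per_box(total_ram_mb: int, per_session_mb: int, reserved_mb: int = 4096) -> int:
    """Memory-only upper bound on concurrent sessions (ignores CPU and I/O)."""
    if per_session_mb <= 0:
        raise ValueError("per_session_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_session_mb


BOX_RAM_MB = 64 * 1024  # hypothetical 64 GB server

# Peak memory (MB) taken from the example-run table.
PEAK_MEMORY_MB = {
    "adspower_headless": 123,
    "playwright-chrome_headless": 517,
    "patchright": 1314,
}

for engine, memory_mb in PEAK_MEMORY_MB.items():
    print(f"{engine}: up to {sessions_per_box(BOX_RAM_MB, memory_mb)} sessions")
```

Under these assumptions the lightest profile fits roughly ten times as many sessions as the best-scoring engine, which is the trade-off the table is pointing at.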

Fingerprint scores. In this run almost every working engine scored ~0.90 on reCAPTCHA v3 - a reminder that a single score separates the obviously broken from the rest, not the good from the great. A few configs returned no score because the scorer page stopped responding mid-test. CreepJS trust/bot percentages all read 0.00 here because CreepJS upstream temporarily disabled those scores, so in practice the suite now leans on CreepJS mainly for the WebRTC-leak check: if the IP exposed over WebRTC differs from the machine's real IP, the real address is not leaking past the proxy.
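
The network-hygiene comparison reduces to a few IP equality checks. The following is a minimal sketch of that logic, not the suite's own code: network_hygiene and same_ip are hypothetical helpers, and the addresses come from the reserved documentation ranges.

```python
import ipaddress
from typing import Dict, Optional


def same_ip(a: str, b: str) -> bool:
    """Compare two addresses after parsing, so equivalent notations match."""
    return ipaddress.ip_address(a) == ipaddress.ip_address(b)


def network_hygiene(real_ip: str, proxy_ip: str, seen_ip: str,
                    webrtc_ip: Optional[str]) -> Dict[str, bool]:
    """Classify one session; webrtc_ip is None when WebRTC exposed no address."""
    return {
        # Good: the target site saw the proxy's exit IP.
        "site_sees_proxy_ip": same_ip(seen_ip, proxy_ip),
        # Bad: WebRTC exposed the machine's real address despite the proxy.
        "webrtc_leaks_real_ip": webrtc_ip is not None and same_ip(webrtc_ip, real_ip),
    }


REAL_IP = "192.0.2.44"     # documentation-range placeholder
PROXY_IP = "203.0.113.10"  # documentation-range placeholder

clean = network_hygiene(REAL_IP, PROXY_IP, seen_ip=PROXY_IP, webrtc_ip=PROXY_IP)
leaky = network_hygiene(REAL_IP, PROXY_IP, seen_ip=PROXY_IP, webrtc_ip=REAL_IP)
print("clean:", clean)
print("leaky:", leaky)
```

The second case is the one that matters in practice: the HTTP-level check passes while WebRTC still exposes the real address.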

What the numbers tend to show

Across published runs a few patterns repeat:

Headful usually beats headless on the same engine - in the example run the headful variant scored higher in 8 of the 11 headful/headless pairs, by 10 to 60 points, with two ties, because headless-specific tells leak through. Camoufox is the exception (camoufox headful scores 30 but camoufox_headless scores 90), a reminder that per-engine tuning, not just the mode, drives the result.
Patched and engine-level stealth lead - Patchright (a CDP-patched Playwright that avoids the Runtime.enable tell) and Camoufox (which sets fingerprint values from inside the Firefox engine rather than via injected JavaScript) tend to top the table, while plain Playwright and plain Selenium sit lower.
Stealth costs resources - the engines near the top on detection are often the heaviest on CPU, so optimise for the axis you care about.
Anti-detect browsers trade off differently - a managed anti-detect profile can be by far the lightest option (~120-130 MB) while still scoring mid-pack on detection.
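
The headful-versus-headless pattern can be checked directly against the pass-rate table above. This sketch pairs each engine with its _headless counterpart and computes the gap. The figures are copied from the table; the only liberty taken is dropping the "(no proxy)" suffix from the two Selenium names so that a simple suffix rule can pair them.

```python
from typing import Dict

# Pass rates (%) copied from the example-run table.
PASS_RATE: Dict[str, float] = {
    "patchright": 100.0, "patchright_headless": 40.0,
    "cloakbrowser": 90.0, "cloakbrowser_headless": 60.0,
    "camoufox": 30.0, "camoufox_headless": 90.0,
    "nodriver-chrome": 80.0, "nodriver-chrome_headless": 40.0,
    "adspower": 80.0, "adspower_headless": 70.0,
    "seleniumbase-cdp-chrome": 80.0,  # no headless counterpart in the table
    "tf-playwright-stealth-firefox": 70.0, "tf-playwright-stealth-firefox_headless": 70.0,
    "zendriver-chrome": 70.0, "zendriver-chrome_headless": 60.0,
    "tf-playwright-stealth-chromium": 60.0, "tf-playwright-stealth-chromium_headless": 50.0,
    "playwright-chrome": 60.0, "playwright-chrome_headless": 40.0,
    "playwright-firefox": 60.0, "playwright-firefox_headless": 60.0,
    "selenium-chrome": 50.0, "selenium-chrome_headless": 30.0,
}


def headless_gaps(rates: Dict[str, float]) -> Dict[str, float]:
    """Headful minus headless pass rate for every engine that has both variants."""
    gaps = {}
    for name, headful in rates.items():
        if name.endswith("_headless"):
            continue
        headless = rates.get(f"{name}_headless")
        if headless is not None:
            gaps[name] = headful - headless
    return gaps


gaps = headless_gaps(PASS_RATE)
headful_wins = sum(1 for g in gaps.values() if g > 0)
ties = sum(1 for g in gaps.values() if g == 0)
headless_wins = sum(1 for g in gaps.values() if g < 0)

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:35s} {gap:+.0f}")
print(f"headful wins: {headful_wins}, ties: {ties}, headless wins: {headless_wins}")
```

Listing the gaps per engine, rather than quoting an average, keeps exceptions such as Camoufox visible.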

Running and extending the suite

The suite is a Python project with a modular layout (config/, engines/, utils/targets/, utils/report/) so both what it tests and which engines it runs are pluggable. The essentials:

Setup - create a venv, pip install -r requirements.txt, then install the browser engines you want: playwright install, camoufox fetch, patchright install chromium. Anti-detect browsers that need a local desktop app and API key are optional.
Proxies are required, one per engine - list them in documents/proxies.txt, at least as many as the engines you test. Protocols matter: Playwright takes HTTP/HTTPS only, NoDriver takes SOCKS5 only, and Selenium runs without a proxy - so you need a mix. The benchmark reports which protocols are missing.
Run - python main.py produces summary.md, benchmark_results.json, and a media/ folder of dashboard charts and per-target screenshots.
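
The protocol constraint in the proxy step is easy to get wrong, so here is a minimal sketch of matching proxies to engines by scheme. It is not the suite's own allocator: assign_proxies is a hypothetical helper, the one-URL-per-line proxy format is an assumption, and the addresses are documentation-range placeholders. The accepted schemes per engine follow the rules stated above.

```python
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import urlsplit

# Schemes each engine accepts, per the setup notes; an empty set means "no proxy".
ACCEPTED_SCHEMES: Dict[str, Set[str]] = {
    "playwright-chrome": {"http", "https"},
    "playwright-firefox": {"http", "https"},
    "nodriver-chrome": {"socks5"},
    "selenium-chrome": set(),
}


def assign_proxies(engines: Dict[str, Set[str]],
                   proxy_urls: List[str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Give each engine its own proxy of a compatible scheme (simple greedy match).

    Returns (assignments, engines_left_without_a_compatible_proxy).
    """
    pool = list(proxy_urls)
    assigned: Dict[str, Optional[str]] = {}
    missing: List[str] = []
    for engine, schemes in engines.items():
        if not schemes:
            assigned[engine] = None  # engine runs without a proxy
            continue
        match = next((p for p in pool if urlsplit(p).scheme in schemes), None)
        if match is None:
            missing.append(engine)
        else:
            pool.remove(match)  # one proxy per engine, never shared
            assigned[engine] = match
    return assigned, missing


# Assumed format: one proxy URL per line in the proxies file.
proxies = ["http://203.0.113.10:8080", "socks5://198.51.100.20:1080"]
assigned, missing = assign_proxies(ACCEPTED_SCHEMES, proxies)
print("assigned:", assigned)
print("no compatible proxy for:", missing)
```

With two proxies and three proxied engines, one engine is reported as missing a compatible proxy, which mirrors the kind of gap the benchmark flags before a run.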

Adding a target is a Target(...) definition plus a check function that returns True when real content rendered (see the code example below). Adding an engine means subclassing the right base and registering it:

# engines/ - subclass the base that matches your stack (pick one)
class CustomEngine(BrowserEngine): ...          # from scratch
class CustomEngine(PlaywrightBase): ...          # Playwright-based
class CustomEngine(SeleniumBase): ...            # Selenium-based

# then register it in config/engines.py
base_engines = [
    {
        "class": CustomEngine,
        "params": {"headless": True, "name": "custom_engine", "browser_type": "chromium"},
    },
]

That extensibility is the practical value of the project: it is less a fixed leaderboard than a harness you point at your own targets and engines.

Reading the results without fooling yourself

The single most important caveat in any benchmark like this is that IP reputation usually matters more than the engine. The suite requires a clean proxy precisely because a home or datacenter IP that has been flagged by prior automation will fail targets regardless of how good the fingerprint is - so you would be measuring your IP, not the engine. Results are also noisy per run (a target can be down, rate-limiting can kick in, a fingerprint scorer can change), which is why these tables are best read as broad tiers, not leaderboards to two decimal places.
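
A quick way to see why tiers are the right granularity is to put a confidence interval around each pass rate. The sketch below uses the Wilson score interval and assumes 10 targets per engine. That count is an inference from the pass rates all being multiples of 10, not a figure stated by the suite. It also treats targets as independent trials, which is only roughly true.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def wilson_interval(passed: int, total: int, z: float = 1.96) -> Interval:
    """Wilson score interval for a pass proportion (z = 1.96 for roughly 95%)."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need total > 0 and 0 <= passed <= total")
    p = passed / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


TARGETS = 10  # assumed target count, see the note above

for passed in (10, 9, 6, 4):
    low, high = wilson_interval(passed, TARGETS)
    print(f"{passed}/{TARGETS}: {low:.2f} to {high:.2f}")

print("9/10 vs 6/10 overlap:", overlaps(wilson_interval(9, TARGETS), wilson_interval(6, TARGETS)))
print("10/10 vs 4/10 overlap:", overlaps(wilson_interval(10, TARGETS), wilson_interval(4, TARGETS)))
```

Under these assumptions a 90 and a 60 are not clearly separable from a single run, while a 100 and a 40 are, which is roughly the resolution a tier-based reading gives you.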

The deeper lesson is the same one that runs through fingerprinting in general: passing depends on coherence, not on any one trick. The engines that win are the ones whose TLS handshake, headers, fingerprint surfaces, and exit IP all tell one consistent story - not the ones that spoof a single field hardest. That coherence is also why teams running at scale often move the hard parts server-side: a managed web-data API such as Scrappey handles fingerprinting, residential routing, and TLS matching behind one request, so the coherence is maintained for you as detection evolves - while a self-hosted engine from a benchmark like this remains the right pick for learning, testing, and full control.

Code example

python

# A browser-engine benchmark drives every engine through the SAME targets
# and records pass/fail, resource use, and fingerprint scores per engine.
# Targets are pluggable - a check function returns True when real content rendered.
from engines.base import BrowserEngine

async def check_render(engine: BrowserEngine) -> bool:
    # True  -> the engine reached the real page
    # False -> it was held at a verification / interstitial screen
    blocked, _html = await engine.locator('//div[@id="challenge"]')
    return not blocked

# Each engine is run headless AND headful, because the headless variant
# usually scores lower on the same detection and fingerprint checks.
# Every engine gets its own clean proxy so you measure the ENGINE,
# not the reputation of one shared IP.
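
To try a check function without launching a browser, a stub engine is enough. The following is a hypothetical test double, not part of the suite: StubEngine and its present argument are invented for this sketch, and it only mirrors the (found, html) return shape of locator used in the example above.

```python
import asyncio
from typing import Set, Tuple

CHALLENGE_XPATH = '//div[@id="challenge"]'


class StubEngine:
    """Hypothetical stand-in for a real engine: no browser, just canned answers."""

    def __init__(self, present: Set[str]) -> None:
        # XPath expressions that should be reported as found on the "page".
        self._present = present

    async def locator(self, xpath: str) -> Tuple[bool, str]:
        found = xpath in self._present
        return found, "<div id='challenge'></div>" if found else ""


async def check_render(engine: StubEngine) -> bool:
    # Same logic as the check above: blocked when the challenge element exists.
    blocked, _html = await engine.locator(CHALLENGE_XPATH)
    return not blocked


passed = asyncio.run(check_render(StubEngine(present=set())))
held = asyncio.run(check_render(StubEngine(present={CHALLENGE_XPATH})))
print("real page reached:", passed)
print("held at challenge:", not held)
```

A stub like this is useful for testing the check logic itself before spending proxy quota on a real run.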

Frequently asked questions

What does a browser automation benchmark actually measure?

Four things at once, per engine: how often it reaches real page content across a set of targets (detection-pass rate), how much memory and CPU it uses, how its fingerprint scores on probes like reCAPTCHA v3 and CreepJS, and whether the network connection leaks the real IP via the proxy check or WebRTC. Measuring capability and cost together is the point - the best stealth engine is often the most resource-hungry.

Why do headless browsers score worse in these benchmarks?

Because headless mode leaves detectable tells - missing or default window/screen metrics, rendering differences, and other headless-specific signals - that a real detector reads. On the same engine, the headless variant typically lands well below the headful one on detection-pass rate, which is why benchmarks run both.

Why does the benchmark insist on a clean proxy for every engine?

Because IP reputation usually outweighs the engine. A home or datacenter IP already flagged by past automation will fail targets no matter how good the fingerprint is, so you would be measuring the IP rather than the engine. Giving each engine its own clean proxy isolates the variable you actually want to compare.

Last updated: 2026-06-04