The four families of metrics
A meaningful benchmark separates capability from cost and measures both on the same run. The metrics group into four families:
- Detection-pass rate - the share of targets where the engine reached the real page instead of an interstitial or verification screen. Targets span a range of modern protection stacks, so the rate is a coarse measure of how "real" the session looks end to end.
- Resource cost - peak memory (MB) and CPU (%) per engine. This is where the differences are largest: a lightweight anti-detect profile can run at ~120 MB while a full headful Chromium or Firefox session uses 1,000-1,400 MB. At fleet scale this decides how many sessions fit on a box.
- Fingerprint scores - a reCAPTCHA v3 score (0 = bot, 1 = human) read from a public scorer, and a CreepJS trust/bot reading. These probe how coherent the browser fingerprint looks to a real detector rather than to a single boolean check.
- Network hygiene - whether the IP seen by the site is the proxy IP (good) or the real IP (bad), and whether WebRTC leaks a different address. A stealthy fingerprint over a leaking connection is still caught.
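The network-hygiene family reduces to comparing a handful of addresses. The sketch below is one way to express that comparison; the `HygieneResult` and `check_hygiene` names are hypothetical and are not taken from the suite, and the addresses in the usage note are documentation-range placeholders.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Optional


@dataclass(frozen=True)
class HygieneResult:
    exits_via_proxy: bool  # the site saw the proxy IP, not the real one
    webrtc_leak: bool      # WebRTC exposed the machine's real IP


def check_hygiene(real_ip: str, proxy_ip: str, seen_ip: str,
                  webrtc_ip: Optional[str]) -> HygieneResult:
    """Classify one session from the addresses observed during a run.

    Hypothetical helper: the suite's own check may be structured differently.
    """
    real, proxy, seen = (ip_address(x) for x in (real_ip, proxy_ip, seen_ip))
    # No WebRTC candidate at all means nothing leaked through that channel.
    leak = webrtc_ip is not None and ip_address(webrtc_ip) == real
    return HygieneResult(
        exits_via_proxy=(seen == proxy and seen != real),
        webrtc_leak=leak,
    )
```

For example, `check_hygiene("198.51.100.10", "203.0.113.5", "203.0.113.5", "198.51.100.10")` describes a session that exits through the proxy but still leaks the real address over WebRTC, which is the "stealthy fingerprint over a leaking connection" case.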
Sample results from one benchmark run
These tables are from the suite's published example run (full data, charts, and screenshots are in the repo under results/example). Read them as broad tiers, not a precise leaderboard - a single run is noisy, and each engine used a different clean proxy. The repo labels the first column "bypass rate"; it is the share of target sites where the engine reached real content.
Detection-pass rate (higher is better):
| Engine | Pass rate (%) |
|---|---|
| patchright | 100.0 |
| cloakbrowser | 90.0 |
| camoufox_headless | 90.0 |
| nodriver-chrome | 80.0 |
| adspower | 80.0 |
| seleniumbase-cdp-chrome | 80.0 |
| adspower_headless | 70.0 |
| tf-playwright-stealth-firefox | 70.0 |
| tf-playwright-stealth-firefox_headless | 70.0 |
| zendriver-chrome | 70.0 |
| cloakbrowser_headless | 60.0 |
| tf-playwright-stealth-chromium | 60.0 |
| playwright-chrome | 60.0 |
| playwright-firefox_headless | 60.0 |
| playwright-firefox | 60.0 |
| zendriver-chrome_headless | 60.0 |
| tf-playwright-stealth-chromium_headless | 50.0 |
| selenium-chrome (no proxy) | 50.0 |
| playwright-chrome_headless | 40.0 |
| nodriver-chrome_headless | 40.0 |
| patchright_headless | 40.0 |
| camoufox | 30.0 |
| selenium-chrome_headless (no proxy) | 30.0 |
Resource cost - peak memory and CPU per engine (lower is better). Note how the lightest options (the anti-detect profiles) and the best-scoring options sit at opposite ends, so the "winner" depends on whether you optimise for pass rate or for sessions-per-server:
| Engine | Memory (MB) | CPU (%) |
|---|---|---|
| adspower_headless | 123 | 4.9 |
| adspower | 130 | 7.0 |
| playwright-chrome_headless | 517 | 7.4 |
| tf-playwright-stealth-chromium_headless | 522 | 8.2 |
| zendriver-chrome_headless | 818 | 21.5 |
| tf-playwright-stealth-chromium | 913 | 28.6 |
| cloakbrowser_headless | 936 | 27.2 |
| selenium-chrome_headless (no proxy) | 948 | 10.1 |
| zendriver-chrome | 1009 | 38.8 |
| selenium-chrome (no proxy) | 1034 | 29.8 |
| tf-playwright-stealth-firefox_headless | 1034 | 44.7 |
| playwright-firefox_headless | 1068 | 70.3 |
| cloakbrowser | 1082 | 54.4 |
| playwright-chrome | 1103 | 28.5 |
| seleniumbase-cdp-chrome | 1157 | 51.9 |
| playwright-firefox | 1161 | 75.5 |
| camoufox | 1181 | 53.3 |
| tf-playwright-stealth-firefox | 1261 | 67.7 |
| nodriver-chrome_headless | 1275 | 24.4 |
| patchright_headless | 1277 | 16.6 |
| patchright | 1314 | 53.3 |
| camoufox_headless | 1318 | 88.6 |
| nodriver-chrome | 1389 | 47.4 |
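To make the sessions-per-server point concrete, here is a rough capacity estimate built on the peak-memory column. It is a sketch under simplifying assumptions: memory is treated as the only constraint (CPU is ignored), a 20% headroom is reserved for the OS, and the 64 GB box is an illustrative figure, not something the benchmark measured.

```python
def sessions_per_box(ram_mb: float, peak_mb: float, headroom: float = 0.2) -> int:
    """Estimate how many concurrent sessions fit in RAM, memory-bound only."""
    if peak_mb <= 0:
        raise ValueError("peak_mb must be positive")
    usable_mb = ram_mb * (1.0 - headroom)  # keep some RAM back for the OS
    return int(usable_mb // peak_mb)


# Peak memory (MB) for three rows of the table above.
PEAK_MB = {
    "adspower_headless": 123,
    "playwright-chrome_headless": 517,
    "patchright": 1314,
}

if __name__ == "__main__":
    for engine, peak in PEAK_MB.items():
        print(engine, sessions_per_box(64 * 1024, peak))
```

Under these assumptions the lightest profile fits roughly ten times as many sessions as the heaviest engines, which is why the resource axis can outweigh a modest gap in pass rate.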
Fingerprint scores. In this run almost every working engine scored ~0.90 on reCAPTCHA v3 - a reminder that a single score separates the obviously-broken from the rest, not the good from the great. A few configs returned no score because the scorer page stopped responding mid-test. CreepJS trust/bot percentages all read 0.00 here because CreepJS upstream temporarily disabled those scores, so in practice the suite now leans on CreepJS mainly for the WebRTC-leak check: if the address WebRTC reports matches the machine's real IP, the connection is leaking; if it differs, the proxy is holding.
What the numbers tend to show
Across published runs a few patterns repeat. Headful beats headless on the same engine most of the time - the headless variant of an engine routinely lands 20-40 points lower on detection-pass rate, because headless-specific tells leak through. The example table is not uniform, though: headful wins 8 of the 11 headful/headless pairs, two Firefox-based pairs tie, and camoufox reverses the pattern (30 headful against 90 headless), a reminder that per-engine tuning, not just the mode, drives the result. Patched and engine-level stealth lead: Patchright (a CDP-patched Playwright that avoids the Runtime.enable tell) and Camoufox (which sets fingerprint values from inside the Firefox engine rather than via injected JavaScript) tend to top the table, while plain Playwright and plain Selenium sit lower. Stealth costs resources: the engines near the top on detection are often the heaviest on CPU, so optimise for the axis you care about. Anti-detect browsers trade off differently - a managed anti-detect profile can be by far the lightest option (~120-130 MB) while still scoring mid-pack on detection.
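The headful-versus-headless pattern can be checked directly against the example pass-rate table. The snippet below copies those figures (dropping the "(no proxy)" annotation from the Selenium rows) and computes the per-engine gap; `mode_deltas` is a hypothetical helper name, not part of the suite.

```python
# Pass rates (%) from the example run; "(no proxy)" annotation dropped.
PASS_RATE = {
    "patchright": 100, "patchright_headless": 40,
    "cloakbrowser": 90, "cloakbrowser_headless": 60,
    "camoufox": 30, "camoufox_headless": 90,
    "nodriver-chrome": 80, "nodriver-chrome_headless": 40,
    "adspower": 80, "adspower_headless": 70,
    "seleniumbase-cdp-chrome": 80,  # no headless counterpart in the table
    "tf-playwright-stealth-firefox": 70, "tf-playwright-stealth-firefox_headless": 70,
    "zendriver-chrome": 70, "zendriver-chrome_headless": 60,
    "tf-playwright-stealth-chromium": 60, "tf-playwright-stealth-chromium_headless": 50,
    "playwright-chrome": 60, "playwright-chrome_headless": 40,
    "playwright-firefox": 60, "playwright-firefox_headless": 60,
    "selenium-chrome": 50, "selenium-chrome_headless": 30,
}


def mode_deltas(rates: dict[str, float]) -> dict[str, float]:
    """Return headful minus headless pass rate for every engine with both modes."""
    deltas = {}
    for name, headful in rates.items():
        if name.endswith("_headless"):
            continue
        headless = rates.get(f"{name}_headless")
        if headless is not None:
            deltas[name] = headful - headless
    return deltas


if __name__ == "__main__":
    for engine, gap in sorted(mode_deltas(PASS_RATE).items(), key=lambda kv: -kv[1]):
        print(f"{engine:35s} {gap:+.0f}")
```

A positive gap means the headful variant passed more targets. On a single noisy run these gaps are indicative rather than precise.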
Running and extending the suite
The suite is a Python project with a modular layout (config/, engines/, utils/targets/, utils/report/) so both what it tests and which engines it runs are pluggable. The essentials:
- Setup - create a venv, `pip install -r requirements.txt`, then install the browser engines you want: `playwright install`, `camoufox fetch`, `patchright install chromium`. Anti-detect browsers that need a local desktop app and API key are optional.
- Proxies are required, one per engine - list them in `documents/proxies.txt`, at least as many as the engines you test. Protocols matter: Playwright takes HTTP/HTTPS only, NoDriver takes SOCKS5 only, and Selenium runs without a proxy - so you need a mix. The benchmark reports which protocols are missing.
- Run - `python main.py` produces `summary.md`, `benchmark_results.json`, and a `media/` folder of dashboard charts and per-target screenshots.
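The protocol constraint in the proxy step can be pre-checked before a run. The sketch below assumes one proxy URL per line (for example `socks5://host:port`); the actual format of `documents/proxies.txt` is not described here, so treat that as an assumption. `ENGINE_SCHEMES` and `missing_protocols` are hypothetical names, and the suite's own report may work differently.

```python
from urllib.parse import urlsplit

# Accepted proxy schemes per engine, as described in the setup notes above.
ENGINE_SCHEMES = {
    "playwright-chrome": {"http", "https"},
    "nodriver-chrome": {"socks5"},
}


def missing_protocols(proxy_lines, engine_schemes):
    """Map each engine that has no usable proxy to the schemes it would accept."""
    available = set()
    for line in proxy_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        available.add(urlsplit(line).scheme.lower())
    return {
        engine: sorted(schemes)
        for engine, schemes in engine_schemes.items()
        if not schemes & available
    }
```

Calling `missing_protocols(["http://203.0.113.5:8080"], ENGINE_SCHEMES)` flags `nodriver-chrome` as lacking a SOCKS5 proxy. The check covers protocol coverage only, not the one-proxy-per-engine count.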
Adding a target is a Target(...) definition plus a check function that returns True when real content rendered (see the code example below). Adding an engine means subclassing the right base and registering it:

    # engines/ - subclass the base that matches your stack (pick one)
    class CustomEngine(BrowserEngine): ...   # from scratch
    class CustomEngine(PlaywrightBase): ...  # Playwright-based
    class CustomEngine(SeleniumBase): ...    # Selenium-based

    # then register it in config/engines.py
    base_engines = [
        {
            "class": CustomEngine,
            "params": {"headless": True, "name": "custom_engine", "browser_type": "chromium"},
        },
    ]

That extensibility is the practical value of the project: it is less a fixed leaderboard than a harness you point at your own targets and engines.
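On the target side, the shape described above (a definition plus a check function that returns True when real content rendered) might look like the following. This is only a sketch: the `Target` dataclass here is a hypothetical stand-in, since the suite's real `Target` signature is not shown, and the marker strings and the `product-list` element are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in; the suite's real Target class may differ."""
    name: str
    url: str
    check: Callable[[str], bool]  # receives page HTML, returns True on real content


# Illustrative phrases that often appear on interstitial or verification screens.
INTERSTITIAL_MARKERS = ("verify you are human", "checking your browser", "access denied")


def rendered_real_content(html: str) -> bool:
    """True only if no interstitial marker is present and an expected element exists."""
    text = html.lower()
    if any(marker in text for marker in INTERSTITIAL_MARKERS):
        return False
    return 'id="product-list"' in text  # assumed element on the real page


example_target = Target(
    name="example_shop",
    url="https://example.com/",
    check=rendered_real_content,
)
```

Checking for a positive signal (an element that only the real page has) as well as the absence of interstitial text avoids counting a blank or error page as a pass.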
Reading the results without fooling yourself
The single most important caveat in any benchmark like this is that IP reputation usually matters more than the engine. The suite requires a clean proxy precisely because a home or datacenter IP that has been flagged by prior automation will fail targets regardless of how good the fingerprint is - so you would be measuring your IP, not the engine. Results are also noisy per run (a target can be down, rate-limiting can kick in, a fingerprint scorer can change), which is why these tables are best read as broad tiers, not leaderboards to two decimal places.
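One way to see why tiers are the right granularity is to put an interval around a pass rate. The sketch below computes a Wilson score interval. It assumes, purely for illustration, 10 targets per engine (the tables do not state the count) and treats targets as independent pass/fail trials, which is a simplification.

```python
from math import sqrt


def wilson_interval(passes: int, targets: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass proportion (z=1.96 gives roughly 95%)."""
    if targets <= 0 or not 0 <= passes <= targets:
        raise ValueError("need 0 <= passes <= targets and targets > 0")
    p = passes / targets
    denom = 1 + z * z / targets
    centre = (p + z * z / (2 * targets)) / denom
    half = z * sqrt(p * (1 - p) / targets + z * z / (4 * targets * targets)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Assumed 10 targets: compare a 90% engine with a 60% engine.
    print(wilson_interval(9, 10))
    print(wilson_interval(6, 10))
```

With so few targets the two intervals overlap, so a 90 versus 60 gap from one run does not by itself establish that one engine is better.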
The deeper lesson is the same one that runs through fingerprinting in general: passing depends on coherence, not on any one trick. The engines that win are the ones whose TLS handshake, headers, fingerprint surfaces, and exit IP all tell one consistent story - not the ones that spoof a single field hardest. That coherence is also why teams running at scale often move the hard parts server-side: a managed web-data API such as Scrappey handles fingerprinting, residential routing, verification-workflow handling, and TLS matching behind one request, so the coherence is maintained for you as detection evolves - while a self-hosted engine from a benchmark like this remains the right pick for learning, testing, and full control.