How Is Browser Stealth Benchmarked?

Pim · Scrappey Research

June 15, 2026 3 min read

Benchmarking browser stealth means measuring how detectable an automated browser session is, rather than how fast or efficient the automation engine is. Two kinds of verdict are involved. A live or real-world verdict is an actual decision by a deployed detector or human - allowed, challenged, or blocked - measured against genuine anti-bot systems. An inferred verdict is an estimate of detectability from a fingerprint-only model, with no real gatekeeper in the loop. The two can disagree, which is the whole point of distinguishing them: passing a fingerprint test page is not the same as passing a production detector.

Live verdict	A real detector or human decides allow / challenge / block
Inferred verdict	A fingerprint-only model predicts detectability
Why they diverge	Live detectors also weigh IP, TLS, HTTP/2, session history
Not the same as	Engine benchmarks (memory / CPU / speed / pass-rate)
Validity	Snapshots - single author, dated, version-volatile tools

Live verdict vs. inferred verdict

A fingerprint test page - the CreepJS / BrowserScan family - produces an inferred verdict: it inspects what the browser reports and predicts how identifiable it is, but nothing is actually deciding to let you through. A live benchmark instead drives a tool against real targets behind real anti-bot gates and records what each gate does: allowed, served a challenge, or blocked. Because the live test measures the decision a production system makes, it captures factors a client-side page never sees. The inferred test is cheap and repeatable but blind to the network and server side. Good stealth benchmarking states explicitly which of the two it is reporting.
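One way to keep the two verdicts from blurring together is to store them in separate fields of a result record. The following is a minimal Python sketch, not any benchmark tool's real API: the names LiveVerdict, classify_live_response, and BenchmarkResult are hypothetical, and the status codes and challenge markers are assumed heuristics. Real gates signal challenges and blocks in many different ways, so these rules would need tuning per target.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LiveVerdict(Enum):
    ALLOWED = "allowed"
    CHALLENGED = "challenged"
    BLOCKED = "blocked"


# Assumed heuristics for illustration only; real challenge pages vary by vendor.
CHALLENGE_MARKERS = ("captcha", "verify you are human")
BLOCK_STATUSES = (403, 429)


def classify_live_response(status: int, body: str) -> LiveVerdict:
    """Map one response observed from a real target to a live verdict."""
    text = body.lower()
    # Check for a challenge first: challenge pages are often served with 403.
    if any(marker in text for marker in CHALLENGE_MARKERS):
        return LiveVerdict.CHALLENGED
    if status in BLOCK_STATUSES:
        return LiveVerdict.BLOCKED
    return LiveVerdict.ALLOWED


@dataclass
class BenchmarkResult:
    tool: str
    target: str
    # Prediction read from a fingerprint test page, if one was run.
    inferred_score: Optional[float] = None
    # Decision observed from a real gate, if one was run.
    live_verdict: Optional[LiveVerdict] = None

    def reported_kinds(self) -> List[str]:
        """Say explicitly which kind(s) of verdict this record contains."""
        kinds = []
        if self.inferred_score is not None:
            kinds.append("inferred")
        if self.live_verdict is not None:
            kinds.append("live")
        return kinds
```

Keeping inferred_score and live_verdict as separate optional fields means a report cannot silently present a test-page score as if it were a gate decision.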

Why identical fingerprints get opposite verdicts

The same client fingerprint can pass on one run and be blocked on another because production detectors weigh signals the fingerprint does not contain: IP reputation (a residential address versus a datacenter ASN), the TLS handshake fingerprint, HTTP/2 frame ordering, the shape of the automation protocol driving the browser, and session or behavioural history. Gates cross-check these layers for agreement - a Linux server advertising a desktop browser from a residential proxy is a contradiction a gate can flag - so an identical browser fingerprint passes behind one network context and fails behind another. That is why an inferred score and a live verdict are not interchangeable, and why calibration between them is an open question rather than a fixed number.
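The cross-checking idea can be illustrated with a small Python sketch. It is an assumption-laden toy, not how any particular gate works: SessionSignals, its field names, and the cross_check rules are all hypothetical stand-ins for the layers a detector might compare.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionSignals:
    # Hypothetical fields standing in for what a gate might observe.
    ua_os: str        # OS claimed by the browser, e.g. "Windows"
    network_os: str   # OS suggested by network-level signals, e.g. "Linux"
    ua_browser: str   # browser family claimed by the browser
    tls_browser: str  # browser family suggested by the TLS handshake
    ip_type: str      # "residential" or "datacenter"


def cross_check(s: SessionSignals) -> List[str]:
    """Return the flags raised when layers disagree (illustrative rules only)."""
    flags = []
    if s.ua_os != s.network_os:
        flags.append(f"OS mismatch: browser claims {s.ua_os}, network suggests {s.network_os}")
    if s.ua_browser != s.tls_browser:
        flags.append(f"TLS mismatch: browser claims {s.ua_browser}, handshake suggests {s.tls_browser}")
    # IP reputation is a signal in its own right rather than a contradiction.
    if s.ip_type == "datacenter":
        flags.append("datacenter IP")
    return flags


# The same claimed browser identity, observed behind two network contexts.
consistent = SessionSignals("Windows", "Windows", "Chrome", "Chrome", "residential")
inconsistent = SessionSignals("Windows", "Linux", "Chrome", "Chrome", "datacenter")

print(cross_check(consistent))
print(cross_check(inconsistent))
```

The browser-reported fields are identical in both sessions; only the network-side fields differ, which is exactly the part a fingerprint test page cannot see.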

Why benchmarks are snapshots

Public stealth benchmarks are best read as dated snapshots, not standings. They are typically one author's run, on one operating system and often a single IP, against targets that swap anti-bot vendors without notice and tools that change fast - many are pre-alpha and shift behaviour between versions. Rotating the proxy can reorder the results; a browser-version difference can be mistaken for a tool difference. This is distinct from a browser-automation engine benchmark, which compares tools on memory, CPU, and speed; stealth benchmarking measures detectability itself. Use either as a directional reference and re-test for your own targets, IPs, and dates rather than quoting a leaderboard as settled fact.
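A run is easier to read as a snapshot when its context is recorded next to each outcome. The sketch below, in Python, shows one possible shape for that: RunContext, Observation, and pass_rate_by_context are hypothetical names, and the observations are made-up placeholder values, not measurements of any real tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class RunContext:
    # The conditions that make a result a dated snapshot rather than a standing.
    run_date: str
    os_name: str
    ip_type: str
    browser_version: str


@dataclass(frozen=True)
class Observation:
    tool: str
    context: RunContext
    allowed: bool  # True if the gate let the session through


def pass_rate_by_context(
    observations: Iterable[Observation],
) -> Dict[Tuple[str, RunContext], float]:
    """Pass rate per (tool, context), so runs from different contexts are never pooled."""
    counts = defaultdict(lambda: [0, 0])  # key -> [allowed, total]
    for ob in observations:
        key = (ob.tool, ob.context)
        counts[key][0] += int(ob.allowed)
        counts[key][1] += 1
    return {key: allowed / total for key, (allowed, total) in counts.items()}


# Placeholder data for illustration only.
residential = RunContext("2026-06-15", "Linux", "residential", "placeholder-version")
datacenter = RunContext("2026-06-15", "Linux", "datacenter", "placeholder-version")
observations = [
    Observation("tool_a", residential, True),
    Observation("tool_a", residential, True),
    Observation("tool_a", datacenter, False),
    Observation("tool_a", datacenter, False),
]

rates = pass_rate_by_context(observations)
for (tool, ctx), rate in rates.items():
    print(tool, ctx.ip_type, ctx.run_date, rate)
```

Grouping on the full context, including browser version, guards against the two misreadings described above: a proxy change reordering the results, and a browser-version difference being mistaken for a tool difference.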

Code example

Two questions a stealth benchmark can answer - keep them separate:

  INFERRED verdict (fingerprint-only)
    - load a test page (CreepJS / BrowserScan)
    - read its predicted detectability score
    - cheap, repeatable, but blind to IP / TLS / session

  LIVE verdict (real-world)
    - drive the tool against real targets behind real gates
    - record per target: allowed / challenged / blocked
    - captures server-side signals, but is a dated snapshot

The two can disagree: an identical fingerprint may pass behind a residential
IP and be blocked behind a datacenter one. Re-test for your own targets,
IPs, and date - do not treat any single run as a fixed ranking.

Related terms

What Is CreepJS?

CreepJS is an open-source JavaScript test page that measures how identifiable a browser is and whether the values it reports are internally …

What Is BrowserScan?

BrowserScan (browserscan.net) is a free, hosted web page that tests how authentic and internally consistent a browser's fingerprint looks, a…

Browser Automation Engine Benchmarks

A browser-automation-engine benchmark drives several automation stacks through the same set of targets and records, side by side, how often …

What Is TLS Fingerprinting (JA3/JA4)?

TLS fingerprinting is a way to recognize what software made a connection just by looking at how it sets up encryption — before the server re…

What Is Anti-Bot Detection?

Anti-bot detection is the set of techniques websites use to tell automated traffic apart from real human visitors — and then block, challeng…

What Is Fingerprint Reconnect Stability?

Fingerprint reconnect stability is whether a browser returns a consistent fingerprint across reloads, reconnects, and sessions. A genuine de…

Frequently asked questions

What does a browser stealth benchmark measure?

It measures how detectable an automated browser session is to anti-bot systems, rather than how fast or resource-efficient the automation engine is. It is about the detectability signal itself.

What is the difference between a live and an inferred verdict?

A live verdict is an actual decision by a real detector or human - allowed, challenged, or blocked - against genuine anti-bot systems. An inferred verdict is a prediction from a fingerprint-only test page, with no real gatekeeper. The two can disagree.

Why can the same fingerprint pass once and fail once?

Because production detectors also weigh signals the browser fingerprint does not include: IP reputation, TLS and HTTP/2 fingerprints, the shape of the automation protocol driving the browser, and session history. The same browser fingerprint can pass behind a residential IP and be blocked behind a datacenter one.

Can I trust a published stealth benchmark leaderboard?

Treat it as a dated, directional snapshot. Results usually come from one author, one OS, and often one IP, against targets that change anti-bot vendors and tools that change fast. Re-test for your own targets, IPs, and date rather than quoting a ranking as settled.

Last updated: 2026-06-15