How Is Browser Stealth Benchmarked?

Benchmarking browser stealth means measuring how detectable an automated browser session is, rather than how fast or efficient the automation engine is. Two kinds of verdict are involved. A live or real-world verdict is an actual decision by a deployed detector or human - allowed, challenged, or blocked - measured against genuine anti-bot systems. An inferred verdict is an estimate of detectability from a fingerprint-only model, with no real gatekeeper in the loop. The two can disagree, which is the whole point of distinguishing them: passing a fingerprint test page is not the same as passing a production detector.

Quick facts

Live verdict: A real detector or human decides allow / challenge / block
Inferred verdict: A fingerprint-only model predicts detectability
Why they diverge: Live detectors also weigh IP, TLS, HTTP/2, session history
Not the same as: Engine benchmarks (memory / CPU / speed / pass-rate)
Validity: Snapshots - single author, dated, version-volatile tools

Live verdict vs. inferred verdict

A fingerprint test page - the CreepJS / BrowserScan family - produces an inferred verdict: it inspects what the browser reports and predicts how identifiable it is, but nothing is actually deciding to let you through. A live benchmark instead drives a tool against real targets behind real anti-bot gates and records what each gate does - allowed, served a challenge, or blocked. Because the live test measures the decision a production system makes, it captures factors a client-side page never sees, while the inferred test is repeatable and cheap but blind to the network and server side. Good stealth benchmarking is explicit about which of the two it is reporting.
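
One way to stay explicit about which verdict is being reported is to store the two in separate record types, so a fingerprint score can never be read as a gate decision. The following is a minimal sketch, not a standard schema: the class names, the target labels, and the 0.0-1.0 score scale are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class LiveVerdict(Enum):
    """What a real gate actually did with the session."""
    ALLOWED = "allowed"
    CHALLENGED = "challenged"
    BLOCKED = "blocked"


@dataclass(frozen=True)
class LiveResult:
    target: str            # hypothetical label for a real target
    verdict: LiveVerdict   # decision observed at that target's gate


@dataclass(frozen=True)
class InferredResult:
    test_page: str         # which fingerprint test page produced the score
    score: float           # hypothetical 0.0-1.0 detectability estimate


def summarize_live(results):
    """Count observed gate decisions; every verdict appears, even at zero."""
    counts = Counter(r.verdict for r in results)
    return {v.value: counts[v] for v in LiveVerdict}


# Illustrative data only: these are not real measurements.
live = [
    LiveResult("target-a", LiveVerdict.ALLOWED),
    LiveResult("target-b", LiveVerdict.BLOCKED),
    LiveResult("target-c", LiveVerdict.BLOCKED),
]
inferred = InferredResult("fingerprint-test-page", 0.12)

print(summarize_live(live))
print(inferred)
```

Keeping the types apart means a low inferred score and a mostly blocked live run can sit side by side in the same report without one overwriting the other.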

Why identical fingerprints get opposite verdicts

The same client fingerprint can pass on one run and be blocked on another because production detectors weigh signals the fingerprint does not contain: IP reputation (a residential address versus a datacenter ASN), the TLS handshake fingerprint, HTTP/2 frame ordering, the shape of the automation protocol driving the browser, and session or behavioural history. Gates cross-check these layers for agreement - a session that runs on a Linux server, reports itself as a desktop browser, and exits through a residential proxy presents a contradiction a gate can flag - so an identical browser fingerprint can pass behind one network context and fail behind another. That is why an inferred score and a live verdict are not interchangeable, and why calibration between them is an open question rather than a fixed number.
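
The cross-layer idea can be illustrated with a toy model. Production detectors do not publish their logic, so the rules below are invented for illustration only: the field names, the three checks, and the "any red flag means blocked" rule are all assumptions. The sketch shows only that one fingerprint can produce two different outcomes once network context is considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionContext:
    fingerprint_id: str      # identical across both runs below
    claimed_platform: str    # platform the browser fingerprint reports
    host_platform: str       # platform suggested by lower layers (assumed observable)
    ip_type: str             # "residential" or "datacenter"
    tls_matches_claim: bool  # whether the TLS handshake fits the claimed browser


def red_flags(ctx):
    """Return the hypothetical cross-layer problems found in one session."""
    found = []
    if ctx.claimed_platform != ctx.host_platform:
        found.append("platform mismatch")
    if ctx.ip_type == "datacenter":
        found.append("datacenter address")
    if not ctx.tls_matches_claim:
        found.append("TLS does not fit claimed browser")
    return found


def toy_verdict(ctx):
    """Toy rule, not a real detector: any red flag means blocked."""
    return "blocked" if red_flags(ctx) else "allowed"


# Same fingerprint, two network contexts.
run_a = SessionContext("fp-1", "desktop", "desktop", "residential", True)
run_b = SessionContext("fp-1", "desktop", "linux-server", "datacenter", True)

print(run_a.fingerprint_id == run_b.fingerprint_id)  # the fingerprint is unchanged
print(toy_verdict(run_a), red_flags(run_a))
print(toy_verdict(run_b), red_flags(run_b))
```

A fingerprint-only test page would see the same `fingerprint_id` in both runs and report the same inferred score, which is why it cannot explain the different outcomes.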

Why benchmarks are snapshots

Public stealth benchmarks are best read as dated snapshots, not standings. They are typically one author's run, on one operating system and often a single IP, against targets that swap anti-bot vendors without notice and tools that change fast - many are pre-alpha and shift behaviour between versions. Rotating the proxy can reorder the results; a browser-version difference can be mistaken for a tool difference. This is distinct from a browser-automation engine benchmark, which compares tools on memory, CPU, and speed; stealth benchmarking measures detectability itself. Use either as a directional reference and re-test for your own targets, IPs, and dates rather than quoting a leaderboard as settled fact.
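
A practical consequence is that each run should carry its own context, so that two results are compared only after checking what else changed between them. The sketch below assumes a hypothetical `RunContext` record; the field names and sample values are illustrative and not drawn from any published benchmark.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RunContext:
    tool: str             # the tool under test
    tool_version: str
    run_date: date        # snapshots are dated
    os_name: str
    browser_version: str
    exit_ip_type: str     # "residential" or "datacenter"


# Context fields that can change a result independently of the tool itself.
CONFOUNDING_FIELDS = ("os_name", "browser_version", "exit_ip_type")


def confounders(a, b):
    """Return the names of confounding fields that differ between two runs."""
    return [f for f in CONFOUNDING_FIELDS if getattr(a, f) != getattr(b, f)]


# Illustrative values only.
run_1 = RunContext("tool-x", "0.1.0", date(2026, 1, 10), "linux", "130", "datacenter")
run_2 = RunContext("tool-y", "0.3.2", date(2026, 1, 10), "linux", "131", "residential")

differences = confounders(run_1, run_2)
if differences:
    print("Not a clean tool comparison; also differs in:", differences)
else:
    print("Contexts match; the tool is the main variable.")
```

Here the two runs differ in browser version and exit IP type as well as in tool, so any gap between them cannot be attributed to the tool alone.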

Code example

Two questions a stealth benchmark can answer - keep them separate:

  INFERRED verdict (fingerprint-only)
    - load a test page (CreepJS / BrowserScan)
    - read its predicted detectability score
    - cheap, repeatable, but blind to IP / TLS / session

  LIVE verdict (real-world)
    - drive the tool against real targets behind real gates
    - record per target: allowed / challenged / blocked
    - captures server-side signals, but is a dated snapshot

The two can disagree: an identical fingerprint may pass behind a residential
IP and be blocked behind a datacenter one. Re-test for your own targets,
IPs, and date - do not treat any single run as a fixed ranking.

Frequently asked questions

What does a browser stealth benchmark measure?

It measures how detectable an automated browser session is to anti-bot systems, rather than how fast or resource-efficient the automation engine is. It is about the detectability signal itself.

What is the difference between a live and an inferred verdict?

A live verdict is an actual decision by a real detector or human - allowed, challenged, or blocked - against genuine anti-bot systems. An inferred verdict is a prediction from a fingerprint-only test page, with no real gatekeeper. The two can disagree.

Why can the same fingerprint pass once and fail once?

Because production detectors also weigh server-side signals the fingerprint does not include: IP reputation, TLS and HTTP/2 fingerprints, the automation protocol shape, and session history. The same browser fingerprint can pass behind a residential IP and be blocked behind a datacenter one.

Can I trust a published stealth benchmark leaderboard?

Treat it as a dated, directional snapshot. Results usually come from one author, one OS, and often one IP, against targets that change anti-bot vendors and tools that change fast. Re-test for your own targets, IPs, and date rather than quoting a ranking as settled.

Last updated: 2026-06-15