How to Build an Anti-Bot Challenge the Right Way

By the Scrappey Research Team

An anti-bot challenge is a small test a server makes your browser run — like proof-of-work (forcing the browser to burn some CPU on a puzzle), collecting a fingerprint (a profile of your browser and device), or watching how you behave — to tell real browsers apart from automated scripts before letting them in. Building a good one is less about clever cryptography and more about four design rules: check every signal on the server (never trust the client's word for it), tie each proof to one session, IP, and time window, assume the challenge runs on a machine the attacker controls, and base the work on something only a genuine browser can produce. Most home-built challenges fail on day one because they trust the client and verify almost nothing.

Challenge types	Proof-of-work, fingerprint hash, behavioural timing
Core principle	Every signal must reach and influence a server-side decision
Most common failure	Client-collected fingerprint data the server never checks
Realistic goal	Raise automation cost 10–100×, not absolute prevention

Why most homegrown challenges fail

Here is a real, working example: a custom Proof-of-Work (PoW) system — a puzzle that costs the browser CPU time to solve — guarding a sign-up flow. On paper it looks solid: three endpoints and a heavily obfuscated Web Worker (a background script the page runs off the main thread):

GET /api/pow/worker — returns a 178 KB obfuscated JavaScript Web Worker.
GET /api/pow/challenge — returns a base64 challenge blob, fetched by the worker internally.
POST /api/pow/verify — accepts {challengeId, solution} and returns {"success":true}.

The worker talks to the page using typed postMessage messages (the browser's way to pass data between a page and its worker): the page sends {t:"start", o: origin}, the worker asks for browser fingerprints (canvas frames, DOM text-measurement rects, WebGL info, performance values, a navigator string), and returns a 12-byte solution. The obfuscation is real — a Function() wrapper, randomised identifiers, a scrambled lookup table, and control-flow flattening that hides the order code runs in.

It still fell to a script that used no browser at all. The attacker downloaded the worker, ran it under Node.js with eval(), faked the self, postMessage, onmessage and fetch objects the worker expected, and fed it made-up data shaped to look like real fingerprints. The worker fetched the challenge through the faked fetch, computed the solution, and the script POSTed it back using an HTTP client that copied a Chrome TLS fingerprint (TLS is the encryption layer behind https, and its handshake leaves a recognisable signature) — start to finish in under a second. The obfuscation was irrelevant; the design had nothing underneath it. Every principle below comes from why that worked.

Principle 1: never trust data the server never validates

Here is the fatal flaw: the verify endpoint only checked {challengeId, solution}. All that carefully collected fingerprint data was never sent to the server, so it was never checked. That meant the solver could send solid-colour rectangles instead of real canvas renders, a hardcoded GPU string, and fixed screen dimensions. The server can't object to data it never sees.

If a signal does not reach the server and change the decision, it does not exist. So make the fingerprint part of the proof itself:

solution = solve(challenge, sha256(canvasFrames + domRects + webglInfo + perfValues + browserMeta))

The client submits the fingerprint hash together with the solution, and the server checks two things at verify time: that the solution is valid for that exact challenge and hash, and that the hash is one a real device could have produced. Fake inputs then yield either a wrong solution or a hash that matches no known-good hardware. Next, cross-check the pieces against each other: if the metadata claims an NVIDIA GPU, the WebGL renderer must also say NVIDIA; if it reports 4 cores, the solve time should match what a 4-core machine would do. This ties directly into fingerprint clustering: fields that contradict each other are free bot signals.
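
The formula above can be sketched as a hash-prefix puzzle in which the fingerprint hash is part of the hashed input. This is only an illustration under assumptions: the article does not say which puzzle the real system used, and the names leadingZeroBits, powDigest, solve and verify, the "|" separator and the 12-bit difficulty are all invented here.

```javascript
const { createHash } = require('node:crypto');

// Assumed puzzle: find a nonce so that sha256(challenge|fingerprintHash|nonce)
// starts with at least `difficultyBits` zero bits.
function leadingZeroBits(buf) {
  let bits = 0;
  for (const byte of buf) {
    if (byte === 0) { bits += 8; continue; }
    bits += Math.clz32(byte) - 24; // clz32 works on 32 bits; a byte occupies the low 8
    break;
  }
  return bits;
}

function powDigest(challenge, fingerprintHash, nonce) {
  return createHash('sha256')
    .update(`${challenge}|${fingerprintHash}|${nonce}`)
    .digest();
}

// Client side (worker): the search runs over this challenge AND this fingerprint.
function solve(challenge, fingerprintHash, difficultyBits) {
  for (let nonce = 0; ; nonce++) {
    if (leadingZeroBits(powDigest(challenge, fingerprintHash, nonce)) >= difficultyBits) {
      return nonce;
    }
  }
}

// Server side: recompute the digest from the submitted hash and nonce.
function verify(challenge, fingerprintHash, nonce, difficultyBits) {
  return leadingZeroBits(powDigest(challenge, fingerprintHash, nonce)) >= difficultyBits;
}

// Placeholder fingerprint material; a real worker would hash actual render output.
const realHash = createHash('sha256').update('canvas+rects+webgl+perf+meta').digest('hex');
const nonce = solve('challenge-123', realHash, 12);
console.log(verify('challenge-123', realHash, nonce, 12)); // true
```

A nonce found for one fingerprint hash is very unlikely to verify against another, so a solver cannot swap the fingerprint after the fact. The server still has to decide separately whether the submitted hash belongs to plausible hardware; the puzzle only ties the two together.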

Principle 2: bind the challenge to session, IP, and time

In the example, the solution was just a standalone {challengeId, solution} pair, tied to nothing — no session, no IP, no TLS fingerprint, no expiry. The script even fetched the challenge and submitted the verify from totally different places. That opens three easy attacks: solve once and replay it, run a solve farm that hands answers to many clients, and submit today's solution tomorrow.

Issue the challenge against a session cookie set when the page loads, and require that same session for /challenge, /verify and the protected action.
Compute an HMAC (a keyed checksum that cannot be forged without the server secret) over (challengeId, sessionId, IP, timestamp) and check all four on verify.
Expire challenges in 30–60 seconds and rate-limit how many each IP/session can request.
Require the JA3/JA4 TLS fingerprint of the challenge request to match the verify request.

A solution should prove that this client, on this connection, right now did the work — not be a portable token anyone can pick up and reuse.
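
A minimal sketch of that binding, using Node's built-in crypto module. The function names, the token shape, the hard-coded secret and the 60-second limit are assumptions made for illustration, not details from the system described above.

```javascript
const { createHmac, randomUUID, timingSafeEqual } = require('node:crypto');

const SERVER_SECRET = 'replace-with-a-real-secret'; // assumption: loaded from secure config
const MAX_AGE_MS = 60_000; // upper end of the 30 to 60 second window

function sign(challengeId, sessionId, ip, issuedAt) {
  // A JSON array keeps field boundaries unambiguous inside the MAC input.
  const payload = JSON.stringify([challengeId, sessionId, ip, issuedAt]);
  return createHmac('sha256', SERVER_SECRET).update(payload).digest('hex');
}

function issueChallenge(sessionId, ip, now = Date.now()) {
  const challengeId = randomUUID();
  return { challengeId, issuedAt: now, mac: sign(challengeId, sessionId, ip, now) };
}

// sessionId and ip come from the verify request itself, never from the token.
function verifyBinding(token, sessionId, ip, now = Date.now()) {
  if (now < token.issuedAt || now - token.issuedAt > MAX_AGE_MS) return false;
  const expected = Buffer.from(sign(token.challengeId, sessionId, ip, token.issuedAt), 'hex');
  const given = Buffer.from(String(token.mac), 'hex');
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

const token = issueChallenge('session-abc', '203.0.113.7', 1_000);
console.log(verifyBinding(token, 'session-abc', '203.0.113.7', 5_000));  // true
console.log(verifyBinding(token, 'session-xyz', '203.0.113.7', 5_000));  // false
console.log(verifyBinding(token, 'session-abc', '203.0.113.7', 70_000)); // false (expired)
```

The MAC stops a token from moving between sessions or IPs, but it does not stop the same client from replaying it inside the window. A deployment would likely also record each consumed challengeId and reject a second use.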

Principle 3: assume the challenge runs in a hostile environment

The worker had no idea whether it was actually inside a real browser. Run under Node.js eval(), it had full access to global, require, process and module, and the attacker just swapped in a fake fetch. Obfuscation slowed down reading the code; it did nothing to stop running it. So probe the environment from inside the obfuscated worker and fold the result into the math (don't simply throw an error — a thrown error is trivial to patch out):

// Present in real Workers, absent or different under Node.js eval.
// corrupt() is a placeholder: it should silently perturb the PoW state, not throw.
if (typeof WorkerGlobalScope === 'undefined') corrupt();
if (typeof importScripts !== 'function') corrupt();
if (typeof process !== 'undefined') corrupt();
if (typeof require !== 'undefined') corrupt();

Then make the calculation depend on platform APIs (crypto.subtle.digest(), high-resolution performance.now(), SharedArrayBuffer/Atomics) so an attacker can't skip emulating them. Recent Node.js releases ship their own versions of all three, so these raise the cost of emulation rather than prove a browser on their own; pair them with the probes above. Finally, keep the code moving: embed a per-session nonce (a one-time random value) when you serve it, and regenerate the variable names and control flow on every request, so a cached eval()-based solver breaks the instant the script changes shape.
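
One way to fold the probe results into the math, rather than throwing, is to turn them into a bitmask that becomes part of the value the puzzle runs over. The sketch below is an assumption-laden illustration: environmentTaint and deriveWorkSeed are invented names, and the scope argument exists only so the logic can be exercised outside a browser (a real worker would read its own globals directly).

```javascript
const { createHash } = require('node:crypto');

// Returns 0 when the scope looks like a real Worker, non-zero otherwise.
function environmentTaint(scope) {
  let taint = 0;
  if (typeof scope.WorkerGlobalScope === 'undefined') taint |= 1;
  if (typeof scope.importScripts !== 'function') taint |= 2;
  if (typeof scope.process !== 'undefined') taint |= 4;
  if (typeof scope.require !== 'undefined') taint |= 8;
  return taint;
}

// Nothing throws: a hostile runtime simply ends up solving a different puzzle,
// and the server (which assumes taint 0) rejects the result.
function deriveWorkSeed(challengeSeed, scope) {
  return createHash('sha256')
    .update(`${challengeSeed}|${environmentTaint(scope)}`)
    .digest('hex');
}

const workerLike = { WorkerGlobalScope: function () {}, importScripts() {} };
const nodeLike = { process: {}, require() {} };
console.log(environmentTaint(workerLike)); // 0
console.log(environmentTaint(nodeLike));   // 15
```

An attacker who reads these checks can fake the missing globals, which is why the surrounding text pairs them with obfuscation and per-request regeneration: the goal is that the checks are hard to find, not that they are unbeatable.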

Principle 4: root the proof in real-browser work

Every fingerprint input in the example could be forged with plain arithmetic. The canvas drawing was a pure formula, so the solver produced identical bytes without ever touching the Canvas API. The DOM rects measured a fixed string in a fixed font — always the same numbers. WebGL info was just a string. To force real hardware into the loop, lean on things that genuinely vary by physical device, and always send the server a hash, never the raw values:

Seed canvas rendering with a per-challenge random value the server knows; use globalCompositeOperation, shadowBlur and system fonts to amplify the tiny GPU anti-aliasing differences between devices, then check the output hash against known-good GPU families.
Prefer WebGL shader output — its hardware floating-point precision is hard to fake without the real GPU.
Randomise font family and size per challenge, and use proportional fonts so the text widths can't be precomputed.
Chain the frames: frame N's input depends on the hash of frame N-1's output, so the work can't be split across CPUs or shortcut.

A good challenge doesn't just collect a fingerprint — it forces one that has to land inside a real device cluster.
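
The frame-chaining idea from the list above can be sketched as a simple hash chain. renderFrame here is a hypothetical stand-in that returns placeholder bytes; in a browser it would draw to a canvas or run a shader with parameters derived from its input and return the resulting pixels. The function names and the input format are assumptions.

```javascript
const { createHash } = require('node:crypto');

const sha256Hex = (data) => createHash('sha256').update(data).digest('hex');

// Hypothetical stand-in for real rendering work.
function renderFrame(frameInput) {
  return Buffer.from(`pixels-for:${frameInput}`);
}

// Each frame's input is derived from the previous frame's output hash,
// so frame N cannot start until frame N-1 has actually been rendered.
function chainFrames(serverSeed, frameCount, render = renderFrame) {
  let previousHash = sha256Hex(serverSeed);
  for (let n = 0; n < frameCount; n++) {
    const frameInput = `${n}:${previousHash}`;
    previousHash = sha256Hex(render(frameInput));
  }
  return previousHash; // only this final hash is sent to the server
}

console.log(chainFrames('server-seed-1', 4).length); // 64
```

Because the seed comes from the server and changes per challenge, the final hash cannot be precomputed, and because every step needs the previous output, the frames cannot be rendered in parallel.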

Principle 5: detect automation where it can’t be read, and use timing

The example's automation checks (navigator.webdriver, plus a User-Agent blocklist for selenium/puppeteer/playwright) lived in the readable page code, not the worker, so the solver never ran them. Two fixes matter:

Move detection into the obfuscated worker and make its result feed the calculation, so it can't be skipped by running the worker on its own.
Use behavioural timing: a real browser takes 5–50 ms to render canvas, read DOM rects and query WebGL; the solver answered in under 1 ms. Reject impossibly fast responses, capture performance.now() before and after the PoW, and add a server-side fence — too fast (< 200 ms after issuance) is a bot, too slow (> 60 s) is a replay.
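
The server-side fence is small enough to sketch in full. The thresholds are the ones quoted above; the function name and verdict strings are assumptions. Both timestamps come from the server's own clock (issuance time and verify arrival time), so the client cannot influence them.

```javascript
const MIN_SOLVE_MS = 200;    // faster than this after issuance: treated as a bot
const MAX_SOLVE_MS = 60_000; // slower than this: treated as a replay

function timingVerdict(issuedAtMs, receivedAtMs) {
  const elapsed = receivedAtMs - issuedAtMs;
  if (elapsed < MIN_SOLVE_MS) return 'too_fast';
  if (elapsed > MAX_SOLVE_MS) return 'too_slow';
  return 'ok';
}

console.log(timingVerdict(0, 1));      // too_fast
console.log(timingVerdict(0, 1_500));  // ok
console.log(timingVerdict(0, 90_000)); // too_slow
```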

Principle 6: cross-validate every signal

On its own each signal can be spoofed; combined with cross-checks, faking them all consistently gets expensive:

The User-Agent inside the fingerprint must match the User-Agent HTTP header on verify, and sec-ch-ua-platform must match the reported platform.
The timezone must be plausible for the location of the client's IP.
If the same session reports different screen sizes or languages from one request to the next, flag it.
Mix in genuine per-request randomness (crypto.getRandomValues(), the collection-time Date.now()) so no two submissions are byte-for-byte identical.
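
A few of these cross-checks can be sketched as one function that returns every contradiction it finds. The field names on the fingerprint object are hypothetical, the headers object is assumed to use lowercase keys (as Node's HTTP server provides), and fingerprint.platform is assumed to be already normalised to the same vocabulary as the client hint.

```javascript
// Returns a list of contradictions; an empty list means the signals agree.
function findContradictions(headers, fingerprint, previousFingerprint = null) {
  const problems = [];

  if (headers['user-agent'] !== fingerprint.userAgent) {
    problems.push('user-agent mismatch');
  }

  // sec-ch-ua-platform arrives as a quoted string, e.g. "Windows".
  const hintedPlatform = (headers['sec-ch-ua-platform'] || '').replace(/"/g, '');
  if (hintedPlatform && hintedPlatform !== fingerprint.platform) {
    problems.push('platform mismatch');
  }

  // Values that should be stable within one session.
  if (previousFingerprint) {
    if (previousFingerprint.screen !== fingerprint.screen) problems.push('screen size changed');
    if (previousFingerprint.language !== fingerprint.language) problems.push('language changed');
  }
  return problems;
}

const headers = { 'user-agent': 'Mozilla/5.0 (example)', 'sec-ch-ua-platform': '"Windows"' };
const honest = { userAgent: 'Mozilla/5.0 (example)', platform: 'Windows', screen: '1920x1080', language: 'en-US' };
const forged = { ...honest, platform: 'Linux', screen: '800x600' };

console.log(findContradictions(headers, honest));         // []
console.log(findContradictions(headers, forged, honest)); // [ 'platform mismatch', 'screen size changed' ]
```

Returning the full list instead of a boolean lets the server score or log each mismatch separately, which fits the idea that contradictions are signals to weigh rather than a single pass/fail gate.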

The architecture worth building

The robust version pulls all the principles into a single flow:

The issue endpoint returns a signed (challengeId, seed, timestamp, sessionId) bundle, tied to a session cookie.
The worker uses the seed to drive canvas/WebGL/font operations whose output can't be precomputed before issuance, and probes its environment.
The worker hashes all the fingerprint data and folds that hash into the PoW calculation.
The solution is (challengeId, solution, fingerprintHash).
Verify checks that the challenge is valid and unexpired, the session matches, the fingerprint hash fits known-good hardware, the solve time matches the reported core count, the IP and TLS fingerprint match the ones at issuance, and the solution is correct for challenge + fingerprintHash.

If you're building one today, this is the order that buys the most security per unit of effort:

Priority	Move	Why it matters
P0	Embed the fingerprint hash in the solution	Breaks every fake-input solver instantly
P0	Bind to session + IP + time	Kills replay, farming and cross-context solving
P1	Environment probes inside the worker	Detects eval() outside a browser
P1	Per-challenge canvas/font seed	Ends deterministic, precomputed fingerprints
P2	Move detection into the worker	Can’t be skipped by running the worker directly
P2	Server-side timing fence	Catches sub-millisecond “instant” solves
P3	WebGL shader-output verification	Forces a real GPU into the loop

What good looks like

A well-built anti-bot challenge has a few non-negotiable properties: every signal it collects feeds a server-side decision; the proof is locked to one client, one connection and one short time window; the challenge code assumes a hostile runtime and makes running it outside a browser expensive; and the work depends on physical-device behaviour rather than arithmetic anyone can reproduce. Obfuscation is the last and least important layer — it buys time, not security.

The custom PoW in the example wasn't beaten by clever cryptanalysis. It was beaten because it trusted the client, validated almost nothing, ran in an environment the attacker controlled, and built its proof out of forgeable arithmetic. Fix those four things and you have a challenge worth the bytes it ships. To see how production vendors put these ideas to work at scale, see Cloudflare Bot Management and anti-bot detection.

Related terms

What Is Anti-Bot Detection?

Anti-bot detection is the set of techniques websites use to tell automated traffic apart from real human visitors — and then block, challeng…

What Is Fingerprint Clustering?

Fingerprint clustering is the practice of grouping fingerprints from millions of real visitors by similarity, then rejecting any new visitor…

What Is Browser Fingerprinting?

Browser fingerprinting is a technique that identifies and tracks a visitor by combining dozens of small, observable characteristics of their…

What Is Canvas Fingerprinting?

Canvas fingerprinting is a way for a website to identify your device by asking the browser to draw a tiny invisible image, then turning the …

What Is WebGL Fingerprinting?

WebGL fingerprinting reads identifying information directly from the GPU. WebGL is the browser feature that lets web pages draw 3D graphics …

What Is TLS Fingerprinting (JA3/JA4)?

TLS fingerprinting is a way to recognize what software made a connection just by looking at how it sets up encryption — before the server re…

What Is Behavioural Bot Detection?

Behavioural bot detection is the part of anti-bot scoring that asks "how does this client act?" instead of "what is this client?". Instead o…

What Is Headless Browser Detection?

Headless browser detection is the set of probes anti-bot systems use to distinguish a headless or instrumented Chrome session from a real us…

What Is a DOM Honeypot?

A DOM honeypot is an invisible form field or link that humans never see but bots fill in or click. The DOM (Document Object Model) is the li…

What Is Fingerprint Lie Detection?

Fingerprint lie detection is the practice of verifying that the signals a browser reports are internally consistent and untampered, rather t…

What Is Cloudflare Turnstile?

Cloudflare Turnstile is a service that checks whether a visitor is a real human, but without showing the kind of puzzle a normal CAPTCHA doe…

What Is curl_cffi?

curl_cffi is a Python HTTP client whose TLS fingerprint looks exactly like real Chrome, Firefox, or Safari. TLS is the encryption layer behi…

Frequently asked questions

Is proof-of-work enough to stop bots?

No. Proof-of-work only shows that a client spent some CPU time, and a server or solve farm can do that cheaply. It's useful as one layer, but on its own it proves neither a real browser nor a real user. Tie it to a session, validate fingerprints on the server, and add timing checks.

Should fingerprint data be validated on the client or the server?

Always the server. Any check that runs only on the client can be skipped by running the challenge code outside a browser. The classic mistake is collecting rich fingerprint data in the browser and then never sending it to the server to validate.

Why did obfuscating the worker not protect it?

Obfuscation only slows down reading the code. An attacker doesn't need to read it — they run it with eval() and fake browser globals, watching the messages to learn the protocol. Security has to come from the design (server-side validation, session binding, real-hardware work), not from making the code hard to read.

Can a custom challenge ever fully stop automation?

No design is unbeatable — a real browser farm can solve almost anything. The realistic goal is to raise the cost 10–100×, pushing attackers from a cheap script into running full browsers with real GPUs on rotating residential IPs, which is slow, expensive, and itself easier to detect.

Last updated: 2026-05-31