How DataDome works
When a request hits a DataDome-protected site, it is forwarded in real time to DataDome's scoring service while the page is being served. The model weighs several signals at once: IP reputation (the history and trustworthiness of your IP address, which by itself accounts for 25–30% of the score), the TLS fingerprint (the signature your client leaves when it sets up the HTTPS connection), HTTP/2 frame characteristics (low-level details of how your client packages its requests), the `datadome` cookie if one is present, and any behavioural data the site has already collected from you. A score comes back in roughly 2 ms, fast enough to block a bot inline without slowing the page down for real users.
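To make the "25–30% of the score" figure concrete, here is a toy weighted score. It is a sketch under loud assumptions: DataDome's real model is proprietary and almost certainly not a linear sum, and the signal names and every weight except the IP one are hypothetical. The IP weight is simply chosen to sit inside the quoted 25–30% range.

```python
# Toy illustration only, NOT DataDome's model. Signal names and weights are
# hypothetical; only the IP weight is picked to match the 25-30% range above.
WEIGHTS = {
    "ip_reputation": 0.28,
    "tls_fingerprint": 0.22,
    "http2_frames": 0.18,
    "cookie": 0.12,
    "behaviour": 0.20,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights form a full score


def bot_score(signals: dict[str, float]) -> float:
    """Combine per-signal suspicion values in [0, 1] into one score in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


clean = {name: 0.0 for name in WEIGHTS}
datacenter_ip = dict(clean, ip_reputation=1.0)  # only the IP looks bad
print(round(bot_score(clean), 2))          # 0.0
print(round(bot_score(datacenter_ip), 2))  # 0.28
```

In this toy, one bad IP alone moves the score by more than a quarter of its range while every other signal is clean, which is the intuition behind IP reputation dominating.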
The WASM boring_challenge is DataDome's most distinctive piece. WASM, short for WebAssembly, is compiled code that runs inside the browser at near-native speed. This challenge is a small program written in Rust and compiled to WASM; it runs in the browser as a state machine (a step-by-step process that moves through defined states) and emits a token proving the work was done. Because it is real code executing against real browser APIs, you cannot solve it without an actual browser environment. Headless detection (spotting browsers with no visible window, the usual sign of automation) happens here too: the WASM probes the CPU using SIMD timing, measuring how fast certain parallel math instructions run, in a way that no stealth-browser JavaScript patch can fake.
Why its behaviour varies per site
With 85,000 per-site models, DataDome tunes how strict it is for each customer. Le Monde (a news site, light scoring) blocks far less aggressively than Grainger (e-commerce, hard scoring). As a result, the same client configuration can be scored very differently from one customer's site to another. There is no single universal behaviour to learn: the model is per-site and can be retrained at any time.
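The effect of per-site strictness can be sketched with one client score judged against two sites' block thresholds. The thresholds and site labels below are invented for illustration (DataDome does not publish per-customer settings), and real per-site models can differ in more than a threshold; this isolates only that one effect.

```python
# Hypothetical per-site block thresholds: lower means stricter scoring.
# Numbers are invented for illustration, not real customer settings.
SITE_THRESHOLDS = {
    "news site (light scoring)": 0.8,
    "e-commerce site (hard scoring)": 0.3,
}


def decision(score: float, threshold: float) -> str:
    """Block when the bot score reaches the site's threshold, else allow."""
    return "block" if score >= threshold else "allow"


same_client_score = 0.45  # one client configuration, one score
for site, threshold in SITE_THRESHOLDS.items():
    print(f"{site}: {decision(same_client_score, threshold)}")
# news site (light scoring): allow
# e-commerce site (hard scoring): block
```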
What scrapers actually do
Three strategies, in priority order:
- Look for the data in the initial HTML first. Many DataDome-protected Next.js sites embed the full page state in a `__NEXT_DATA__` script tag. This is confirmed on Grainger.com, where a 110 KB JSON blob holds all the product data right in the first HTML response. A tool like `curl_cffi` plus a residential proxy fetches that HTML directly, and DataDome never runs its WASM check, because no follow-up XHR (background JavaScript request for more data) ever fires.
- Use mobile or ISP residential proxies for XHR endpoints. IP reputation carries so much weight that simply switching from a datacenter IP to a mobile 4G one often flips a session from blocked to a 200 OK response with nothing else changed. Rotating residential IPs is risky; static ISP or mobile IPs are the safest choice.
- Use Camoufox with `geoip=True` when the page genuinely runs the WASM challenge. The five identity signals must all point to the same location: IP, WebRTC (a browser feature that can leak your real IP), DNS, timezone, and Accept-Language.
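The first strategy comes down to pulling the embedded JSON out of HTML you already have. Below is a minimal sketch using only Python's standard library; it assumes the page uses the usual Next.js `<script id="__NEXT_DATA__" type="application/json">` tag, and it leaves fetching out of scope. The `product`/`sku` keys in the sample are hypothetical, since each site shapes its own page props.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class NextDataParser(HTMLParser):
    """Collect the raw text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies are delivered as raw text, so the JSON arrives intact.
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        return None
    return json.loads("".join(parser.chunks))


# Hypothetical sample page; real pages carry far larger blobs.
sample = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"product": {"sku": "ABC123"}}}}'
    "</script></body></html>"
)
data = extract_next_data(sample)
print(data["props"]["pageProps"]["product"]["sku"])  # ABC123
```

A real HTML parser is used rather than a regex so that attribute order, extra attributes, and whitespace inside the tag do not break extraction.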
Datacenter IPs are not a viable starting point: their poor IP reputation gets them rejected before any fingerprint detail even comes into play.
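The "same location" requirement from the Camoufox bullet can be expressed as a simple consistency check, the kind a detector could run or a client could self-test. This is a sketch under stated assumptions: the IP, WebRTC, and DNS signals are taken as already resolved to ISO country codes by some geolocation step not shown here, and `TZ_COUNTRY` is a tiny hypothetical lookup table standing in for a full timezone database.

```python
from typing import Optional

# Hypothetical, deliberately tiny timezone -> country table.
TZ_COUNTRY = {"Europe/Paris": "FR", "Europe/Berlin": "DE", "America/Chicago": "US"}


def accept_language_region(header: str) -> Optional[str]:
    """Region subtag of the first language tag, e.g. 'fr-FR,fr;q=0.9' -> 'FR'."""
    first = header.split(",")[0].split(";")[0].strip()
    parts = first.split("-")
    # The region is the first two-letter alphabetic subtag after the language,
    # so 'zh-Hant-TW' yields 'TW' and a bare 'fr' yields None.
    return next((p.upper() for p in parts[1:] if len(p) == 2 and p.isalpha()), None)


def mismatched_signals(ip_country: str, webrtc_country: str, dns_country: str,
                       timezone: str, accept_language: str) -> list[str]:
    """Names of signals whose country disagrees with the IP's country."""
    observed = {
        "webrtc": webrtc_country,
        "dns": dns_country,
        "timezone": TZ_COUNTRY.get(timezone),
        "accept_language": accept_language_region(accept_language),
    }
    return sorted(name for name, country in observed.items() if country != ip_country)


print(mismatched_signals("FR", "FR", "FR", "Europe/Paris", "fr-FR,fr;q=0.9"))  # []
print(mismatched_signals("FR", "US", "FR", "America/Chicago", "en-US,en;q=0.9"))
# ['accept_language', 'timezone', 'webrtc']
```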
