What Is DataDome?

DataDome is a bot-protection vendor used on roughly 1,200 enterprise sites, scoring more than 5 trillion signals per day. Its job is to tell real visitors apart from automated scrapers. Unlike Cloudflare and Akamai, it trains a separate machine-learning model for each protected site (roughly 85,000 in total), runs on the website's own application server rather than at the CDN edge (the network of relay servers that sit in front of a site), and returns a verdict in about 2 ms. Because of that design, its behaviour varies from site to site: what is observed on Grainger.com may not hold on Le Monde, even with the same TLS fingerprint (TLS is the encryption layer behind https), the same browser, and the same proxy.

Quick facts

Detection cookie: datadome (also dd_cookie_test)
Models: ~85,000, one per protected site
Decision latency: ~2 ms, real-time, per request
Key signals: IP reputation (25–30%), TLS, WASM boring_challenge, Picasso device fingerprint, 35+ behavioural signals
How it is studied: curl_cffi + mobile/residential proxy, __NEXT_DATA__ extraction

How DataDome works

When a request hits a DataDome-protected site, that request is forwarded in real time to DataDome's scoring service while the page is being served. The model looks at several signals at once: IP reputation (the history and trustworthiness of your IP address, which by itself accounts for 25–30% of the score), the TLS fingerprint (the unique signature your client leaves when it sets up the https connection), HTTP/2 frame characteristics (low-level details of how your client packages its requests), the datadome cookie if one is present, and any behavioural data the site has collected from you before. A score comes back in roughly 2 ms — fast enough to block a bot inline without slowing down the page for real users.
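To make the idea of scoring several signals at once concrete, here is a toy weighted-sum sketch. It is not DataDome's model: a real model is learned rather than a fixed linear formula. The signal names, the `risk_score` function, and every weight are assumptions made for illustration; only the 25–30% share for IP reputation comes from the text above.

```python
# Toy illustration only: signal names and weights are assumptions.
# The 0.30 weight for IP reputation mirrors the 25-30% share quoted above.
WEIGHTS = {
    "ip_reputation": 0.30,
    "tls_fingerprint": 0.25,
    "http2_frames": 0.15,
    "cookie": 0.10,
    "behaviour": 0.20,
}


def risk_score(signals: dict[str, float]) -> float:
    """Combine per-signal risk values (0.0 = clean, 1.0 = bot-like).

    A signal that is missing from the input is treated as unknown (0.5).
    """
    return sum(WEIGHTS[name] * signals.get(name, 0.5) for name in WEIGHTS)


if __name__ == "__main__":
    clean = {name: 0.0 for name in WEIGHTS}
    # Same client, but the IP now looks like a datacenter address.
    datacenter = dict(clean, ip_reputation=1.0)
    print(risk_score(clean), risk_score(datacenter))
```

Even in this simplified form, changing only the IP signal moves the total noticeably, which is the intuition behind the weight the text assigns to IP reputation.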

The WASM boring_challenge is DataDome's most distinctive piece. WASM, short for WebAssembly, is compiled code that runs inside the browser at near-native speed. This challenge is a small program written in Rust and compiled to WASM; it runs in the browser as a state machine (a step-by-step process that moves through defined states) and spits out a token proving the work was done. Because it is real code executing against real browser APIs, you cannot solve it without an actual browser environment. Headless detection (spotting browsers with no visible window, the usual sign of automation) happens here too: the WASM probes the CPU using SIMD timing — measuring how fast certain parallel math instructions run — in a way that no stealth-browser JavaScript patch can fake.
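The "state machine that emits a token" pattern can be sketched generically. The code below is not DataDome's challenge and does not reproduce any of its logic; the states, the `run_challenge` function, and the SHA-256 token are all hypothetical, chosen only to show how a program can step through defined states and produce a value that depends on what it measured along the way.

```python
import hashlib
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    PROBE = auto()
    HASH = auto()
    DONE = auto()


def run_challenge(seed: str, probe_value: str) -> str:
    """Walk INIT -> PROBE -> HASH -> DONE and return a hex token.

    `probe_value` stands in for whatever the environment reports
    (hypothetical); the token changes whenever that value changes.
    """
    state = State.INIT
    acc = b""
    token = ""
    while state is not State.DONE:
        if state is State.INIT:
            acc = seed.encode()
            state = State.PROBE
        elif state is State.PROBE:
            acc += b"|" + probe_value.encode()
            state = State.HASH
        elif state is State.HASH:
            token = hashlib.sha256(acc).hexdigest()
            state = State.DONE
    return token


if __name__ == "__main__":
    print(run_challenge("server-seed", "probe-result"))
```

Because the token is derived from the probe result, a server that knows the seed can tell whether the expected environment produced it, which is the general reason such challenges are tied to real execution.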

Why its behaviour varies per site

With 85,000 per-site models, DataDome tunes how strict it is for each customer. Le Monde (a news site, light scoring) blocks far less aggressively than Grainger (e-commerce, hard scoring). So the same client configuration can be scored very differently from one customer's site to another. There is no single, universal way it behaves — the model is per-site and can be retrained at any time.
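A minimal sketch of why the same client can get different verdicts on different sites. It reduces each per-site model to a single blocking threshold, which is a deliberate simplification: real per-site models presumably differ in far more than one number. `SitePolicy`, `verdict`, the site names, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePolicy:
    name: str
    block_threshold: float  # hypothetical: block when score >= threshold


def verdict(score: float, policy: SitePolicy) -> str:
    """Return "block" or "allow" for one risk score under one site's policy."""
    return "block" if score >= policy.block_threshold else "allow"


if __name__ == "__main__":
    lenient = SitePolicy("news-site", block_threshold=0.8)
    strict = SitePolicy("ecommerce-site", block_threshold=0.4)
    score = 0.55  # the same client, scored identically on both sites
    for policy in (lenient, strict):
        print(policy.name, verdict(score, policy))
```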

What scrapers actually do

Three strategies, in priority order:

  1. Look for the data in the initial HTML first. Many DataDome-protected Next.js sites embed the full page state in a __NEXT_DATA__ script tag — confirmed on Grainger.com, where a 110KB JSON blob holds all the product data right in the first HTML response. A tool like curl_cffi plus a residential proxy fetches that HTML directly. DataDome never even runs its WASM check, because no follow-up XHR (background JavaScript request for more data) ever fires.
  2. Use mobile or ISP residential proxies for XHR endpoints. IP reputation carries so much weight that simply switching from a datacenter IP to a mobile-4G one often flips a session from blocked to a 200 OK response with nothing else changed. Rotating residential IPs is risky; static ISP or mobile IPs are the safest choice.
  3. Use Camoufox with geoip=True when the page genuinely runs the WASM challenge. The five identity signals — IP, WebRTC (a browser feature that can leak your real IP), DNS, timezone, and Accept-Language — all have to point to the same location.
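The requirement that the five identity signals agree can be expressed as a simple consistency check. This is a sketch under assumptions: each signal is taken to be already resolved to a country code by some upstream step (for example a timezone of Europe/Paris mapped to "FR"), and the `mismatched_signals` helper and the signal keys are hypothetical names, not part of Camoufox or any other tool.

```python
IDENTITY_SIGNALS = ("ip", "webrtc", "dns", "timezone", "accept_language")


def mismatched_signals(observed: dict[str, str]) -> list[str]:
    """Return the signals whose country differs from the IP's country.

    `observed` maps each signal name to a country code (an assumption:
    resolving a timezone or language tag to a country happens elsewhere).
    Missing signals count as mismatches; with no IP entry, nothing can
    be verified, so every signal is reported.
    """
    expected = observed.get("ip")
    if expected is None:
        return list(IDENTITY_SIGNALS)
    return [name for name in IDENTITY_SIGNALS if observed.get(name) != expected]


if __name__ == "__main__":
    session = {
        "ip": "FR",
        "webrtc": "FR",
        "dns": "FR",
        "timezone": "US",  # browser clock still set to a US timezone
        "accept_language": "FR",
    }
    print(mismatched_signals(session))
```

A non-empty result means at least one signal points somewhere other than the IP's location, which is the kind of inconsistency the list above warns about.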

Datacenter IPs are not a viable starting point: their poor IP reputation gets them rejected before any fingerprint detail even comes into play.

Code example

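```python
import re

import chompjs
from curl_cffi import requests

# Many DataDome-protected Next.js sites embed the page state in __NEXT_DATA__.
# The URL and the proxy address below are placeholders.
r = requests.get(
    "https://target.com/product/123",
    impersonate="chrome131",
    proxies={"https": "http://user:pass@mobile-4g-proxy:port"},
)

# Pull the JSON payload out of the __NEXT_DATA__ script tag.
m = re.search(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', r.text, re.S)
if m is None:
    raise RuntimeError("__NEXT_DATA__ not found; the response may be a block page")

data = chompjs.parse_js_object(m.group(1))
# The key path is site-specific; this one is illustrative.
print(data["props"]["pageProps"]["product"])
```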

Frequently asked questions

Is DataDome the same as Cloudflare?

No. Cloudflare runs at the CDN edge (the relay network in front of a site) and uses one global ML model trained on roughly 20% of all internet traffic. DataDome runs per-site at the application layer, with 85,000 separate models. They look for different things and behave very differently.

Does a residential proxy alone change how DataDome scores a session?

On the lightest deployments, IP reputation carries enough weight that it can. On most production e-commerce or ticketing sites, more signals are evaluated: the TLS fingerprint (the https handshake signature), and — for XHR endpoints (background data requests) — whether a real browser context ran the WASM challenge. The point is that DataDome scores many signals together, not the IP alone.

Why does DataDome respond in 2 ms?

Because every request is scored on its own, inline, with no warm-up period where trust slowly builds up. The speed matters because the site can't afford to make real users wait while a model thinks. The catch for scrapers: every single request gets scored, not just the first one.

Does the datadome cookie mean I am whitelisted?

No. The cookie just marks a session DataDome has seen before; the score is recalculated on every request. A valid cookie that passed on request 1 can still fail on request 50 if your behavioural fingerprint starts to look off. The cookie is a hint, not a free pass.

Last updated: 2026-05-31