How DataDome Works (2026)

DataDome is a bot-blocking service that sits in front of roughly 1,200 enterprise sites — major e-commerce, classifieds, news, and travel sites. It has a reputation for catching automation that slips past Cloudflare without trouble, so it is worth understanding on its own. Its design is unusual in three ways: it trains a separate machine-learning (ML) model for each site, it scores requests at the application server instead of at the CDN edge (the network of servers that delivers a site close to its users), and it runs a WebAssembly (WASM — compiled code that runs at near-native speed in the browser) challenge inside the visitor's browser.

This is a reference on how DataDome is structured and what each detection layer measures.

Quick facts

Coverage: ~1,200 enterprise sites
Model: Per-site machine learning
Cookie: datadome
Known for: Catching bots that pass Cloudflare
Best approach: Residential IPs + real-browser fingerprint

What DataDome is

DataDome is a reverse-proxy WAF, a web application firewall that inspects traffic before it reaches the site. It integrates at the application server, not at the CDN edge. Every request is forwarded to DataDome's scoring service, which returns an allow-or-block decision in roughly 2 ms. The scorer is built per customer, drawing on around 85,000 ML models in total, so the very same TLS, browser and proxy combination can pass on one DataDome customer and fail on another.

When a request looks untrustworthy, the visitor gets one of:

  • A silent 403 (the HTTP code for "forbidden") with the x-datadome header set.
  • A GeeTest-style slider captcha served inline.
  • A block page with a Reference #.
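The three outcomes above can usually be told apart from the response itself. The following is a minimal sketch, assuming the status code, headers and body are already in hand. The function name `classify_response` and the `captcha_marker` parameter are hypothetical; the text does not say what string identifies the captcha, so the caller has to supply one per site.

```python
def classify_response(status, headers, body, captcha_marker=None):
    """Label a response with one of the three outcomes, else 'unclassified'."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    names = {name.lower() for name in headers}
    # Body markers are checked first because a captcha or block page
    # may itself arrive with a 403 status and the x-datadome header.
    if captcha_marker and captcha_marker in body:
        return "captcha"
    if "Reference #" in body:
        return "block_page"
    if status == 403 and "x-datadome" in names:
        return "silent_403"
    return "unclassified"
```

Anything that matches none of the three is reported as `unclassified` rather than as allowed, since a 200 response alone does not prove the request was trusted.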

The four signal categories

1. IP address reputation

Where your request comes from carries the most weight: IP reputation accounts for roughly 25–30% of the score on its own — the heaviest single input.

  • Datacenter IPs (AWS, GCP, Azure, DigitalOcean, OVH…) — these belong to cloud and hosting providers, so they are pre-scored low. DataDome maintains one of the more accurate datacenter-range databases in the industry; many of these ranges are blanket-blocked on protected sites before any other check runs.
  • Residential IPs — assigned by ISPs to home connections, higher baseline trust.
  • Mobile IPs — cell tower and CGNAT pools (where many phones share one address), highest baseline trust.
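The three tiers above amount to a range lookup followed by a baseline trust ranking. The sketch below is a toy version, not DataDome's database: the CIDR ranges are the RFC 5737 documentation blocks used as placeholders, the names `EXAMPLE_RANGES`, `TRUST_RANK` and `classify_ip` are hypothetical, and treating every unmatched address as residential is an assumption made for brevity.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks.
# A real database would hold provider and carrier ranges instead.
EXAMPLE_RANGES = {
    "datacenter": [ipaddress.ip_network("192.0.2.0/24")],
    "mobile": [ipaddress.ip_network("198.51.100.0/24")],
}

# Ordinal ranks only: a higher rank means more baseline trust.
TRUST_RANK = {"datacenter": 0, "residential": 1, "mobile": 2}


def classify_ip(address, ranges=EXAMPLE_RANGES):
    """Return 'datacenter', 'mobile', or 'residential' (the fallback)."""
    ip = ipaddress.ip_address(address)
    for label, networks in ranges.items():
        if any(ip in network for network in networks):
            return label
    # Assumption: anything not in a known range is treated as residential.
    return "residential"
```

The ranks are ordinal on purpose. The text gives the ordering of the three tiers but no numeric values for them.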

2. The WASM boring_challenge and the datadome cookie

DataDome's signature component is the WASM boring_challenge — a small program (a state machine, written in Rust and compiled to WebAssembly) that runs in the browser. It produces a token that's POSTed to js.datadome.co, which then sets the datadome cookie — the pass that authorizes future requests.

Because the challenge is real WASM running against real browser APIs, it can't be solved without an actual browser to execute it. It also times CPU work using SIMD (instructions that crunch several numbers at once) in a way that exposes headless environments (browsers with no visible window), and stealth-browser JavaScript patches do not cover this timing signal. Alongside this, the sensor collects the usual fingerprint surface (canvas, WebGL, audio, fonts, screen metrics, timezone, navigator.webdriver, window.chrome) and feeds it into the WASM state.

3. HTTP and TLS fingerprinting

DataDome is one of the few WAFs that publicly markets HTTP/2 fingerprinting as a detection layer. The idea: the low-level details of how your client talks HTTP and TLS (the encryption behind https) form a fingerprint that often does not match a real browser.

  • Most scraping libraries still default to HTTP/1.1. Real Chrome and Firefox have negotiated HTTP/2 by default for years.
  • libcurl and Go's net/http produce JA3 signatures (a hash of fields in the TLS ClientHello) that don't match any real browser, even when they negotiate HTTP/2.
  • HTTP/2 fingerprinting tracks pseudo-header order, SETTINGS frame values, and window-update sizes: small ordering and sizing choices that differ between real browsers and libraries.
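To make the JA3 bullet concrete: a JA3 signature is the MD5 digest of five ClientHello fields written as decimal numbers, with values inside a field joined by `-` and fields joined by `,`, after GREASE placeholder values are dropped. The sketch below assumes the fields have already been parsed out of a captured handshake; the function names are illustrative, and how DataDome itself derives or weighs the fingerprint is not stated in the text.

```python
import hashlib

# GREASE values (RFC 8701) are excluded from JA3: 0x0a0a, 0x1a1a, ... 0xfafa.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the comma-separated JA3 string from ClientHello fields."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(*fields):
    """MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()
```

Because the digest covers the order of ciphers and extensions as the client sent them, two clients offering the same set in a different order produce different signatures. That is why a library's handshake can stand out even when its cipher list looks reasonable.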

4. Behavioural and pattern analysis

DataDome also runs continuous ML pattern analysis on your connection history, watching for things a normal user would not do:

  • The datadome cookie sent from a different IP than the one that minted it.
  • Reused sensor payloads across pages instead of fresh ones per navigation.
  • Honeypot link hits — clicks on links a human cannot see.
  • Bursty request timing.
  • Missing real-browser headers (Sec-Fetch-*, Accept-Language, sec-ch-ua).
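The last item in the list is the easiest to check mechanically. As a rough sketch, the function below reports which of the three header families named above are absent from a request. The name `missing_browser_headers` is hypothetical, and the exact set a real browser sends varies by browser and version, so the list is illustrative only.

```python
def missing_browser_headers(headers):
    """Return the expected header families absent from a request."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in headers}
    missing = []
    if not any(name.startswith("sec-fetch-") for name in names):
        missing.append("Sec-Fetch-*")
    if "accept-language" not in names:
        missing.append("Accept-Language")
    if "sec-ch-ua" not in names:
        missing.append("sec-ch-ua")
    return missing
```

An empty result only means the names are present. It says nothing about whether their values are consistent with the rest of the fingerprint.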

What this means for developers

Because each site gets its own model, there is no single "DataDome solution" — a setup that works on a news customer may fail on an e-commerce one with stricter scoring. Three patterns are common in production:

  1. Look in the initial HTML first. Many DataDome-protected Next.js sites embed the full page state in a __NEXT_DATA__ script tag. If the data is already in the first HTML response, the WASM challenge never runs — there is no follow-up request (XHR) for it to gate. For those cases, curl_cffi plus a residential proxy is enough.
  2. Mobile or ISP residential proxies for XHR endpoints — IP weighting is so heavy that simply switching from a datacenter IP to mobile-4G frequently flips a session from blocked to 200 OK with no other change.
  3. Real browser execution when the page actually runs the WASM challenge — for example Camoufox with its IP, timezone and locale all matching, or a managed scraping API.
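For the first pattern, checking whether the page state is already in the initial HTML is a small parsing job. The sketch below uses only the standard library and assumes the HTML has been fetched separately; `NextDataParser` and `extract_next_data` are illustrative names, and the sketch relies on the Next.js convention of a script tag whose id is `__NEXT_DATA__` and whose content is JSON.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed page state, or None if the tag is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    return json.loads(raw) if raw else None
```

If `extract_next_data` returns `None`, the data is most likely loaded by a later XHR, which is the case the second and third patterns address.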

DataDome is especially sensitive to IP/cookie mismatches — a datadome cookie minted on one IP looks suspicious when sent from another — so keeping one stable exit IP per session matters.
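Seen from the detector's side, the mismatch check can be pictured as a lookup from cookie value to the IP that first presented it. The class below is a toy model under that assumption, not DataDome's implementation; `CookieIpTracker` and `observe` are hypothetical names.

```python
class CookieIpTracker:
    """Remember which IP first presented each cookie value."""

    def __init__(self):
        self._minted_from = {}

    def observe(self, cookie, ip):
        """Return True if the cookie arrives from its original IP."""
        # setdefault records the IP on first sight and returns the
        # stored IP on every later call.
        original = self._minted_from.setdefault(cookie, ip)
        return original == ip
```

A session that rotates its exit IP mid-flight would return `False` here on the first request after the rotation, which is the situation a stable exit IP avoids.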

Sites commonly fronted by DataDome

E-commerce, classifieds, news and travel dominate the list of protected sites. Many of these rotate between DataDome, Cloudflare, Akamai and PerimeterX depending on conditions, so the same site may not always use DataDome.

Summary

DataDome scores each request in about 2 ms against a per-site ML model, weighing four things: IP reputation (25–30% of the score), the WASM boring_challenge and its datadome cookie, TLS and HTTP/2 fingerprints, and behavioural patterns. Because each customer gets its own model, detection behaviour varies between sites even when the underlying signals don't — which is the main reason a setup that works on one DataDome target may not carry over to another.

Frequently asked questions

Why does DataDome catch bots that pass Cloudflare?

It trains a separate ML model for each site, using device, network, and behavioural signals, so it adapts to each target instead of applying one generic ruleset. A generic anti-fingerprinting setup that passes Cloudflare can still look anomalous to it.

What triggers a DataDome block?

Datacenter IPs, fingerprint inconsistencies, and behavioural anomalies each count against a request; once the combined score crosses the site's threshold, DataDome responds with a silent 403, an inline slider captcha, or a block page.

Which sites use DataDome?

Major e-commerce, classifieds, news, and travel platforms, among roughly 1,200 enterprise sites.

Last updated: 2026-05-31