How DataDome Works (2026)

By the Scrappey Research Team

DataDome is a bot-blocking service that sits in front of roughly 1,200 enterprise sites — major e-commerce, classifieds, news, and travel sites. It has a reputation for catching automation that slips past Cloudflare without trouble, so it is worth understanding on its own. Its design is unusual in three ways: it trains a separate machine-learning (ML) model for each site, it scores requests at the application server instead of at the CDN edge (the network of servers that delivers a site close to its users), and it runs a WebAssembly (WASM — compiled code that runs at near-native speed in the browser) challenge inside the visitor's browser.

This is a reference on how DataDome is structured and what each detection layer measures.

Coverage	~1,200 enterprise sites
Model	Per-site machine-learning
Cookie	datadome
Known for	Catching bots that pass Cloudflare
Best approach	Residential IPs + real-browser fingerprint

What DataDome is

DataDome is a reverse-proxy WAF: a web application firewall that inspects traffic before it reaches the site. It runs at the application server, not at the CDN edge. Every request is forwarded to DataDome's scoring service, which returns an allow-or-block decision in roughly 2 ms. The scorer is built per customer, drawing on around 85,000 customer-specific ML models across the roughly 1,200 protected sites, so the very same TLS, browser and proxy combination can pass on one DataDome customer and fail on another.

When a request looks untrustworthy, the visitor gets one of:

A silent 403 (the HTTP code for "forbidden") with the x-datadome header set.
A GeeTest-style slider captcha served inline.
A block page with a Reference #.
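
The three outcomes above can be told apart in client code with a small heuristic. This is a minimal sketch, not DataDome's own logic: the x-datadome header name and the "Reference #" text come from the list above, while the function name, the labels it returns, and the "captcha" substring check are assumptions made for illustration.

```python
from typing import Mapping


def classify_datadome_response(status: int, headers: Mapping[str, str], body: str) -> str:
    """Return a rough label for a response from a DataDome-protected site.

    The labels and the body markers are illustrative assumptions.
    """
    if status != 403:
        return "allowed"

    # HTTP header names are case-insensitive, so normalise before the lookup.
    header_names = {name.lower() for name in headers}
    if "x-datadome" not in header_names:
        return "other_403"  # a 403 that does not carry the DataDome header

    text = body.lower()
    if "captcha" in text:        # assumed marker for the inline slider captcha
        return "captcha"
    if "reference #" in text:    # marker taken from the block-page description
        return "block_page"
    return "silent_403"
```

Checking the status first keeps the function cheap on the common path; the body is only inspected once a 403 with the DataDome header has been seen.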

The four signal categories

1. IP address reputation

Where a request comes from is the heaviest single input: IP reputation alone accounts for roughly 25–30% of the score.

Datacenter IPs (AWS, GCP, Azure, DigitalOcean, OVH…): these belong to cloud and hosting providers, so they start with a low trust score. DataDome maintains one of the more accurate datacenter-range databases in the industry; many of these ranges are blanket-blocked on protected sites before any other check runs.
Residential IPs — assigned by ISPs to home connections, higher baseline trust.
Mobile IPs — cell tower and CGNAT pools (where many phones share one address), highest baseline trust.
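
To make the ordering concrete, here is a minimal sketch of a range lookup that maps an address to one of the three categories above. Everything specific in it is an assumption: the CIDR blocks are reserved documentation ranges (RFC 5737), not real provider ranges, and the trust numbers only encode the relative order described above, not DataDome's actual weights.

```python
import ipaddress

# Illustrative only: RFC 5737 documentation ranges stand in for real
# datacenter, residential and mobile ranges.
EXAMPLE_RANGES = {
    "datacenter": [ipaddress.ip_network("192.0.2.0/24")],
    "residential": [ipaddress.ip_network("198.51.100.0/24")],
    "mobile": [ipaddress.ip_network("203.0.113.0/24")],
}

# Hypothetical baseline values; only their ordering reflects the text.
BASELINE_TRUST = {"datacenter": 0.1, "unknown": 0.4, "residential": 0.6, "mobile": 0.8}


def classify_ip(address: str) -> str:
    """Return the category whose example range contains the address."""
    ip = ipaddress.ip_address(address)
    for category, networks in EXAMPLE_RANGES.items():
        if any(ip in network for network in networks):
            return category
    return "unknown"


def baseline_trust(address: str) -> float:
    """Look up the hypothetical starting trust for an address."""
    return BASELINE_TRUST[classify_ip(address)]
```

A real range database would be far larger and updated continuously; the lookup shape is the only part this sketch tries to show.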

2. The WASM `boring_challenge` and the `datadome` cookie

DataDome's signature component is the WASM boring_challenge — a small program (a state machine, written in Rust and compiled to WebAssembly) that runs in the browser. It produces a token that's POSTed to js.datadome.co, which then sets the datadome cookie — the pass that authorizes future requests.

Because the challenge is real WASM running against real browser APIs, it can't be solved without an actual browser to execute it. It also times the CPU using SIMD (instructions that crunch several numbers at once) in a way that exposes headless environments — browsers with no visible window — which no stealth-browser JavaScript patch covers. Alongside this, the sensor collects the usual fingerprint surface (canvas, WebGL, audio, fonts, screen metrics, timezone, navigator.webdriver, window.chrome) and feeds it into the WASM state.

3. HTTP and TLS fingerprinting

DataDome is one of the few WAFs that publicly markets HTTP/2 fingerprinting as a detection layer. The idea: the low-level details of how your client talks HTTP and TLS (the encryption behind https) form a fingerprint that often does not match a real browser.

Most scraping libraries still default to HTTP/1.1. Real Chrome and Firefox haven't in years.
libcurl and Go's net/http produce JA3 signatures (a hash of fields in the TLS ClientHello) that don't match any real browser, even when they negotiate HTTP/2.
HTTP/2 fingerprinting tracks pseudo-header order, SETTINGS frame values, and window-update sizes: small ordering and sizing choices that differ between real browsers and libraries.
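
As an illustration of why a library's handshake stands out, the sketch below computes a JA3 value from ClientHello fields that have already been parsed. JA3 joins five fields (TLS version, cipher suites, extensions, elliptic curves, point formats) as decimal numbers, skips GREASE placeholder values, and takes the MD5 of the result. The function names are my own, and how DataDome compares or stores such hashes is not something the source describes.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """True for GREASE placeholders (0x0a0a, 0x1a1a, ... 0xfafa), which JA3 ignores."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # Fields are dash-separated decimal numbers, in the order the client sent them.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the comma-separated JA3 string from parsed ClientHello fields."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Because the field order is preserved, two clients that offer the same cipher suites in a different order produce different hashes, which is what lets a server separate a library from the browser it claims to be.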

4. Behavioural and pattern analysis

DataDome also runs continuous ML pattern analysis on your connection history, watching for things a normal user would not do:

The datadome cookie sent from a different IP than the one that minted it.
Reused sensor payloads across pages instead of fresh ones per navigation.
Honeypot link hits — clicks on links a human cannot see.
Bursty request timing.
Missing real-browser headers (Sec-Fetch-*, Accept-Language, sec-ch-ua).
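
Two of the items above are easy to check on the client side before a request goes out. The following is a minimal self-audit sketch, assuming a Chromium-style header set (Firefox does not send sec-ch-ua); the function names and warning strings are hypothetical, and the expected-header list expands Sec-Fetch-* into the three headers browsers commonly send.

```python
from typing import List, Mapping, Optional

# Lower-cased names of headers a Chromium-based browser normally sends.
EXPECTED_HEADERS = (
    "accept-language",
    "sec-ch-ua",
    "sec-fetch-site",
    "sec-fetch-mode",
    "sec-fetch-dest",
)


def missing_browser_headers(headers: Mapping[str, str]) -> List[str]:
    """Return the expected headers that are absent (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name not in present]


def audit_request(headers: Mapping[str, str],
                  cookie_minted_ip: Optional[str],
                  exit_ip: str) -> List[str]:
    """Collect warnings for a request that is about to be sent."""
    warnings = [f"missing header: {name}" for name in missing_browser_headers(headers)]
    # A datadome cookie replayed from a different IP is one of the listed signals.
    if cookie_minted_ip is not None and cookie_minted_ip != exit_ip:
        warnings.append("datadome cookie minted on a different IP")
    return warnings
```

An empty list from audit_request only means these two checks passed; it says nothing about timing, sensor payload reuse, or honeypot links.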

What this means for developers

Because each site gets its own model, there is no single "DataDome solution" — a setup that works on a news customer may fail on an e-commerce one with stricter scoring. Three patterns are common in production:

Look in the initial HTML first. Many DataDome-protected Next.js sites embed the full page state in a __NEXT_DATA__ script tag. If the data is already in the first HTML response, you never need to run the WASM challenge: there is no follow-up request (XHR) for it to gate. In those cases, curl_cffi plus a residential proxy is often enough.
Mobile or ISP residential proxies for XHR endpoints — IP weighting is so heavy that simply switching from a datacenter IP to mobile-4G frequently flips a session from blocked to 200 OK with no other change.
Real browser execution when the page actually runs the WASM challenge — for example Camoufox with its IP, timezone and locale all matching, or a managed scraping API.
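
The first pattern can be sketched with the standard library alone. The example below pulls the JSON out of a __NEXT_DATA__ script tag in HTML that has already been fetched; the function and class names are my own, and the sample markup in the test is invented for illustration.

```python
import json
from html.parser import HTMLParser
from typing import Any, List, Optional


class _NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies can arrive in several pieces, so accumulate them.
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html: str) -> Optional[Any]:
    """Return the parsed __NEXT_DATA__ payload, or None if the tag is absent."""
    parser = _NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None
```

The HTML itself would come from whatever client fetched the page, for example curl_cffi with a browser impersonation profile and a residential proxy as described above. If extract_next_data returns None, the data is probably loaded by a later XHR and one of the other two patterns applies.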

DataDome is especially sensitive to IP/cookie mismatches — a datadome cookie minted on one IP looks suspicious when sent from another — so keeping one stable exit IP per session matters.
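
One way to keep a stable exit IP per session is to pin each session to a single proxy for its whole lifetime. This is a minimal sketch under stated assumptions: the class name is hypothetical, the proxy URLs in the test are placeholders, and it assumes each proxy URL maps to one stable exit IP, which depends on the proxy provider.

```python
import hashlib
from typing import Dict, Sequence


class StickyProxyPool:
    """Pins each session ID to one proxy so its datadome cookie never changes IP."""

    def __init__(self, proxies: Sequence[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._assigned: Dict[str, str] = {}

    def proxy_for(self, session_id: str) -> str:
        """Return the proxy pinned to this session, assigning one on first use."""
        if session_id not in self._assigned:
            # Hash the session ID so the initial choice is deterministic.
            digest = hashlib.sha256(session_id.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % len(self._proxies)
            self._assigned[session_id] = self._proxies[index]
        return self._assigned[session_id]

    def retire(self, session_id: str) -> None:
        """Forget a session; its cookies should be discarded along with it."""
        self._assigned.pop(session_id, None)
```

When a proxy has to be replaced, retiring the session and starting a fresh one (new cookie jar, new challenge) avoids sending an old datadome cookie from a new address.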

Sites commonly fronted by DataDome

E-commerce, classifieds, news and travel dominate the list of protected sites. Many of these rotate between DataDome, Cloudflare, Akamai and PerimeterX depending on conditions, so the same site may not always use DataDome.

Summary

DataDome scores each request in about 2 ms against a per-site ML model, weighing four things: IP reputation (25–30% of the score), the WASM boring_challenge and its datadome cookie, TLS and HTTP/2 fingerprints, and behavioural patterns. Because each customer gets its own model, detection behaviour varies between sites even when the underlying signals don't — which is the main reason a setup that works on one DataDome target may not carry over to another.

Frequently asked questions

Why does DataDome catch bots that pass Cloudflare?

It trains a separate ML model for each site, using device, network, and behavioural signals, so it adapts to each target instead of applying one generic ruleset. A generic anti-fingerprinting setup that passes Cloudflare can still look anomalous to it.

What triggers a DataDome block?

Datacenter IPs, fingerprint inconsistencies, and behavioural anomalies each push the risk score higher; once it crosses the site's threshold, DataDome responds with a silent 403, a slider captcha, or a block page.

Which sites use DataDome?

Major e-commerce, classifieds, news, and travel platforms, among roughly 1,200 enterprise sites.

Last updated: 2026-05-31