What Cloudflare Bot Management is
Cloudflare works as a reverse proxy at the CDN edge — meaning it sits between the visitor and the real server, so every request passes through Cloudflare first. Each request to a protected site is scored by a single global machine-learning model trained on roughly 20% of all internet traffic. In a few milliseconds the model returns a bot score from 1–99 (1 = almost certainly a bot, 99 = almost certainly a human), and the site's WAF rules decide what to do with it — let it through, show a JavaScript challenge, show a managed challenge, or block it outright.
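The score-to-action step can be pictured as a small lookup. The sketch below is illustrative only: the function name and the three thresholds are assumptions, since each site sets its own cut-offs in its WAF rules.

```python
def action_for_score(bot_score: int,
                     block_below: int = 10,
                     managed_below: int = 30,
                     js_below: int = 50) -> str:
    """Map a 1-99 bot score to one of the four outcomes described above.

    The thresholds are hypothetical defaults, not Cloudflare's values.
    """
    if not 1 <= bot_score <= 99:
        raise ValueError("bot score must be between 1 and 99")
    if bot_score < block_below:
        return "block"
    if bot_score < managed_below:
        return "managed_challenge"
    if bot_score < js_below:
        return "js_challenge"
    return "allow"
```

The lower the score, the harsher the response; a site that cares more about false positives would simply move the thresholds down.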
When a request fails, you typically see one of these:
- error 1020: you tripped an access rule.
- error 1015: you're being rate limited (too many requests too fast).
- A managed challenge page (Turnstile).
- A silent 403 carrying a cf-ray header (Cloudflare's request ID).
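When debugging a client, it helps to label which of these failures you hit. The following is a rough sketch, not an official detection method: the function name is made up, and the body patterns and the cf-mitigated header check are assumptions about what such responses typically contain.

```python
import re

def classify_cloudflare_failure(status: int, headers: dict, body: str) -> str:
    """Best-effort label for a failed response; heuristics only."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    if "cf-ray" not in h:
        return "not_cloudflare"
    # Assumed body markers such as "error code: 1020" or "Error 1015".
    code = re.search(r"error(?: code)?:?\s*(10\d\d)", body, re.IGNORECASE)
    if code and code.group(1) == "1020":
        return "access_rule_1020"
    if code and code.group(1) == "1015":
        return "rate_limited_1015"
    # Assumption: challenge pages carry a "cf-mitigated: challenge" header.
    if h.get("cf-mitigated", "").lower() == "challenge":
        return "managed_challenge"
    if status == 403:
        return "silent_403"
    return "other"
```

Logging the cf-ray value next to the label makes it easier to correlate a failure with a specific request later.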
The scorer doesn't care why you're automating. A price-comparison crawler and a credential-stuffing bot look identical to it; it only sees signals, not intentions.
The four signal categories
1. IP address reputation
Cloudflare keeps a reputation database keyed by ASN (autonomous system number, which identifies the network operator that announces an IP range), built from traffic it has already seen across its whole network. Where your IP comes from sets your starting score:
- Datacenter IPs (AWS, GCP, Azure, DigitalOcean, OVH…): pre-scored low. A request from a known cloud range starts with a poor score before any other check even runs.
- Residential IPs: the kind ISPs hand out to home internet connections, treated as much more trustworthy.
- Mobile IPs: drawn from carrier CGNAT pools (shared mobile-network addresses). These get the highest baseline trust, because many real subscribers share each address, so penalising one address would hit many legitimate users, and because the addresses rotate naturally as phones move around.
On the very first request of a session — before any JavaScript or fingerprint data exists — IP reputation is the single biggest input to the score.
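To make the idea of a starting score concrete, here is a minimal sketch of a range-based lookup. Everything in it is hypothetical: the CIDR ranges are documentation-only blocks, the numeric baselines are invented, and a real reputation system would be keyed by ASN and far larger.

```python
import ipaddress

# Hypothetical ranges taken from the documentation-only blocks (RFC 5737).
RANGES = {
    "datacenter": [ipaddress.ip_network("192.0.2.0/24")],
    "residential": [ipaddress.ip_network("198.51.100.0/24")],
    "mobile": [ipaddress.ip_network("203.0.113.0/24")],
}

# Invented starting scores on the 1-99 scale; only their ordering follows the text.
BASELINE = {"datacenter": 10, "residential": 60, "mobile": 75, "unknown": 40}

def ip_category(ip: str) -> str:
    """Return the first category whose ranges contain the address."""
    addr = ipaddress.ip_address(ip)
    for category, networks in RANGES.items():
        if any(addr in net for net in networks):
            return category
    return "unknown"

def baseline_score(ip: str) -> int:
    """Starting score assigned before any other signal is available."""
    return BASELINE[ip_category(ip)]
```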
2. JavaScript fingerprinting and challenges
Plain HTTP clients (requests, axios, curl) just fetch pages; they don't run JavaScript. Cloudflare exploits this with a JS challenge — a script that must compute a token from values scattered around the page. No JavaScript engine, no token, no entry.
Headless browsers (real browsers driven by code, with no visible window) do run JavaScript, but their environment differs from a normal Chrome in dozens of small ways: the navigator.webdriver flag, missing plugins, the shape of window.chrome, canvas and WebGL outputs (how the browser draws graphics), font enumeration, timezone and locale mismatches, even the order in which the permissions APIs respond. Cloudflare hashes all of that into a fingerprint and compares it against known-automation patterns. Cloudflare Turnstile is the part of this pipeline the user actually sees.
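The hashing step can be sketched in a few lines. This is an assumption-laden illustration rather than Cloudflare's actual pipeline: the attribute names (webdriver, plugins, user_agent, timezone, ip_timezone) and the choice of SHA-256 are invented for the example.

```python
import hashlib
import json

def fingerprint_hash(attrs: dict) -> str:
    """Hash collected browser attributes into one stable identifier."""
    # Sorting keys makes the hash independent of collection order.
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def automation_hints(attrs: dict) -> list:
    """List the tell-tale signs from the text that appear in attrs."""
    hints = []
    if attrs.get("webdriver"):
        hints.append("navigator.webdriver is true")
    if not attrs.get("plugins"):
        hints.append("no plugins reported")
    if "HeadlessChrome" in attrs.get("user_agent", ""):
        hints.append("user agent names HeadlessChrome")
    # Hypothetical check: browser timezone disagrees with the IP's timezone.
    if attrs.get("timezone") != attrs.get("ip_timezone"):
        hints.append("timezone mismatch")
    return hints
```

A hash like this is only useful for matching known patterns exactly; the individual hints show why a single inconsistent attribute is enough to stand out.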
3. HTTP and TLS fingerprinting
Before a single line of HTML is exchanged, Cloudflare can already fingerprint you from the TLS handshake and from how your client speaks HTTP/2. (TLS is the encryption layer behind https; the handshake is the setup conversation that starts every connection, and JA3 and JA4 are hashes that summarise what the client offered in it.)
- Most scraping libraries still default to HTTP/1.1. Real Chrome and Firefox stopped doing that years ago.
- libcurl and Go's net/http produce JA3 signatures that don't match any real browser, even when they do negotiate HTTP/2.
- HTTP/2 fingerprinting digs deeper still: the order of pseudo-headers, the SETTINGS frame values, and window-update sizes all leak which client you really are.
So a User-Agent: Chrome header on a Python requests call is contradicted by the TLS handshake long before anyone reads the headers — the disguise is blown at the door.
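For a sense of why the handshake is so revealing, here is a minimal sketch of a JA3-style hash as the method is publicly described: five ClientHello fields are joined into a string and hashed with MD5. The function names are my own and the values in the usage note are illustrative, not any real browser's; JA4 uses a different scheme and is not covered.

```python
import hashlib

def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Join the five ClientHello fields: commas between fields, dashes within."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return ",".join(fields)

def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

Calling ja3_string(771, [4865, 4866], [0, 23], [29, 23], [0]) yields "771,4865-4866,0-23,29-23,0". Because the cipher and extension order is part of the input, two clients offering the same features in a different order hash differently, and no HTTP header can change that.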
4. Behavioural and pattern analysis
Cloudflare logs every connection, so your behaviour over time is just as visible as any single request:
- Missing headers a real browser always sends (Sec-Fetch-*, Accept-Language, sec-ch-ua).
- Payloads sent in the wrong order or encoding.
- Cookies from the previous response that the next request fails to echo back.
- Hits on URLs no human ever visits — honeypot links hidden in the page's DOM specifically to catch crawlers that blindly follow every link.
- Bursty timing: 200 requests in 5 seconds, then silence.
All of this feeds Cloudflare's ML pattern analysis, which can flag a whole session even when each individual request looks fine on its own.
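Two of the checks in the list above are simple enough to sketch directly. The code below is a hypothetical illustration, not Cloudflare's logic: the header list is a small sample, and the burst limits reuse the "200 requests in 5 seconds" figure from the text as defaults.

```python
from collections import deque

# A sample of headers the text says real browsers send.
EXPECTED_HEADERS = ("accept-language", "sec-fetch-site", "sec-fetch-mode", "sec-ch-ua")

def missing_browser_headers(headers: dict) -> list:
    """Return the expected header names that the request did not send."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name not in present]

def has_burst(timestamps, max_requests: int = 200, window_seconds: float = 5.0) -> bool:
    """True if at least max_requests fall within any window_seconds span."""
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Drop timestamps that are too old to share a window with t.
        while t - window[0] > window_seconds:
            window.popleft()
        if len(window) >= max_requests:
            return True
    return False
```

Neither check needs to see the session as a whole, which is why they are cheap to run on every request; the ML layer is what ties such per-request signals together over time.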
What this means for developers
The key takeaway from the four-signal model is that fixing one layer rarely moves the score. A residential proxy sitting on top of a fingerprint that screams HeadlessChrome will still fail; so will a fully patched browser running on a flagged AWS IP. The tooling generally falls into three buckets:
- HTTP clients with browser-impersonating TLS: curl_cffi, curl-impersonate, tls-client. These match the TLS/HTTP/2 layer but can't run JS challenges.
- Patched browsers: Playwright with fingerprint-consistency plugins, patchright, Camoufox. These cover JS execution and the fingerprint surface but cost a lot per request.
- Managed scraping APIs: services that combine the two and handle proxy rotation and session continuity behind a single endpoint.
Whichever bucket you pick, reusing the same session across requests keeps your cookies and trust score warm. Spinning up a fresh session for every request looks far more scripted than one that browses steadily for a few minutes.
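Cookie continuity is the easiest part of this to show. The sketch below uses only the Python standard library and a made-up helper name; it illustrates keeping one cookie jar alive across requests and does nothing about the TLS or fingerprint layers discussed earlier.

```python
import http.cookiejar
import urllib.request

def make_session(user_agent: str):
    """Build one opener whose cookie jar persists across every request."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # The same User-Agent is sent on each request made through this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar

# Usage sketch (not executed here; the URLs are placeholders):
#   opener, jar = make_session("ExampleAgent/1.0")
#   opener.open("https://example.com/")       # cookies set here...
#   opener.open("https://example.com/page2")  # ...are echoed back here
```

The point is the shape, not the library: create the session once and pass it around, rather than constructing a new client inside the request loop.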
Sites commonly fronted by Cloudflare
Cloudflare is the most widely deployed WAF on the web. Frequently studied targets span major retail, jobs, review, listings, and logistics sites. Many large sites rotate between Cloudflare, Akamai, DataDome and PerimeterX depending on traffic, so the detection logic you hit is rarely the same from day to day.
Summary
Cloudflare never makes a flat yes/no call. It blends four things into one continuous bot score: IP reputation, JavaScript execution and fingerprint, TLS/HTTP/2 handshake characteristics, and behavioural patterns over time. Any one of the four can pull the score below the WAF's threshold and get you blocked. And because the detection model keeps evolving, anything that relies on beating a single signal will eventually break.
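One way to build intuition for "any one of the four can pull the score down" is a blend that is dominated by its weakest input. The harmonic mean used below is purely a stand-in chosen for that property; Cloudflare's real scorer is a machine-learning model, and the function names, sub-scores and threshold here are all hypothetical.

```python
from statistics import harmonic_mean

def combined_score(ip: float, js: float, tls: float, behaviour: float) -> float:
    """Blend four 1-99 sub-scores; one low value drags the result down."""
    return harmonic_mean([ip, js, tls, behaviour])

def passes(ip: float, js: float, tls: float, behaviour: float,
           threshold: float = 30.0) -> bool:
    """Compare the blended score with a hypothetical WAF threshold."""
    return combined_score(ip, js, tls, behaviour) >= threshold
```

With three sub-scores at 90 and one at 5, the blend lands far below 30 no matter which of the four signals is the weak one, which mirrors the residential-proxy-plus-HeadlessChrome failure described earlier.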
