How Cloudflare Works (2026)

By the Scrappey Research Team

Cloudflare's Bot Management is a security layer that decides whether each visitor to a website is a human or an automated script. It sits in front of roughly 20% of the public web — including major retail, jobs, review, and listings sites — so it's the WAF (Web Application Firewall — a filter that screens incoming traffic) that most developers run into first when they start working with HTTP clients and headless browsers.

This is a reference on what Cloudflare actually measures, how its scoring pipeline is structured, and what each detection layer means for someone building automation. It is not a how-to.

Coverage	~20% of the public web
Key signals	TLS/JA3, HTTP/2 fingerprint, ML bot score
Clearance	cf_clearance cookie
Challenge	Turnstile / managed challenge
Best approach	Real-browser execution + residential IPs

What Cloudflare Bot Management is

Cloudflare works as a reverse proxy at the CDN edge: it sits between the visitor and the real server, so every request passes through Cloudflare first. Each request to a protected site is scored by a single global machine-learning model trained on traffic from across Cloudflare's network, which fronts roughly 20% of the public web. In a few milliseconds the model returns a bot score from 1 to 99 (1 = almost certainly a bot, 99 = almost certainly a human), and the site's WAF rules decide what to do with it: let it through, show a JavaScript challenge, show a managed challenge, or block it outright.
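
As a rough illustration of how a site's rules might turn the score into one of those four outcomes, the sketch below maps a score to an action. The function name and the threshold values (30, 10, 2) are assumptions chosen for the example, not Cloudflare defaults; each site owner configures their own rules.

```python
def action_for_score(score: int) -> str:
    """Map a bot score (1 = almost certainly a bot, 99 = almost certainly human)
    to an action. The thresholds are hypothetical; real sites set their own."""
    if not 1 <= score <= 99:
        raise ValueError("bot score must be between 1 and 99")
    if score >= 30:
        return "allow"
    if score >= 10:
        return "js_challenge"
    if score >= 2:
        return "managed_challenge"
    return "block"


for sample in (85, 20, 5, 1):
    print(sample, action_for_score(sample))
```

The point of the sketch is that the model only produces a number; what happens next is a policy decision made per site, which is why the same client can pass on one Cloudflare-protected site and be blocked on another.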

When a request fails, you typically see one of these:

error 1020 — you tripped an access rule.
error 1015 — you're being rate limited (too many requests too fast).
A managed challenge page (Turnstile).
A silent 403 carrying a cf-ray header (Cloudflare's request ID).
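
When reading logs, it can help to sort failed responses into these four cases. The sketch below is one way to do that, under stated assumptions: the label names are invented for the example, the error number is assumed to appear in the body as text like "error code: 1020", and the cf-mitigated: challenge response header (which Cloudflare's documentation describes for challenge pages) is assumed to be present on challenges. Verify each against the responses you actually see.

```python
import re

# Assumed body format; some error pages wrap the number in HTML and will not match.
ERROR_CODE = re.compile(r"error(?: code)?:?\s*(\d{4})", re.IGNORECASE)


def classify_response(status: int, headers: dict[str, str], body: str) -> str:
    """Return a hypothetical label for a response from a Cloudflare-fronted site."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "cf-ray" not in lowered:
        return "not_cloudflare"
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return "managed_challenge"
    if status < 400:
        return "passed"
    match = ERROR_CODE.search(body)
    code = match.group(1) if match else None
    if code == "1020":
        return "access_rule_1020"
    if code == "1015":
        return "rate_limited_1015"
    if status == 403:
        return "silent_403"
    return "other_error"


print(classify_response(403, {"CF-RAY": "abc123"}, "error code: 1020"))
print(classify_response(403, {"CF-RAY": "abc123"}, ""))
```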

The scorer doesn't care why you're automating. A price-comparison crawler and a credential-stuffing bot look identical to it; it only sees signals, not intentions.

The four signal categories

1. IP address reputation

Cloudflare keeps a reputation database keyed by ASN (Autonomous System Number, the identifier of the network operator that announces an IP range), built from traffic it has already seen across its whole network. Where your IP comes from sets your starting score:

Datacenter IPs (AWS, GCP, Azure, DigitalOcean, OVH…) — pre-scored low. A request from a known cloud range starts with a poor score before any other check even runs.
Residential IPs — the kind ISPs hand out to home internet connections, treated as much more trustworthy.
Mobile IPs: assigned by mobile carriers from CGNAT pools (addresses shared by many subscribers at once). These get the highest baseline trust, because the pools are small, each address fronts many real users, and assignments rotate naturally as phones move around.

On the very first request of a session — before any JavaScript or fingerprint data exists — IP reputation is the single biggest input to the score.
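
A minimal sketch of the idea, assuming a lookup table from network range to category. The ranges are reserved documentation networks (RFC 5737) standing in for real provider ranges, and the baseline numbers are invented for illustration; Cloudflare's actual data and weights are not public.

```python
import ipaddress

# Hypothetical ranges: RFC 5737 documentation networks, not real provider ranges.
RANGES = {
    "datacenter": [ipaddress.ip_network("192.0.2.0/24")],
    "residential": [ipaddress.ip_network("198.51.100.0/24")],
    "mobile": [ipaddress.ip_network("203.0.113.0/24")],
}

# Hypothetical starting scores on the 1-99 scale (higher = more human-like).
BASELINE = {"datacenter": 10, "residential": 60, "mobile": 75, "unknown": 40}


def classify_ip(ip: str) -> str:
    """Return the category whose range contains the address, or 'unknown'."""
    address = ipaddress.ip_address(ip)
    for category, networks in RANGES.items():
        if any(address in network for network in networks):
            return category
    return "unknown"


def starting_score(ip: str) -> int:
    """Starting score before any fingerprint or behaviour data exists."""
    return BASELINE[classify_ip(ip)]


for ip in ("192.0.2.10", "198.51.100.7", "203.0.113.99", "10.0.0.1"):
    print(ip, classify_ip(ip), starting_score(ip))
```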

2. JavaScript fingerprinting and challenges

Plain HTTP clients (requests, axios, curl) just fetch pages; they don't run JavaScript. Cloudflare exploits this with a JS challenge — a script that must compute a token from values scattered around the page. No JavaScript engine, no token, no entry.

Headless browsers (real browsers driven by code, with no visible window) do run JavaScript, but their environment differs from a normal Chrome in dozens of small ways: the navigator.webdriver flag, missing plugins, the shape of window.chrome, canvas and WebGL outputs (how the browser draws graphics), font enumeration, timezone and locale mismatches, even the order in which the permissions APIs respond. Cloudflare hashes all of that into a fingerprint and compares it against known-automation patterns. Cloudflare Turnstile is the part of this pipeline the user actually sees.
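
To make "hashes all of that into a fingerprint" concrete, here is a small sketch of the general technique: serialise the collected attributes in a stable order, hash them, and separately check for values that are typical of automation. The attribute names, the choice of SHA-256, and the specific checks are all assumptions made for the example; Cloudflare's real collection script and rules are not public.

```python
import hashlib
import json


def fingerprint(attrs: dict) -> str:
    """Hash browser attributes into a stable identifier (key order does not matter)."""
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def automation_hints(attrs: dict) -> list[str]:
    """Return human-readable reasons this environment looks automated (hypothetical rules)."""
    hints = []
    if attrs.get("webdriver") is True:
        hints.append("navigator.webdriver is true")
    if not attrs.get("plugins"):
        hints.append("no plugins reported")
    if "Chrome" in attrs.get("user_agent", "") and not attrs.get("has_window_chrome"):
        hints.append("Chrome user agent without window.chrome")
    timezone, ip_timezone = attrs.get("timezone"), attrs.get("ip_timezone")
    if timezone and ip_timezone and timezone != ip_timezone:
        hints.append("timezone does not match IP location")
    return hints


# Sample values only; a real collector would gather many more attributes.
headless = {
    "user_agent": "HeadlessChrome (sample)",
    "webdriver": True,
    "plugins": [],
    "has_window_chrome": False,
    "timezone": "UTC",
    "ip_timezone": "Europe/Amsterdam",
}
print(fingerprint(headless))
print(automation_hints(headless))
```

Because the hash covers every attribute, patching a single value (for example navigator.webdriver) changes the fingerprint but leaves the other hints in place.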

3. HTTP and TLS fingerprinting

Before a single line of HTML is exchanged, Cloudflare can already fingerprint you from the TLS handshake and from how your client speaks HTTP/2. TLS is the encryption layer behind https, and the handshake is the setup conversation that starts every connection; JA3 and JA4 are fingerprinting methods that condense the client's opening handshake message (the ClientHello) into a short signature.

Most scraping libraries still default to HTTP/1.1. Real Chrome and Firefox stopped doing that years ago.
libcurl and Go's net/http produce JA3 signatures that don't match any real browser, even when they do negotiate HTTP/2.
HTTP/2 fingerprinting digs deeper still: the order of pseudo-headers, the SETTINGS frame values, and window-update sizes all leak which client you really are.

So a User-Agent: Chrome header on a Python requests call is contradicted by the TLS handshake long before anyone reads the headers — the disguise is blown at the door.
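
The JA3 method itself is public and simple enough to sketch. It joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) as decimal values, with dashes inside a field and commas between fields, skips GREASE placeholder values, and takes the MD5 of the result. The function names and the sample values below are assumptions for illustration; parsing a real ClientHello off the wire is out of scope here.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE placeholders (RFC 8701) look like 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: list[int], extensions: list[int],
               curves: list[int], point_formats: list[int]) -> str:
    """Build the JA3 string, preserving the order the client sent each list in."""
    fields = (ciphers, extensions, curves, point_formats)
    parts = [str(version)]
    for field in fields:
        parts.append("-".join(str(v) for v in field if not is_grease(v)))
    return ",".join(parts)


def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    """MD5 of the JA3 string; used as an identifier, not for security."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii"), usedforsecurity=False).hexdigest()


# Sample values only, not a capture from any real client.
print(ja3_string(771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0]))
print(ja3_hash(771, [4865, 4866], [0, 23], [29, 23], [0]))
print(ja3_hash(771, [4866, 4865], [0, 23], [29, 23], [0]))  # same ciphers, different order
```

Because order is preserved, two clients that support the same cipher suites but list them differently produce different hashes. That is why a TLS stack's defaults give it away even when the User-Agent header claims to be a browser.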

4. Behavioural and pattern analysis

Cloudflare logs every connection, so your behaviour over time is just as visible as any single request:

Missing headers a real browser sends (Sec-Fetch-*, Accept-Language, and on Chromium-based browsers sec-ch-ua).
Payloads sent in the wrong order or encoding.
Cookies from the previous response that the next request fails to echo back.
Hits on URLs no human ever visits — honeypot links hidden in the page's DOM specifically to catch crawlers that blindly follow every link.
Bursty timing: 200 requests in 5 seconds, then silence.

All of this feeds Cloudflare's ML pattern analysis, which can flag a whole session even when each individual request looks fine on its own.
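
Two of the checks in the list above are easy to sketch: spotting missing browser headers, and spotting bursty timing with a sliding window. The header list, the 5-second window, and the limit of 50 requests are assumptions picked for the example, not values Cloudflare publishes.

```python
from collections import deque

# Assumed set of headers; sec-ch-ua is only sent by Chromium-based browsers.
EXPECTED_HEADERS = (
    "accept-language",
    "sec-fetch-site",
    "sec-fetch-mode",
    "sec-fetch-dest",
    "sec-ch-ua",
)


def missing_browser_headers(headers: dict[str, str]) -> list[str]:
    """Return the expected header names that the request did not send."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name not in present]


def peak_requests_in_window(timestamps: list[float], window_seconds: float) -> int:
    """Largest number of requests seen inside any sliding window of the given length."""
    window: deque[float] = deque()
    peak = 0
    for t in sorted(timestamps):
        window.append(t)
        while t - window[0] > window_seconds:
            window.popleft()
        peak = max(peak, len(window))
    return peak


def is_bursty(timestamps: list[float], window_seconds: float = 5.0, limit: int = 50) -> bool:
    """Hypothetical rule: flag a session that exceeds `limit` requests in one window."""
    return peak_requests_in_window(timestamps, window_seconds) > limit


burst = [i * 0.025 for i in range(200)]   # 200 requests in about 5 seconds
steady = [i * 3.0 for i in range(20)]     # one request every 3 seconds
print(is_bursty(burst), is_bursty(steady))
print(missing_browser_headers({"User-Agent": "python-requests", "Accept": "*/*"}))
```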

What this means for developers

The key takeaway from the four-signal model is that fixing one layer rarely moves the score enough to pass. A residential proxy sitting on top of a fingerprint that screams HeadlessChrome will still fail; so will a fully patched browser running on a flagged AWS IP. The tooling generally falls into three buckets:

HTTP clients with browser-impersonating TLS — curl_cffi, curl-impersonate, tls-client. These match the TLS/HTTP/2 layer but can't run JS challenges.
Patched browsers — Playwright with fingerprint-consistency plugins, patchright, Camoufox. These cover JS execution and the fingerprint surface but cost a lot per request.
Managed scraping APIs — services that combine the two and handle proxy rotation and session continuity behind a single endpoint.

Reusing the same session value across requests keeps your cookies and trust score warm. Spinning up a fresh session for every request looks far more scripted than one that browses steadily for a few minutes.

Sites commonly fronted by Cloudflare

Cloudflare is the most widely deployed WAF on the web. Frequently studied targets span major retail, jobs, review, listings, and logistics sites. Some large sites layer or switch between Cloudflare, Akamai, DataDome and PerimeterX, so the detection logic you hit on a given site can change over time.

Summary

Cloudflare never makes a flat yes/no call. It blends four things into one continuous bot score: IP reputation, JavaScript execution and fingerprint, TLS/HTTP/2 handshake characteristics, and behavioural patterns over time. Any one of the four can pull the score below the WAF's threshold and get you blocked. And because the detection model keeps evolving, anything that relies on beating a single signal will eventually break.
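
A toy model can make the "any one of the four" point concrete. The sketch below is not Cloudflare's model, which is a trained ML system with unpublished internals; it simply assumes each signal has its own 1-99 sub-score and that the combined score is dominated by the weakest one. The 0.7/0.3 weights and the threshold of 30 are invented for the example.

```python
def combined_score(ip: int, js: int, tls: int, behaviour: int) -> int:
    """Toy combination: 70% weakest signal, 30% average of all four (assumed weights)."""
    signals = (ip, js, tls, behaviour)
    for value in signals:
        if not 1 <= value <= 99:
            raise ValueError("each signal must be between 1 and 99")
    weakest = min(signals)
    average = sum(signals) / len(signals)
    return max(1, min(99, round(0.7 * weakest + 0.3 * average)))


THRESHOLD = 30  # hypothetical WAF threshold

clean = combined_score(90, 90, 90, 90)
one_bad = combined_score(90, 90, 90, 5)        # only behaviour is poor
two_bad = combined_score(5, 5, 90, 90)         # datacenter IP and headless fingerprint
fixed_ip_only = combined_score(90, 5, 90, 90)  # proxy fixed, fingerprint still poor

for name, score in [("clean", clean), ("one_bad", one_bad),
                    ("two_bad", two_bad), ("fixed_ip_only", fixed_ip_only)]:
    print(name, score, "pass" if score >= THRESHOLD else "fail")
```

Under these assumptions a single poor signal is enough to fall below the threshold, and repairing the IP while the fingerprint stays poor still fails.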

Frequently asked questions

Why does Cloudflare block my HTTP client but not my browser?

Your client's TLS (JA3/JA4) and HTTP/2 fingerprints don't match those of a real browser. Cloudflare reads them during connection setup, before any HTML is sent, so a request whose handshake doesn't look like a genuine browser's starts with a poor score regardless of what its headers claim.

What is the cf_clearance cookie?

It's the token Cloudflare hands you once you pass a challenge — proof that you cleared the check. Reusing it from the same IP and fingerprint keeps you cleared; sending it from a different IP is a red flag.
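
The binding described above can be pictured as a record that stores the context in which the challenge was passed. This is a conceptual sketch only: the field names, the exact-match rule, and the expiry handling are assumptions, and how Cloudflare actually validates the cookie is not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clearance:
    """Hypothetical record of the context in which a challenge was passed."""
    token: str
    ip: str
    user_agent: str
    expires_at: float  # Unix timestamp


def reuse_looks_consistent(clearance: Clearance, ip: str, user_agent: str, now: float) -> bool:
    """True when the cookie is unexpired and presented from the same IP and client."""
    return (
        now < clearance.expires_at
        and ip == clearance.ip
        and user_agent == clearance.user_agent
    )


issued = Clearance(token="sample-token", ip="198.51.100.7",
                   user_agent="ExampleBrowser/1.0", expires_at=2000.0)

print(reuse_looks_consistent(issued, "198.51.100.7", "ExampleBrowser/1.0", now=1000.0))
print(reuse_looks_consistent(issued, "203.0.113.99", "ExampleBrowser/1.0", now=1000.0))
```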

Is a residential proxy alone enough to pass Cloudflare?

No. IP reputation is only one of the four signals. You also need a browser-matching TLS and HTTP/2 fingerprint, working JavaScript execution with a consistent browser fingerprint, and believable behaviour over time, or Cloudflare's bot score will still flag the session.

Last updated: 2026-05-31