How Cloudflare detects bots and scrapers (2026)

Cloudflare sits in front of roughly 20% of the public web, including StockX, Indeed, G2, Glassdoor, Instacart, Kickstarter and Zoopla, and its Bot Management is the protection layer most developers encounter first when they start working with HTTP clients and headless browsers.

This is a reference on what Cloudflare actually measures, how its scoring pipeline is structured, and what each detection layer means for someone building automation. It is not a how-to.

Quick facts

Coverage: ~20% of the public web
Key signals: TLS/JA3, HTTP/2 fingerprint, ML bot score
Clearance: cf_clearance cookie
Challenge: Turnstile / managed challenge
Best approach: Real-browser execution + residential IPs

What Cloudflare Bot Management is

Cloudflare runs as a reverse proxy at the CDN edge. Every request to a protected origin is intercepted and scored against a single global ML model trained on roughly 20% of all internet traffic. The scorer returns a bot score from 1–99 (1 = bot, 99 = human) in a few milliseconds, and the site's WAF rules decide what to do with it — pass, JavaScript challenge, managed challenge, or block.
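The split between scoring and enforcement can be sketched in a few lines. This is a minimal illustration, not Cloudflare's rule engine: the `ScoreRule` type, the `decide_action` helper, and every threshold below are hypothetical, since each site configures its own WAF rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRule:
    """One hypothetical WAF rule: apply `action` when score <= max_score."""
    max_score: int
    action: str


# Illustrative thresholds only; real sites choose their own cut-offs.
RULES = [
    ScoreRule(max_score=1, action="block"),
    ScoreRule(max_score=29, action="managed_challenge"),
    ScoreRule(max_score=49, action="js_challenge"),
]


def decide_action(bot_score, rules=RULES):
    """Map a bot score (1 = bot, 99 = human) to the first matching action."""
    if not 1 <= bot_score <= 99:
        raise ValueError("bot score must be in the range 1..99")
    # Check the strictest (lowest) threshold first.
    for rule in sorted(rules, key=lambda r: r.max_score):
        if bot_score <= rule.max_score:
            return rule.action
    return "pass"


for score in (1, 15, 40, 80):
    print(score, decide_action(score))
```

The point of the sketch is that the score itself carries no verdict; the same score of 40 could pass on one site and be challenged on another.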

Failed requests typically surface as:

  • error 1020 — access rule violation.
  • error 1015 — rate limiting.
  • A managed challenge interstitial (Turnstile).
  • A silent 403 with a cf-ray header.
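For logging purposes, the four outcomes above can be told apart with a rough heuristic. The sketch below is an assumption-laden example: the `classify_cloudflare_response` helper is hypothetical, the body markers are simple substring checks, and the `cf-mitigated: challenge` header and the 429 status for rate limiting are assumed rather than taken from this page.

```python
def classify_cloudflare_response(status, headers, body):
    """Roughly label a response using status code, headers and body text."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "cf-ray" not in lowered:
        return "no_cf_ray"  # probably not served through Cloudflare
    # Assumption: challenge pages carry a cf-mitigated: challenge header.
    if lowered.get("cf-mitigated", "").lower() == "challenge":
        return "managed_challenge"
    text = body.lower()
    if status == 403 and "1020" in text:
        return "error_1020"  # access rule violation
    # Assumption: rate-limited responses use status 429 or mention 1015.
    if status == 429 or "1015" in text:
        return "error_1015"
    if status == 403:
        return "silent_403"
    return "ok"


print(classify_cloudflare_response(403, {"CF-RAY": "abc"}, "error code: 1020"))
print(classify_cloudflare_response(403, {"cf-ray": "abc"}, ""))
```

Substring matching on the body is deliberately crude; a production classifier would need stricter markers to avoid false positives on pages that merely contain those digits.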

The scorer doesn't distinguish intent. From its perspective, a price-comparison crawler and a credential-stuffing bot look the same; it only sees signals.

The four signal categories

1. IP address reputation

Cloudflare maintains an ASN-keyed reputation database, populated by traffic it has already seen across the network.

  • Datacenter IPs (AWS, GCP, Azure, DigitalOcean, OVH…): pre-scored low. A request from a known cloud range starts with a poor score before any other check runs.
  • Residential IPs: assigned by ISPs to home connections and treated as much higher trust.
  • Mobile IPs: assigned from carrier CGNAT pools. The highest baseline trust, because each address is shared by many real subscribers and rotates naturally, so penalising one address risks blocking legitimate users.

IP reputation is the single largest input to the score on the first request of a session, before any JavaScript or fingerprint data exists.
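A network-class lookup of this kind might look like the sketch below. Everything in it is illustrative: the CIDR blocks are RFC 5737 documentation ranges standing in for real cloud, ISP and carrier ranges, and the baseline numbers and helper names are hypothetical rather than Cloudflare's actual values.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real provider ranges.
NETWORK_CLASSES = {
    "datacenter": [ipaddress.ip_network("192.0.2.0/24")],
    "residential": [ipaddress.ip_network("198.51.100.0/24")],
    "mobile": [ipaddress.ip_network("203.0.113.0/24")],
}

# Hypothetical starting scores on the 1..99 scale (higher = more human).
BASELINE = {"datacenter": 10, "residential": 60, "mobile": 75, "unknown": 30}


def classify_ip(ip):
    """Return the network class whose ranges contain `ip`, else 'unknown'."""
    address = ipaddress.ip_address(ip)
    for label, networks in NETWORK_CLASSES.items():
        if any(address in network for network in networks):
            return label
    return "unknown"


def baseline_score(ip):
    """Starting score for a session, before any other signal exists."""
    return BASELINE[classify_ip(ip)]


print(classify_ip("192.0.2.10"), baseline_score("192.0.2.10"))
print(classify_ip("203.0.113.5"), baseline_score("203.0.113.5"))
```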

2. JavaScript fingerprinting and challenges

Plain HTTP clients (requests, axios, curl) don't execute JavaScript, so Cloudflare can isolate them with a JS challenge — a script that computes a token from values scattered around the page. No JS engine, no token.

Headless browsers do run JS, but their environment differs from a real Chrome in dozens of small ways: navigator.webdriver, missing plugins, the shape of window.chrome, canvas and WebGL outputs, font enumeration, timezone and locale mismatches, the order in which permissions APIs respond. Cloudflare hashes those into a fingerprint and compares it against known-automation patterns. Cloudflare Turnstile is the user-visible end of this pipeline.
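The hash-and-compare idea can be illustrated with a toy model. This is a sketch under stated assumptions: the attribute names in `env`, the `fingerprint_digest` and `automation_hints` helpers, and the choice of SHA-256 are all hypothetical, and a real fingerprint covers far more surface than four fields.

```python
import hashlib
import json


def fingerprint_digest(env):
    """Hash collected attributes into one stable digest (key order ignored)."""
    canonical = json.dumps(env, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def automation_hints(env):
    """List the individual attributes that look like an automated browser."""
    hints = []
    if env.get("webdriver"):
        hints.append("navigator.webdriver is set")
    if not env.get("plugins"):
        hints.append("plugin list is empty")
    if not env.get("has_window_chrome"):
        hints.append("window.chrome is missing")
    if "HeadlessChrome" in env.get("user_agent", ""):
        hints.append("user agent advertises HeadlessChrome")
    return hints


# Two made-up environments for illustration.
headless = {"webdriver": True, "plugins": [], "has_window_chrome": False,
            "user_agent": "Mozilla/5.0 HeadlessChrome/120.0"}
desktop = {"webdriver": False, "plugins": ["PDF Viewer"], "has_window_chrome": True,
           "user_agent": "Mozilla/5.0 Chrome/120.0"}

# Hypothetical set of digests previously associated with automation.
KNOWN_AUTOMATION = {fingerprint_digest(headless)}

print(fingerprint_digest(headless) in KNOWN_AUTOMATION, automation_hints(headless))
print(fingerprint_digest(desktop) in KNOWN_AUTOMATION, automation_hints(desktop))
```

An exact-match digest is brittle on its own, which is one reason per-attribute checks like `automation_hints` are worth keeping alongside it in a model like this.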

3. HTTP and TLS fingerprinting

Before any HTML is exchanged, Cloudflare fingerprints the client from the TLS handshake (JA3/JA4) and from how it speaks HTTP/2.

  • Most scraping libraries still default to HTTP/1.1. Real Chrome and Firefox haven't in years.
  • libcurl and Go's net/http produce JA3 signatures that don't match any real browser, even when they negotiate HTTP/2.
  • HTTP/2 fingerprinting goes further: the order of pseudo-headers, SETTINGS frame values, and window-update sizes all leak which client you actually are.

A User-Agent: Chrome header on a Python requests call is contradicted by the TLS handshake long before the headers are read.
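To make the TLS side concrete, here is a minimal sketch of how a JA3 fingerprint is derived from ClientHello fields, following the published JA3 scheme: five comma-separated fields, each a dash-joined list of decimal values, hashed with MD5, with GREASE values dropped. The sample field values are made up for illustration and do not correspond to any particular client.

```python
import hashlib

# GREASE placeholder values (0x0A0A, 0x1A1A, ... 0xFAFA) are ignored by JA3.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from ClientHello fields, in wire order."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string, as a 32-character hex digest."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Made-up ClientHello values; the leading 0x0A0A cipher is a GREASE entry.
sample = (771, [0x0A0A, 4865, 4866], [0, 23], [29, 23], [0])
print(ja3_string(*sample))
print(ja3_hash(*sample))
```

Because the lists are hashed in the order the client sent them, two clients offering the same cipher suites in a different order produce different JA3 hashes, which is why a changed User-Agent header does nothing to this layer.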

4. Behavioural and pattern analysis

Cloudflare logs every connection, so behaviour over time is just as visible as any single request:

  • Missing headers that the claimed browser normally sends (Sec-Fetch-*, Accept-Language, and sec-ch-ua for Chromium-based browsers).
  • Payloads sent in the wrong order or encoding.
  • Cookies from the previous response that the next request fails to echo back.
  • Hits on URLs no human ever visits — honeypot links hidden in the DOM specifically to catch link-following crawlers.
  • Bursty timing: 200 requests in 5 seconds, then silence.

These feed Cloudflare's ML pattern analysis, which can flag a session even when each individual request looks fine on its own.
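Two of the checks above, missing headers and bursty timing, are simple enough to sketch directly. This is an illustrative approximation rather than Cloudflare's logic: the header list, the helper names, and the use of a plain sliding window are assumptions, and the default of 200 requests in 5 seconds is just the example figure from the list.

```python
from collections import deque

# Headers a Chromium-based browser is assumed to send on a navigation request.
EXPECTED_HEADERS = (
    "accept-language",
    "sec-fetch-site",
    "sec-fetch-mode",
    "sec-fetch-dest",
    "sec-ch-ua",
)


def missing_browser_headers(headers):
    """Return expected header names that are absent (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name not in present]


def has_burst(timestamps, window_seconds=5.0, limit=200):
    """True if any sliding window of `window_seconds` holds >= `limit` requests."""
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Drop requests that fell out of the window ending at time t.
        while t - window[0] > window_seconds:
            window.popleft()
        if len(window) >= limit:
            return True
    return False


print(missing_browser_headers({"User-Agent": "x", "Accept-Language": "en"}))
print(has_burst([i * 0.02 for i in range(200)]))  # 200 requests in about 4 s
print(has_burst([i * 1.0 for i in range(200)]))   # 1 request per second
```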

What this means for developers

The practical implication of the four-signal model is that fixing a single layer is rarely enough to move the score. A residential proxy on top of a fingerprint that screams HeadlessChrome will still fail; a fully stealth-patched browser on a flagged AWS IP will too. Tooling generally falls into three categories:

  • HTTP clients with browser-impersonating TLS: curl_cffi, curl-impersonate, tls-client. Match the TLS/HTTP/2 layer but cannot execute JS challenges.
  • Stealth-patched browsers: Playwright + stealth plugins, patchright, Camoufox. Cover JS execution and fingerprint surface but are expensive per request.
  • Managed scraping APIs: services like Scrappey that combine the two and handle proxy rotation, session continuity, and challenge solving behind one endpoint.

A minimal request through a managed API looks like this:

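import requests

# Ask the managed API to fetch the target URL on our behalf.
response = requests.post(
    'https://publisher.scrappey.com/api/v1',
    json={
        'cmd': 'request.get',
        'url': 'https://example.com/product/123',
        'session': 'cf-session-1',  # reuse this value to stay in one session
    },
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    timeout=60,  # requests has no default timeout
)
response.raise_for_status()  # surface HTTP-level failures early
print(response.json()['solution']['response'])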

Reusing the session value across requests keeps cookies and the trust score warm — burning a fresh session per request looks far more scripted than one that browses for a few minutes.
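One way to keep the session value stable is to build every request body from a single identifier. The sketch below assumes only the three fields shown in the request above ('cmd', 'url', 'session'); the `build_payloads` helper and the generated session name format are hypothetical.

```python
import uuid


def build_payloads(urls, session_id=None):
    """Build one request body per URL, all sharing the same session value."""
    # Generate one identifier up front instead of one per request.
    session_id = session_id or f"cf-session-{uuid.uuid4().hex[:8]}"
    return [
        {"cmd": "request.get", "url": url, "session": session_id}
        for url in urls
    ]


payloads = build_payloads(
    ["https://example.com/product/123", "https://example.com/product/456"],
    session_id="cf-session-1",
)
for payload in payloads:
    print(payload)
```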

Sites commonly fronted by Cloudflare

Cloudflare is the most widely deployed WAF on the web. Frequently studied targets include Stockx.com, Indeed.com, G2.com, Glassdoor.com, Instacart.com, Kickstarter.com, Zoopla.co.uk and Hapag-lloyd.com. Many large sites rotate between Cloudflare, Akamai, DataDome and PerimeterX depending on traffic, so detection logic is rarely static.

Summary

Cloudflare doesn't make a single yes/no decision. It produces a continuous bot score from IP reputation, JS execution and fingerprint, TLS/HTTP/2 handshake characteristics, and behavioural patterns over time. Any of the four can drag the score below the WAF's threshold. The detection model evolves continuously — anything brittle to a single signal will eventually break.

Frequently asked questions

Why does Cloudflare block my HTTP client but not my browser?

Your client's TLS/JA3 and HTTP/2 fingerprints do not match a real browser. Cloudflare reads these before any HTML is exchanged, so only a handshake that looks like a real browser's, such as Chrome's, gets through.

What is the cf_clearance cookie?

It is the token Cloudflare issues once you pass a challenge. Reusing it across requests from the same IP and fingerprint keeps you cleared; sending it from a different IP is a flag.

Does a residential proxy alone defeat Cloudflare?

No. IP reputation is only one signal. You also need a browser-matching TLS fingerprint and consistent headers, or Cloudflare's bot score still flags the session.

Last updated: 2026-05-28