What Is Anti-Bot Detection?

By the Scrappey Research Team

Anti-bot detection is the set of techniques websites use to tell automated traffic apart from real human visitors, and then to block, challenge, or slow down the automated traffic. Instead of relying on one clue, it stacks several together: IP reputation (whether an address has a history of abuse), browser fingerprinting (identifying details your browser leaks), TLS analysis (TLS is the encryption layer behind https), behavioral signals such as how you move and click, and machine-learning scoring. The result is a risk score attached to every request. Cloudflare, DataDome, PerimeterX (now HUMAN), Akamai, and Imperva are the dominant vendors, and many large sites use at least one of them.

Common vendors	Cloudflare, DataDome, PerimeterX (HUMAN), Akamai, Imperva
Signal categories	IP, TLS, HTTP, browser fingerprint, behavior, history
Action taken	Allow, challenge (CAPTCHA), throttle, block
Detection layers	Four: network → JS → WASM → behavioural
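
As a rough illustration of how the signal categories and actions in the table above could fit together, the sketch below combines per-category suspicion values into one score and maps that score to an action. The weights, thresholds, and function names are hypothetical and chosen only for this example; real vendors do not publish theirs, and the ordering of challenge and throttle simply follows the table.

```python
# Hypothetical weights per signal category (they sum to 1.0).
SIGNAL_WEIGHTS = {
    "ip": 0.25,
    "tls": 0.20,
    "http": 0.10,
    "fingerprint": 0.20,
    "behavior": 0.15,
    "history": 0.10,
}


def risk_score(signals: dict[str, float]) -> float:
    """Combine per-category suspicion values (0.0 to 1.0) into one score.

    Categories missing from `signals` count as 0.0 (nothing suspicious seen).
    """
    return sum(
        weight * signals.get(name, 0.0)
        for name, weight in SIGNAL_WEIGHTS.items()
    )


def choose_action(score: float) -> str:
    """Map a score to an action using illustrative thresholds."""
    if score < 0.3:
        return "allow"
    if score < 0.5:
        return "challenge"
    if score < 0.7:
        return "throttle"
    return "block"


if __name__ == "__main__":
    # A request that looks suspicious on IP and TLS but clean elsewhere.
    example = {"ip": 1.0, "tls": 1.0}
    print(choose_action(risk_score(example)))
```

A weighted sum is only one possible design. It is used here because it makes the "several clues stacked together" idea easy to see.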

The four layers of anti-bot detection

Modern bot-protection products check every request against four separate layers. Failing any one layer is usually enough to get blocked: the layers act like a row of gates rather than a points total, so a client has to clear all of them.

Layer	What's inspected	Fires before…
1. Network	TLS Client Hello (JA4), HTTP/2 SETTINGS frame, TCP options, IP reputation, ASN	HTML is served
2. JavaScript	Canvas / WebGL / AudioContext fingerprints, navigator properties, `Function.toString()` inspection, extension probes	XHR / API calls fire
3. WebAssembly	WASM SIMD CPU profile, SharedArrayBuffer timer precision, hyphenation dictionary checks	Challenge token is issued
4. Behavioural	Mouse movement Bezier curves, scroll cadence, keypress timing, click-to-event latency	Score is finalised over multiple requests

Each layer runs at a different moment. Layer 1 inspects the raw connection before any HTML is sent back, including the TLS Client Hello (the first handshake message a browser sends), which is summarised as a JA4 fingerprint. Layer 2 runs JavaScript in the page to probe the browser itself. Layer 3 leans on WebAssembly (compiled code that runs in the browser) for low-level CPU and timing checks. Layer 4 watches how you actually behave over several requests. So a scraper using curl_cffi, which only handles Layer 1, will pass against Layer 1-only vendors like older Imperva but fail against anything that loads sensor.js. A patched browser that handles Layers 1 and 2 will pass Akamai's static checks but fail DataDome's behavioural ML.
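
The gate-like ordering described above can be sketched as a short pipeline that stops at the first failing layer. This is a minimal sketch, not any vendor's implementation: the request is modelled as a plain dict, and every key and function name below is hypothetical.

```python
from typing import Callable, Optional

Request = dict  # hypothetical per-request summary; all keys are illustrative


def network_gate(req: Request) -> bool:
    # Layer 1: TLS fingerprint resembles a real browser and the IP is not flagged.
    return req.get("tls_matches_browser", False) and not req.get("ip_flagged", False)


def javascript_gate(req: Request) -> bool:
    # Layer 2: the in-page script ran and reported no automation leaks.
    return req.get("js_executed", False) and not req.get("automation_leak", False)


def wasm_gate(req: Request) -> bool:
    # Layer 3: low-level CPU and timing probes agree with the claimed browser.
    return req.get("wasm_profile_consistent", False)


def behaviour_gate(req: Request) -> bool:
    # Layer 4: interaction over several requests looks human.
    return req.get("behaviour_human_like", False)


LAYERS: list[tuple[str, Callable[[Request], bool]]] = [
    ("network", network_gate),
    ("javascript", javascript_gate),
    ("webassembly", wasm_gate),
    ("behavioural", behaviour_gate),
]


def first_failed_layer(req: Request) -> Optional[str]:
    """Return the name of the first layer the request fails, or None if all pass."""
    for name, gate in LAYERS:
        if not gate(req):
            return name
    return None


if __name__ == "__main__":
    # A client that only gets the TLS handshake right never runs the page script.
    tls_only_client = {"tls_matches_browser": True}
    print(first_failed_layer(tls_only_client))
```

Because evaluation stops at the first failure, a client that satisfies only Layer 1 is reported as failing at the JavaScript layer, which mirrors the curl_cffi case in the paragraph above.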

The five-vector coherence test

On top of the four detection layers, vendors run a separate identity-coherence check. The idea is simple: a real visitor's details should all tell the same story. These five vectors must agree:

IP — geolocation, ASN type (residential / datacenter / mobile)
Timezone — Intl.DateTimeFormat().resolvedOptions().timeZone
Accept-Language — HTTP header
WebRTC — candidate IP exposed by STUN/TURN
DNS — resolver used (matches ISP or VPN?)

Here is what coherent looks like: an IP in São Paulo, a timezone of America/Sao_Paulo, an Accept-Language: pt-BR, a WebRTC candidate that matches the proxy, and a Brazilian ISP DNS resolver. Every signal points to the same person in the same place. Now the giveaway: a US datacenter IP with a Tokyo timezone, English Accept-Language, and a WebRTC leak that reveals the operator's real home IP. The timezone contradicts the IP's location and the WebRTC candidate contradicts the proxy, so the identity does not hold together. That mismatch is the most common scraping signature and is easy to block. Proxy-rotation tools that change only the IP fail this test, because they leave the other four vectors pointing elsewhere.
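
The coherence idea can be sketched as a function that lists which vectors disagree with the IP. This is a simplified illustration under stated assumptions: the `Identity` fields, the two lookup tables, and the exact-match rules are all hypothetical, and real systems would use full geolocation data and tolerate legitimate exceptions such as travellers or expatriates. The IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass


@dataclass
class Identity:
    ip_address: str
    ip_country: str            # from IP geolocation, e.g. "BR"
    timezone: str              # Intl.DateTimeFormat().resolvedOptions().timeZone
    accept_language: str       # raw Accept-Language header value
    webrtc_ip: str             # candidate IP exposed via STUN/TURN
    dns_resolver_country: str  # country of the resolver that was observed


# Tiny hypothetical lookup tables, just large enough for the example.
TIMEZONE_COUNTRY = {
    "America/Sao_Paulo": "BR",
    "America/New_York": "US",
    "Asia/Tokyo": "JP",
}
LANGUAGE_COUNTRIES = {
    "pt-BR": {"BR"},
    "en-US": {"US"},
    "ja-JP": {"JP"},
}


def incoherent_vectors(identity: Identity) -> list[str]:
    """Return the vectors that contradict the IP's location or address."""
    mismatches = []
    if TIMEZONE_COUNTRY.get(identity.timezone) != identity.ip_country:
        mismatches.append("timezone")
    # Use the first (most preferred) language tag and drop any q-value.
    primary = identity.accept_language.split(",")[0].split(";")[0].strip()
    if identity.ip_country not in LANGUAGE_COUNTRIES.get(primary, set()):
        mismatches.append("accept-language")
    if identity.webrtc_ip != identity.ip_address:
        mismatches.append("webrtc")
    if identity.dns_resolver_country != identity.ip_country:
        mismatches.append("dns")
    return mismatches


if __name__ == "__main__":
    leaky = Identity(
        ip_address="198.51.100.7", ip_country="US",
        timezone="Asia/Tokyo", accept_language="en-US,en;q=0.9",
        webrtc_ip="203.0.113.10", dns_resolver_country="US",
    )
    print(incoherent_vectors(leaky))
```

Returning the list of mismatched vectors, rather than a single boolean, keeps the reason for a failure visible, which is useful when checking your own authorized clients for leaks.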

What separates good detection from bad

Bad detection blocks on a User-Agent regex ("deny anything with `Bot` in the name"), which is trivially circumvented and catches almost nothing real. Mediocre detection blocks datacenter IP ranges and the JA3 hashes (a fingerprint of the TLS handshake) of known scraping libraries. This catches the lazy 80% of scrapers but misses anything running a real browser behind proxies. Good detection is what the major commercial vendors provide. They pool signals from thousands of customer sites, update their models as new automation patterns appear, and correlate identities across requests, so passing one challenge is not permanent: the same fingerprint is recognised on later requests. That is why detection is a continuously evolving field rather than a fixed system.
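
To make the first two tiers concrete, here is a minimal sketch of each, using only the Python standard library. The network range is reserved for documentation and stands in for a real datacenter list, the JA3 value is a placeholder rather than a real hash, and the function names are hypothetical.

```python
import ipaddress
import re

# "Bad" tier: match the word "bot" anywhere in the User-Agent.
BOT_USER_AGENT = re.compile(r"bot", re.IGNORECASE)

# "Mediocre" tier: static blocklists. Both values are placeholders.
DATACENTER_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
BLOCKED_JA3_HASHES = {"00000000000000000000000000000000"}


def bad_detection(user_agent: str) -> bool:
    """Block only clients that announce themselves as bots."""
    return bool(BOT_USER_AGENT.search(user_agent))


def mediocre_detection(ip: str, ja3_hash: str) -> bool:
    """Block known datacenter ranges and known library TLS fingerprints."""
    address = ipaddress.ip_address(ip)
    in_datacenter = any(address in network for network in DATACENTER_NETWORKS)
    return in_datacenter or ja3_hash in BLOCKED_JA3_HASHES


if __name__ == "__main__":
    # An honest crawler is caught; any client with a browser-like string is not.
    print(bad_detection("ExampleBot/1.0"))
    print(bad_detection("Mozilla/5.0 (X11; Linux x86_64)"))
    # Static lists only catch what is already on them.
    print(mediocre_detection("198.51.100.20", "unlisted-hash"))
    print(mediocre_detection("203.0.113.5", "unlisted-hash"))
```

Both checks are stateless lookups against fixed lists, which is the limitation the paragraph above describes: nothing here learns from other sites or remembers a client across requests.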

What detection systems weigh most

Three factors dominate. First, identity consistency: detection checks whether the IP type (residential, mobile, or datacenter) fits the browser environment, and whether the fingerprint matches configurations that actually exist in the wild. Second, behavioural realism: detection scores request timing, session-level cookie continuity, and whether request gaps look human or mechanically even. Third, model freshness: scoring changes whenever a vendor updates its models or a site adopts a new detection vendor, so the same configuration can be scored differently from one month to the next. For authorized data collection on sites you own or are permitted to access, managed scraping APIs absorb this ongoing change so you do not maintain the configuration yourself.
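
One simple way to express "mechanically even" request gaps is the coefficient of variation of the gaps between consecutive requests: the standard deviation divided by the mean. The sketch below is an assumption about how such a check could look, not a description of any vendor's model, and the 0.1 threshold is arbitrary.

```python
from statistics import mean, pstdev


def gap_variation(timestamps: list[float]) -> float:
    """Coefficient of variation of the gaps between consecutive requests.

    `timestamps` are request times in seconds, in increasing order.
    A value near 0 means the gaps are almost identical.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    average = mean(gaps)
    if average <= 0:
        raise ValueError("timestamps must be increasing")
    return pstdev(gaps) / average


def looks_mechanical(timestamps: list[float], threshold: float = 0.1) -> bool:
    """Flag sessions whose request gaps are suspiciously uniform."""
    return gap_variation(timestamps) < threshold


if __name__ == "__main__":
    fixed_delay = [0.0, 2.0, 4.0, 6.0, 8.0]    # a loop with a constant sleep
    irregular = [0.0, 1.3, 7.9, 9.1, 21.4]     # uneven, more human-like pauses
    print(looks_mechanical(fixed_delay))
    print(looks_mechanical(irregular))
```

A single statistic like this is easy to compute per session. In practice it would be one input among many, alongside the cookie continuity and identity checks described above.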

Related terms

What Is Cloudflare Turnstile?

Cloudflare Turnstile is a service that checks whether a visitor is a real human, but without showing the kind of puzzle a normal CAPTCHA doe…

What Is Browser Fingerprinting?

Browser fingerprinting is a technique that identifies and tracks a visitor by combining dozens of small, observable characteristics of their…

What Is a CAPTCHA Solver?

A CAPTCHA solver is software that automatically completes CAPTCHA challenges for an automated client. A CAPTCHA is the "prove you're human" …

What Is a Residential Proxy?

A residential proxy sends your web traffic through a real home internet connection — a regular broadband or fiber line — instead of through …

Anti-Bot Vendor Detection Cheatsheet

A useful first step when working with any protected site you are authorized to access is identifying which anti-bot vendor sits in front of …

What Is TLS Fingerprinting (JA3/JA4)?

TLS fingerprinting is a way to recognize what software made a connection just by looking at how it sets up encryption — before the server re…

What Is Behavioural Bot Detection?

Behavioural bot detection is the part of anti-bot scoring that asks "how does this client act?" instead of "what is this client?". Instead o…

What Is a WebRTC IP Leak?

A WebRTC IP leak is when your browser quietly reveals your real IP address — even though you set up a proxy to hide it. It is the most-overl…

What Is Anubis (Anti-AI-Scraper Firewall)?

Anubis is a free, open-source MIT-licensed "gatekeeper" that sits in front of a website (a reverse proxy - software that intercepts requests…

What Is an Anti-Scraping Mechanism?

An anti-scraping mechanism is any technical control a website uses to detect, slow down, or block automated requests (bots) instead of real …

What Is Cloudflare Bot Management?

Cloudflare Bot Management is the enterprise-tier ML scoring system Cloudflare runs on every request to a protected zone. In plain terms: it …

What Is Imperva Incapsula?

Imperva Incapsula is the enterprise WAF and bot-protection product from Imperva (acquired by Thales in 2023). A WAF (web application firewal…

What Is AWS WAF Bot Control?

AWS WAF Bot Control is a ready-made set of rules inside AWS WAF (Amazon's web application firewall — the security layer that filters t…

What Is Forter?

Forter is a fraud-and-trust platform that runs at e-commerce checkout — it is not a traditional anti-bot product. Instead of blocking scrape…

What Is Riskified?

Riskified is a chargeback-guarantee platform for e-commerce checkout. A chargeback is the money a merchant loses when a customer disputes a …

What Is WebGL Fingerprinting?

WebGL fingerprinting reads identifying information directly from the GPU. WebGL is the browser feature that lets web pages draw 3D graphics …

What Is AudioContext Fingerprinting?

AudioContext fingerprinting plays a silent waveform through the Web Audio API, then reads back the resulting floating-point samples and hash…

What Is Function.toString() Inspection?

Function.prototype.toString() inspection is a technique anti-bot scripts use to identify JavaScript functions that have been modified at run…

What Is the Scrapy + Go TLS Sidecar Architecture?

The Scrapy + Go TLS sidecar architecture is the most common production pattern for scraping Akamai- and Cloudflare-protected sites at scale.…

Web Scraping Tools 2026 — A Comparison

"Web scraping tools" is the whole family of software you use to pull data off websites — and in 2026 that family is big but neatly sorted in…

What Is Scraper Data Poisoning?

Data poisoning is when a site decides you are probably a scraper and quietly feeds you wrong data instead of blocking you: fake prices, made…

What Is a DOM Honeypot?

A DOM honeypot is an invisible form field or link that humans never see but bots fill in or click. The DOM (Document Object Model) is the li…

What Is Fingerprint Lie Detection?

Fingerprint lie detection is the practice of verifying that the signals a browser reports are internally consistent and untampered, rather t…

What Is Browser Extension Detection?

Browser extension detection infers which extensions are installed by probing for the resources and side effects they expose to web pages. Ex…

What Is Fingerprint Clustering?

Fingerprint clustering is the practice of grouping fingerprints from millions of real visitors by similarity, then rejecting any new visitor…

How to Build an Anti-Bot Challenge

An anti-bot challenge is a small test a server makes your browser run — like proof-of-work (forcing the browser to burn some CPU on a puzzle…

What Is JA4 Fingerprinting?

JA4 is a way to identify a browser by the fingerprint of its TLS handshake — TLS being the encryption layer behind https. It replaced the ol…

What Is Residential Proxy Detection?

Residential proxy detection is how anti-bot systems spot traffic that is being routed through a residential proxy pool — a network of IP add…

What Is Fingerprint Entropy?

Fingerprint entropy is a way to measure how much a browser attribute gives away about who you are, counted in bits. Think of entropy as "how…

Anti-Detect Browser Tools Compared

Anti-detect browser tools aim to present a consistent, real-looking browser configuration so that automated sessions render the same fingerp…

How Does Deobfuscation Work?

Deobfuscation is the process of turning deliberately unreadable code back into something a human can read and reason about. Obfuscators scra…

What are the 3 types of HTTP cookies? (2026 Guide)

An HTTP cookie is a small piece of data a website asks your browser to store and then send back on every later request to that site. Because…

What is a REST API? (Complete Guide 2026)

A REST API is a standard way for programs to read and change data over the web using ordinary HTTP requests. This is the complete 2026 guide…

What is HTTP? (Complete Guide 2026)

HTTP (HyperText Transfer Protocol) is the set of rules browsers and servers use to talk to each other on the web. Every time you load a page…

What Is WebGPU Fingerprinting?

WebGPU fingerprinting reads identifying data from the modern navigator.gpu API. WebGPU is the newest browser standard for talking to your GP…

What Is Client Hints Fingerprinting?

User-Agent Client Hints (UA-CH) are a set of structured HTTP headers plus a matching JavaScript API that report the same browser and operati…

What Is a Timezone / IP Mismatch?

A timezone/IP mismatch is when the location a browser claims and the location of its IP address disagree. Anti-bot systems (the software sit…

What Is navigator.webdriver?

navigator.webdriver is a standardized boolean that returns true when the browser is being controlled by automation. Think of it as a built-i…

What Is JA3 Fingerprinting?

JA3 is a method for fingerprinting a TLS client by hashing the fields of its Client Hello. TLS is the encryption layer behind https, and the…

What Is HTTP/3 / QUIC Fingerprinting?

HTTP/3 / QUIC fingerprinting identifies a client from the QUIC transport layer that HTTP/3 runs on. QUIC is the modern transport beneath HTT…

What Is Hardware Fingerprinting?

Hardware fingerprinting reads device capability signals - CPU cores, RAM, and screen metrics - that JavaScript exposes directly. These are v…

What Is CDP Detection?

CDP detection is the family of techniques anti-bot scripts use to tell that a browser is being driven through the Chrome DevTools Protocol (…

What Is Incognito Detection?

Incognito detection is the set of techniques that reveal whether a browser is in private / incognito mode. Private mode is the browser featu…

What Is Media Devices Fingerprinting?

Media devices fingerprinting reads the list of cameras, microphones, and speakers a browser reports via navigator.mediaDevices.enumerateDevi…

What Is Speech Synthesis Fingerprinting?

Speech synthesis fingerprinting reads the list of text-to-speech voices exposed by window.speechSynthesis.getVoices(). "Text-to-speech" mean…

What Is Stack Depth Fingerprinting?

Stack depth fingerprinting measures the maximum JavaScript recursion depth a browser allows before throwing a RangeError: Maximum call stack…

What Is CSS Media Query Fingerprinting?

CSS media query fingerprinting reads operating-system and device preferences through window.matchMedia(). A media query is a yes/no question…

What Is Screen Resolution Fingerprinting?

Screen resolution fingerprinting reads the display measurements a browser reports - screen.width/height, availWidth/availHeight, colorDepth,…

How Do You Devirtualize an Obfuscated JavaScript VM?

Devirtualization is the process of recovering a readable program from JavaScript that has been compiled into a tiny interpreter — a virtual …

What Is Cloudflare Error 1020 (Access Denied)?

Cloudflare Error 1020 "Access Denied" means a Cloudflare firewall (WAF) rule on the site has blocked your request outright. Unlike Error 101…

What Is a User Agent?

A user agent is a short text string a client sends in the User-Agent HTTP header to tell a server what software is making the request. Every…

What Is a CAPTCHA?

A CAPTCHA is a challenge a website uses to tell a human visitor apart from an automated script. The name stands for Completely Automated Pub…

What Is a Web Unblocker?

A web unblocker is a managed service that sits between your scraper and a target site, automatically handling the proxies, browser rendering…

curl_cffi vs requests in Python

curl_cffi and requests are both Python HTTP clients, but curl_cffi can impersonate a real browser's TLS and HTTP/2 fingerprint while request…

Fix 403 Forbidden When Scraping (Python)

To fix a 403 Forbidden while scraping a site you are permitted to access, make sure your HTTP client presents complete, consistent request m…

Frequently asked questions

How do sites know I'm using a bot?

Usually it's a combination of signals working together — a datacenter IP, headless-browser leaks, a TLS fingerprint that doesn't match a real browser, unrealistic request timing — rather than any single giveaway. Modern systems score the whole picture, not individual clues.

Is anti-bot detection ever perfectly accurate?

No detection system is flawless, and none stays static. Vendors improve their models continuously, so the signals a given configuration produces are scored differently over time.

Which anti-bot vendor is the most aggressive?

DataDome and PerimeterX (HUMAN) tend to score automated traffic most aggressively. Cloudflare is everywhere and improving fast. Akamai is strong on financial and travel sites. Which one a given site uses depends on the target and changes over time.

Does respecting robots.txt affect bot detection?

Not directly — anti-bot vendors don't read robots.txt. Respecting it is good practice and lowers your legal risk, but the detection layer scores technical signals regardless of what robots.txt says.

Why is a request blocked even though my fingerprint passes a fingerprint-test site?

Fingerprint-test sites usually check only Layer 2 (JavaScript). Your block most likely happened at Layer 1 (TLS / IP) or in the five-vector coherence test, neither of which the test site inspects. A quick tell: if you get a 403 before any JavaScript runs, the failure is at Layer 1.

Do all vendors run all four layers?

No. Imperva and AWS WAF default to Layer 1 plus a light Layer 2. Akamai, Cloudflare Bot Management, and PerimeterX run all four. DataDome leans heavily on Layer 4 behavioural ML. F5 Shape runs Layers 1 + 2 plus a custom VM that defies easy categorisation. The vendor cheatsheet entry maps which layers each one weights heaviest.

Last updated: 2026-05-31