
What Is Anti-Bot Detection?


Anti-bot detection is the set of techniques websites use to distinguish automated traffic from human users, and to block, challenge, or throttle the automated share. It combines IP reputation, browser fingerprinting, TLS analysis, behavioural signals, and machine-learning scoring to produce a risk score for every incoming request. Cloudflare, DataDome, PerimeterX, Akamai, and Imperva are the dominant vendors, and most large sites use at least one of them.
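To make "risk score" concrete, here is a minimal sketch of the last step: mapping a score to one of the four actions a site can take. The 0.0 to 1.0 scale, the thresholds, and the escalation order (allow, throttle, challenge, block) are hypothetical; each vendor uses its own scale and tunes cut-offs per customer.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"
    CHALLENGE = "challenge"
    BLOCK = "block"


# Hypothetical upper bounds on a 0.0 (human) to 1.0 (bot) risk scale,
# checked in order. Anything at or above the last bound is blocked.
THRESHOLDS = [
    (0.30, Action.ALLOW),
    (0.60, Action.THROTTLE),
    (0.85, Action.CHALLENGE),
]


def decide(risk_score: float) -> Action:
    """Map a risk score to the action a protected site would take."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0.0 and 1.0")
    for upper_bound, action in THRESHOLDS:
        if risk_score < upper_bound:
            return action
    return Action.BLOCK


for score in (0.1, 0.5, 0.7, 0.95):
    print(score, decide(score).value)
```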

Quick facts

Common vendors: Cloudflare, DataDome, PerimeterX (HUMAN), Akamai, Imperva
Signal categories: IP, TLS, HTTP, browser fingerprint, behaviour, history
Actions taken: allow, challenge (CAPTCHA), throttle, block
Detection layers: four (network → JavaScript → WebAssembly → behavioural)

The four layers of anti-bot detection

Modern bot-protection products score every request across four independent layers. Failing any one layer is usually enough to get blocked: the scoring is not a sum but a series of gates. Get every layer right or get nothing through.

Each layer, what it inspects, and what it fires before:

  1. Network: TLS Client Hello (JA4), HTTP/2 SETTINGS frame, TCP options, IP reputation, ASN. Fires before the HTML is served.
  2. JavaScript: Canvas / WebGL / AudioContext fingerprints, navigator properties, Function.toString() inspection, extension probes. Fires before the page's XHR / API calls.
  3. WebAssembly: WASM SIMD CPU profile, SharedArrayBuffer timer precision, hyphenation dictionary checks. Fires before the challenge token is issued.
  4. Behavioural: mouse-movement Bezier curves, scroll cadence, keypress timing, click-to-event latency. Fires before the score is finalised, which happens over multiple requests.
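The difference between a series of gates and an additive score is easy to miss, so here is a minimal sketch. The Request fields and the 0.5 behavioural cut-off are hypothetical stand-ins for each layer's verdict, and real behavioural scoring accumulates over several requests rather than judging one in isolation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    """Hypothetical per-layer verdicts for one request."""
    ja4_known_good: bool             # Layer 1: TLS fingerprint matches a real browser
    js_fingerprint_consistent: bool  # Layer 2: canvas/WebGL/navigator checks agree
    wasm_profile_plausible: bool     # Layer 3: CPU/timer profile looks like real hardware
    behaviour_score: float           # Layer 4: 0.0 = robotic, 1.0 = human-like


# Ordered gates: (layer name, predicate that returns True when the request passes).
LAYERS: list[tuple[str, Callable[[Request], bool]]] = [
    ("network", lambda r: r.ja4_known_good),
    ("javascript", lambda r: r.js_fingerprint_consistent),
    ("webassembly", lambda r: r.wasm_profile_plausible),
    ("behavioural", lambda r: r.behaviour_score >= 0.5),  # hypothetical cut-off
]


def evaluate(req: Request) -> str:
    """Run the layers in order; the first failing layer decides the outcome."""
    for name, passes in LAYERS:
        if not passes(req):
            return f"blocked at {name}"
    return "allowed"


# A near-perfect behaviour score cannot compensate for a bad TLS fingerprint.
print(evaluate(Request(False, True, True, 0.99)))  # blocked at network
print(evaluate(Request(True, True, True, 0.8)))    # allowed
```

With a weighted sum, the 0.99 behaviour score in the first call could have dragged the total over a threshold; with gates it never gets evaluated.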

A scraper that uses curl_cffi (Layer 1 only) can pass Layer 1-only vendors such as older Imperva deployments but fails against any deployment that loads a JavaScript sensor such as sensor.js. A patched browser (Layers 1 and 2) can pass Akamai's static checks but fails against DataDome's behavioural ML.
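The reasoning in that paragraph is a subset check: a client passes only if the layers it handles cover every layer the deployment inspects. The deployment names and layer sets below are hypothetical placeholders, not measured vendor configurations.

```python
# Layers each client type handles, following the examples in the text.
CLIENT_LAYERS = {
    "curl_cffi": {1},           # TLS/HTTP impersonation only
    "patched_browser": {1, 2},  # real TLS stack plus a patched JS surface
}

# Hypothetical deployments: which layers each one inspects.
DEPLOYMENT_LAYERS = {
    "network_only": {1},
    "js_sensor": {1, 2},
    "behavioural_ml": {1, 2, 4},
}


def missing_layers(client: str, deployment: str) -> set[int]:
    """Layers the deployment inspects that the client does not handle."""
    return DEPLOYMENT_LAYERS[deployment] - CLIENT_LAYERS[client]


def can_pass(client: str, deployment: str) -> bool:
    """A client gets through only if it handles every inspected layer."""
    return not missing_layers(client, deployment)


for client in CLIENT_LAYERS:
    for deployment in DEPLOYMENT_LAYERS:
        gaps = missing_layers(client, deployment)
        verdict = "pass" if not gaps else f"fails on layer(s) {sorted(gaps)}"
        print(f"{client} vs {deployment}: {verdict}")
```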

The five-vector coherence test

Beyond the four detection layers, vendors run a separate identity-coherence check across five vectors that must agree:

  1. IP: geolocation, ASN type (residential / datacenter / mobile)
  2. Timezone: Intl.DateTimeFormat().resolvedOptions().timeZone
  3. Accept-Language: the HTTP header
  4. WebRTC: the candidate IP exposed by STUN/TURN
  5. DNS: the resolver used (does it match the ISP or a VPN?)

An IP in São Paulo, a timezone of America/Sao_Paulo, an Accept-Language: pt-BR, a WebRTC candidate that matches the proxy, and a Brazilian ISP DNS resolver — that's a coherent fingerprint. A US datacenter IP with Tokyo timezone, English Accept-Language, and a WebRTC leak revealing the operator's home IP is the most common scraping signature and is trivially blocked. Proxy-rotation tools that touch only the IP fail this test every time.
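A coherence check compares each vector against what the IP's geolocation predicts. This sketch assumes a tiny hypothetical lookup table (PLAUSIBLE) in place of real GeoIP and ASN databases, and treats a datacenter ASN as a failure of the IP vector; production systems weigh these signals far more finely. The IPs are from the RFC 5737 documentation ranges.

```python
from dataclasses import dataclass

# Hypothetical lookup: plausible timezones and primary languages per IP country.
# A real system would derive this from GeoIP, ASN, and locale databases.
PLAUSIBLE = {
    "BR": {"timezones": {"America/Sao_Paulo", "America/Manaus"},
           "languages": {"pt"}},
    "US": {"timezones": {"America/New_York", "America/Chicago",
                         "America/Denver", "America/Los_Angeles"},
           "languages": {"en", "es"}},
}


@dataclass
class Identity:
    ip: str                    # egress IP the server sees
    ip_country: str            # GeoIP country of that IP
    ip_asn_type: str           # "residential", "mobile" or "datacenter"
    timezone: str              # Intl.DateTimeFormat().resolvedOptions().timeZone
    accept_language: str       # Accept-Language header, e.g. "pt-BR,pt;q=0.9"
    webrtc_candidate_ip: str   # public IP exposed via STUN
    dns_resolver_country: str  # country of the DNS resolver the client used


def incoherent_vectors(ident: Identity) -> list[str]:
    """Return the vectors that disagree with the IP; an empty list means coherent."""
    expected = PLAUSIBLE[ident.ip_country]  # KeyError for countries not in the toy table
    problems = []
    if ident.ip_asn_type == "datacenter":
        problems.append("ip")
    if ident.timezone not in expected["timezones"]:
        problems.append("timezone")
    first_tag = ident.accept_language.split(",")[0].split(";")[0].strip()
    if first_tag.split("-")[0].lower() not in expected["languages"]:
        problems.append("accept-language")
    if ident.webrtc_candidate_ip != ident.ip:
        problems.append("webrtc")
    if ident.dns_resolver_country != ident.ip_country:
        problems.append("dns")
    return problems


coherent = Identity("198.51.100.7", "BR", "residential", "America/Sao_Paulo",
                    "pt-BR,pt;q=0.9", "198.51.100.7", "BR")
typical_scraper = Identity("203.0.113.5", "US", "datacenter", "Asia/Tokyo",
                           "en-US,en;q=0.9", "192.0.2.44", "US")
print(incoherent_vectors(coherent))         # []
print(incoherent_vectors(typical_scraper))  # ['ip', 'timezone', 'webrtc']
```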

What separates good detection from bad

Bad detection blocks on a User-Agent regex ("deny anything with `Bot` in it"): easy to bypass, and it catches almost nothing real. Mediocre detection blocks on datacenter IP ranges and the JA3 hashes of known scraping libraries: it catches the lazy 80% of scrapers but misses anything running through proxies with a real browser. Good detection is what the major commercial vendors sell. They aggregate signals across thousands of customer sites, update their models when new evasion techniques appear, and correlate identities across requests so that solving one challenge doesn't earn you unlimited access: your fingerprint follows you. This is why bypassing modern anti-bot is a continuous arms race, not a one-time engineering task.
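One way to picture "your fingerprint follows you" is a clearance keyed on a fingerprint hash rather than on the IP or a cookie, with a time limit and a request budget. Everything here (ClearanceStore, the 30-minute TTL, the 200-request budget, the signal names) is a hypothetical illustration, not any vendor's actual mechanism. Because the key is the fingerprint, rotating IPs does not reset the budget.

```python
import hashlib
import json
import time
from typing import Optional

# Hypothetical policy values, for illustration only.
CLEARANCE_TTL_SECONDS = 30 * 60
CLEARANCE_REQUEST_BUDGET = 200


def fingerprint_id(signals: dict) -> str:
    """Stable SHA-256 over the fingerprint signals (sorted keys for determinism)."""
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ClearanceStore:
    """Clearances keyed on the fingerprint, not on the IP address or a cookie."""

    def __init__(self) -> None:
        self._grants: dict[str, list[float]] = {}  # id -> [issued_at, requests_used]

    def grant(self, signals: dict, now: Optional[float] = None) -> None:
        """Record that this fingerprint just solved a challenge."""
        issued_at = time.time() if now is None else now
        self._grants[fingerprint_id(signals)] = [issued_at, 0]

    def allowed(self, signals: dict, now: Optional[float] = None) -> bool:
        """Honour a clearance only for the same fingerprint, within TTL and budget."""
        now = time.time() if now is None else now
        grant = self._grants.get(fingerprint_id(signals))
        if grant is None:
            return False
        issued_at, used = grant
        if now - issued_at > CLEARANCE_TTL_SECONDS or used >= CLEARANCE_REQUEST_BUDGET:
            return False
        grant[1] += 1
        return True


store = ClearanceStore()
browser = {"ja4": "example-ja4", "canvas": "a1b2c3", "timezone": "Europe/Berlin"}
store.grant(browser, now=0)
print(store.allowed(browser, now=60))                          # True
print(store.allowed({**browser, "canvas": "ffffff"}, now=60))  # False: different fingerprint
print(store.allowed(browser, now=CLEARANCE_TTL_SECONDS + 1))   # False: clearance expired
```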

How scrapers stay ahead

Three things. First, use the same identity stack a real user does: residential or mobile IPs, a real (not headless-default) browser, fingerprints that exist in the wild. Second, behave like a user: realistic timing, cookies that persist across requests in a session, no perfectly-spaced request intervals. Third, accept that some sites will eventually win — when a target rolls out a new detection vendor, the right move is often to wait two weeks and let the bypass tooling catch up, rather than burning IPs and tokens trying to brute-force through. Managed scraping APIs absorb this churn for you; rolling your own means owning the catch-up cycle.
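Cookie persistence mostly comes down to reusing one session object per identity (for example a requests.Session); request timing is the part worth sketching. This minimal sketch draws delays from a log-normal distribution, an assumed model that yields mostly short pauses with occasional long ones. The median, spread, and clamps are illustrative defaults, not tuned values.

```python
import math
import random
from typing import Optional


def human_like_delays(n: int, median_s: float = 4.0, sigma: float = 0.6,
                      min_s: float = 1.0, max_s: float = 30.0,
                      seed: Optional[int] = None) -> list[float]:
    """Return n inter-request delays in seconds from a clamped log-normal distribution.

    The log-normal model and default parameters are assumptions: they give mostly
    short pauses with an occasional long one, unlike a fixed sleep(2) loop whose
    perfectly even spacing is trivial to spot.
    """
    rng = random.Random(seed)
    mu = math.log(median_s)  # the median of a log-normal distribution is exp(mu)
    return [min(max(rng.lognormvariate(mu, sigma), min_s), max_s) for _ in range(n)]


delays = human_like_delays(10, seed=42)
print([round(d, 2) for d in delays])
# In a real loop: time.sleep(delay) between requests made with one persistent session.
```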

Related terms

What Is Cloudflare Turnstile?
Cloudflare Turnstile is a CAPTCHA-replacement service that verifies a visitor is a human without showing a traditional puzzle. It runs a ser…
What Is Browser Fingerprinting?
Browser fingerprinting is a technique that identifies and tracks a visitor by combining dozens of small, observable characteristics of their…
What Is a CAPTCHA Solver?
A CAPTCHA solver is software that automatically completes CAPTCHA challenges on behalf of an automated client. It receives the challenge fro…
What Is a Residential Proxy?
A residential proxy routes your HTTP traffic through a real residential internet connection — a home broadband or fiber line — instead of th…
Anti-Bot Vendor Detection Cheatsheet
The first step of any scrape against a protected site is identifying which anti-bot vendor is in front of it. The vendor determines almost e…
What Is TLS Fingerprinting (JA3/JA4)?
TLS fingerprinting is a technique that identifies an HTTP client from its TLS handshake — before the server reads a single request byte. The…
What Is Behavioural Bot Detection?
Behavioural bot detection is the layer of anti-bot scoring that asks "how does this client act?" rather than "what is it?". It tracks mouse-…
What Is a WebRTC IP Leak?
A WebRTC IP leak is the most-overlooked failure mode in browser-based scraping in 2026: WebRTC reveals your real local and public IP via STU…
What Is Anubis (Anti-AI-Scraper Firewall)?
Anubis is an open-source MIT-licensed reverse proxy that issues a SHA-256 proof-of-work challenge before serving HTTP requests, built specif…
What Is an Anti-Scraping Mechanism?
An anti-scraping mechanism is any technical control a website uses to detect, slow, or block automated requests. Modern sites stack multiple…
What Is Cloudflare Bot Management?
Cloudflare Bot Management is the enterprise-tier ML scoring system Cloudflare runs on every request to a protected zone. Unlike Turnstile — …
What Is Imperva Incapsula?
Imperva Incapsula is the enterprise WAF and bot-protection product from Imperva (acquired by Thales in 2023). It is heavily deployed across …
What Is AWS WAF Bot Control?
AWS WAF Bot Control is the managed rule group inside AWS WAF that classifies and blocks bot traffic. It ships in two tiers — Common (signatu…
What Is Forter?
Forter is an identity-and-trust platform used at e-commerce checkout, not a traditional anti-bot product. It scores transactions for fraud r…
What Is Riskified?
Riskified is a chargeback-guarantee platform for e-commerce checkout. Merchants pay Riskified a per-transaction fee and Riskified takes on t…
What Is WebGL Fingerprinting?
WebGL fingerprinting reads identifying information directly from the GPU. The browser exposes the graphics card vendor and renderer string (…
What Is AudioContext Fingerprinting?
AudioContext fingerprinting plays a silent waveform through the Web Audio API, then reads back the resulting floating-point samples and hash…
What Is Function.toString() Inspection?
Function.prototype.toString() inspection is the technique anti-bot scripts use to detect runtime JavaScript patches. Every JS function expos…
What Is the Scrapy + Go TLS Sidecar Architecture?
The Scrapy + Go TLS sidecar architecture is the most common production pattern for scraping Akamai- and Cloudflare-protected sites at scale.…
Web Scraping Tools 2026 — A Comparison
The web-scraping toolbox in 2026 is large but well-stratified. Each tool occupies one of seven roles — HTTP/TLS impersonation, browser autom…
What Is Scraper Data Poisoning?
Data poisoning is when a site detects a likely scraper and silently serves different data: fake prices, fabricated reviews, wrong stock coun…
What Is a DOM Honeypot?
A DOM honeypot is an invisible form field or link that humans never see but bots fill in or click. The moment you interact with it, the site…
What Is Fingerprint Lie Detection?
Fingerprint lie detection is the practice of verifying that the signals a browser reports are internally consistent and untampered, rather t…
What Is Browser Extension Detection?
Browser extension detection infers which extensions are installed by probing for the resources and side effects they expose to web pages. Ex…
What Is Fingerprint Clustering?
Fingerprint clustering is the practice of grouping fingerprints from millions of real visitors by similarity, then rejecting any new visitor…
How to Build an Anti-Bot Challenge
An anti-bot challenge is a client-side test — proof-of-work, fingerprint collection, or a behavioural probe — that a server issues to separa…
What Is JA4 Fingerprinting?
JA4 is a TLS client fingerprint that replaced JA3 after Chrome began randomising the order of its TLS extensions. JA3 hashed the extension l…
What Is Residential Proxy Detection?
Residential proxy detection is the set of techniques anti-bot systems use to flag traffic that is being routed through a residential proxy p…
What Is Fingerprint Entropy?
Fingerprint entropy measures how much identifying information a browser attribute carries, expressed in bits. A signal that splits the popul…
Anti-Detect Browser Tools Compared
Anti-detect browser tools defeat bot detection by spoofing the signals that distinguish automation from a real user — but they work at very …
How Does Deobfuscation Work?
Deobfuscation is the process of turning deliberately unreadable code back into something a human can read and reason about. Obfuscators neve…
What are the 3 types of HTTP cookies? (2026 Guide)
What is a REST API? (Complete Guide 2026)
What is HTTP? (Complete Guide 2026)


Frequently asked questions

How do sites know I'm using a bot?

Usually a combination of signals — datacenter IP, headless-browser leaks, TLS fingerprint mismatch, unrealistic request timing — rather than any single thing. Modern systems score the whole picture, not individual signals.

Can I bypass anti-bot detection completely?

Not reliably and not forever. You can get high success rates against any specific vendor at any specific point in time with enough investment. But detection evolves, and a setup that works today may fail next month. Treat it as ongoing maintenance, not a solved problem.

Which anti-bot vendor is hardest to bypass?

DataDome and PerimeterX (HUMAN) tend to be the most aggressive on scraping. Cloudflare is everywhere and improving fast. Akamai is strong on financial and travel sites. The right answer depends on the specific target and changes month to month.

Does respecting robots.txt help me avoid detection?

Not for bot-detection purposes — anti-bot vendors don't check robots.txt. Respecting it is good ethics and reduces legal risk, but the detection layer fires based on technical signals regardless.
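Respecting robots.txt is straightforward with Python's standard-library urllib.robotparser. The robots.txt content and the my-scraper user-agent token below are hypothetical; against a live site you would call set_url() and read() instead of parse(). This is about compliance, not evasion: it changes nothing about how the detection layer scores you.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

agent = "my-scraper"  # hypothetical user-agent token
print(parser.can_fetch(agent, "https://example.com/products/1"))     # True
print(parser.can_fetch(agent, "https://example.com/checkout/cart"))  # False
print(parser.crawl_delay(agent))                                      # 5
```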

Why is a request blocked even though my fingerprint passes a fingerprint-test site?

Fingerprint-test sites usually check only Layer 2 (JavaScript). The block most likely happened at Layer 1 (TLS / IP) or in the five-vector coherence test, neither of which the test site inspects. If you get a 403 before any JavaScript runs, the failure is at Layer 1.
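The triage in that answer can be written as a rough heuristic. The Observation fields are hypothetical facts you might record while debugging a blocked session, and the mapping is a rule of thumb drawn from the layer descriptions on this page, not a vendor-defined rule.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical facts recorded while debugging one blocked session."""
    blocked: bool                          # 403, challenge loop, or similar
    js_ran_before_block: bool              # did any page/vendor JavaScript execute?
    successful_requests_before_block: int  # 0 if the very first request failed


def likely_failure_point(obs: Observation) -> str:
    """Rule-of-thumb triage, not a vendor-defined rule."""
    if not obs.blocked:
        return "not blocked"
    if not obs.js_ran_before_block:
        # Rejected before any JavaScript ran: only network signals were available.
        return "layer 1 (network: TLS, HTTP/2, IP reputation)"
    if obs.successful_requests_before_block > 0:
        # Early requests passed, then the score tipped: typical of behavioural scoring.
        return "layer 4 (behavioural)"
    return "layer 2/3 (JavaScript or WebAssembly) or the five-vector coherence test"


print(likely_failure_point(Observation(True, False, 0)))
print(likely_failure_point(Observation(True, True, 25)))
```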

Do all vendors run all four layers?

No. Imperva and AWS WAF default to Layer 1 plus light Layer 2. Akamai, Cloudflare Bot Management, and PerimeterX run all four. DataDome leans heavily on Layer 4 behavioural ML. F5 Shape runs Layers 1 and 2 plus a custom VM that defies easy categorisation. The vendor cheatsheet entry maps which layers each one weights most heavily.

Last updated: 2026-05-27