The four layers of anti-bot detection
Modern bot-protection products score every request across four independent layers. Failing any one layer is usually enough to block; the scoring isn't a sum, it's a series of gates. Get every layer right or get nothing through.
| Layer | What's inspected | Fires before… |
|---|---|---|
| 1. Network | TLS Client Hello (JA4), HTTP/2 SETTINGS frame, TCP options, IP reputation, ASN | HTML is served |
| 2. JavaScript | Canvas / WebGL / AudioContext fingerprints, navigator properties, Function.toString() inspection, extension probes | XHR / API calls fire |
| 3. WebAssembly | WASM SIMD CPU profile, SharedArrayBuffer timer precision, hyphenation dictionary checks | Challenge token is issued |
| 4. Behavioural | Mouse movement Bezier curves, scroll cadence, keypress timing, click-to-event latency | Score is finalised over multiple requests |
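Layer 1's JA4 fingerprint is computed from fields of the TLS ClientHello, before any HTTP is exchanged. The sketch below is based on the public JA4 specification published by FoxIO. GREASE values are dropped, cipher suites and extensions are sorted so that a shuffled order still hashes the same, and each list is reduced to a truncated SHA-256. It assumes the ClientHello has already been parsed into integer lists. The function name and parameters are hypothetical, and edge cases such as non-alphanumeric ALPN values or an empty extension list are not handled the way the reference implementation handles them, so compare its output against that implementation before relying on it.

```python
import hashlib

# GREASE code points (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}
TLS_VERSIONS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10"}


def _sha12(text: str) -> str:
    """First 12 hex chars of SHA-256, or twelve zeros for an empty input."""
    if not text:
        return "000000000000"
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def ja4(versions: list[int], sni: bool, ciphers: list[int], extensions: list[int],
        alpn: list[str], sig_algs: list[int], transport: str = "t") -> str:
    """Hypothetical JA4 sketch; `versions` are supported_versions values."""
    # Drop GREASE everywhere: it is random filler and would destabilise the hash.
    versions = [v for v in versions if v not in GREASE]
    ciphers = [c for c in ciphers if c not in GREASE]
    extensions = [e for e in extensions if e not in GREASE]
    sig_algs = [s for s in sig_algs if s not in GREASE]

    # Part a: readable summary (transport, TLS version, SNI, counts, ALPN).
    version = TLS_VERSIONS.get(max(versions, default=0), "00")
    alpn_tag = alpn[0][0] + alpn[0][-1] if alpn and alpn[0] else "00"
    a = (f"{transport}{version}{'d' if sni else 'i'}"
         f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}{alpn_tag}")

    # Part b: cipher suites as sorted 4-digit lowercase hex, hashed.
    b = _sha12(",".join(sorted(f"{c:04x}" for c in ciphers)))

    # Part c: sorted extensions minus SNI (0x0000) and ALPN (0x0010),
    # then signature algorithms in their original order.
    exts = sorted(f"{e:04x}" for e in extensions if e not in (0x0000, 0x0010))
    c_text = ",".join(exts)
    if sig_algs:
        c_text += "_" + ",".join(f"{s:04x}" for s in sig_algs)
    c = _sha12(c_text)

    return f"{a}_{b}_{c}"
```

Sorting before hashing is the design choice that matters here: a browser that randomises its extension order on every connection still produces one stable JA4, so the fingerprint identifies the TLS stack rather than a single handshake.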
A scraper built on curl_cffi (Layer 1 only) will pass Layer 1-only vendors such as older Imperva deployments, but fail on any deployment that loads sensor.js. A patched browser (Layers 1+2) will pass Akamai's static checks but fail DataDome's behavioural ML.
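The difference between a series of gates and an additive score fits in a few lines. This is a toy under stated assumptions: the Request fields, the 0.5 behavioural threshold and the threshold of 3 for the summed variant are all hypothetical, and the patched browser is assumed to pass Layer 3 as well, purely for illustration. It shows why a client that is strong on three layers still gets nothing through when the fourth gate rejects it, even though an additive scheme would have let it pass.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    """Hypothetical, simplified per-layer verdicts for one client."""
    network_ok: bool        # e.g. JA4 matches a known browser
    javascript_ok: bool     # canvas/WebGL/navigator checks consistent
    wasm_ok: bool           # WASM CPU profile and timer precision plausible
    behaviour_score: float  # 0.0 = bot-like, 1.0 = human-like


LAYERS: list[tuple[str, Callable[[Request], bool]]] = [
    ("network", lambda r: r.network_ok),
    ("javascript", lambda r: r.javascript_ok),
    ("webassembly", lambda r: r.wasm_ok),
    ("behavioural", lambda r: r.behaviour_score >= 0.5),  # assumed threshold
]


def gated_verdict(req: Request) -> tuple[bool, Optional[str]]:
    """Series of gates: the first failing layer blocks the request."""
    for name, check in LAYERS:
        if not check(req):
            return False, name
    return True, None


def summed_verdict(req: Request, threshold: int = 3) -> bool:
    """Additive alternative, shown only for contrast with the gate model."""
    return sum(check(req) for _, check in LAYERS) >= threshold


# A Layer-1-only HTTP client and a patched browser with robotic input.
http_client = Request(network_ok=True, javascript_ok=False, wasm_ok=False, behaviour_score=0.0)
patched_browser = Request(network_ok=True, javascript_ok=True, wasm_ok=True, behaviour_score=0.2)

print(gated_verdict(http_client))       # (False, 'javascript')
print(gated_verdict(patched_browser))   # (False, 'behavioural')
print(summed_verdict(patched_browser))  # True
```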
The five-vector coherence test
Beyond the four detection layers, vendors run a separate identity-coherence check across five vectors that must agree:
- IP: geolocation, ASN type (residential / datacenter / mobile)
- Timezone: Intl.DateTimeFormat().resolvedOptions().timeZone
- Accept-Language: HTTP header
- WebRTC: candidate IP exposed via STUN/TURN
- DNS: resolver used (does it match the ISP, or a VPN?)
An IP in São Paulo, a timezone of America/Sao_Paulo, Accept-Language: pt-BR, a WebRTC candidate that matches the proxy IP, and a Brazilian ISP's DNS resolver: that's a coherent fingerprint. A US datacenter IP with an Asia/Tokyo timezone, an English Accept-Language, and a WebRTC leak revealing the operator's home IP is one of the most common scraping signatures, and it is trivially blocked. Proxy-rotation tools that change only the IP fail this test every time.
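A vendor-side coherence check over the five vectors might look roughly like the sketch below. Everything in it is an assumption for illustration: the Observation fields, the tiny timezone and language lookup tables, treating a datacenter ASN as a failure, and the idea that dns_country comes from logging which resolver looked up a unique per-session hostname. WebRTC data is taken as raw RTCIceCandidate.candidate strings, and only server-reflexive (srflx) candidates, which carry the public IP seen by the STUN server, are compared with the request IP.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny lookup tables; a real system would use
# GeoIP/ASN databases and the full IANA timezone list.
TZ_COUNTRY = {"America/Sao_Paulo": "BR", "Asia/Tokyo": "JP", "America/New_York": "US"}
LANG_COUNTRIES = {"pt": {"BR", "PT"}, "ja": {"JP"}, "en": {"US", "GB", "CA", "AU"}}


@dataclass
class Observation:
    ip: str                    # address the request arrived from
    ip_country: str            # GeoIP country of that address (assumed lookup)
    asn_type: str              # "residential" | "datacenter" | "mobile"
    timezone: str              # Intl.DateTimeFormat().resolvedOptions().timeZone
    accept_language: str       # raw Accept-Language header
    ice_candidates: list[str]  # RTCIceCandidate.candidate strings from the page
    dns_country: str           # country of the resolver seen by our DNS server


def srflx_ips(candidates: list[str]) -> set[str]:
    """Public IPs from server-reflexive ICE candidates.

    Grammar: candidate:<foundation> <component> <transport> <priority>
             <address> <port> typ <type> ...
    """
    ips = set()
    for line in candidates:
        parts = line.split()
        if len(parts) >= 8 and parts[6] == "typ" and parts[7] == "srflx":
            ips.add(parts[4])
    return ips


def primary_language(header: str) -> str:
    """'pt-BR,pt;q=0.9' -> 'pt' (assumes the first entry is the preferred one)."""
    return header.split(",")[0].split(";")[0].strip().split("-")[0].lower()


def coherence_failures(o: Observation) -> list[str]:
    """Names of the vectors that disagree with the IP's country."""
    failures = []
    if o.asn_type == "datacenter":
        failures.append("ip")
    if TZ_COUNTRY.get(o.timezone) != o.ip_country:
        failures.append("timezone")
    if o.ip_country not in LANG_COUNTRIES.get(primary_language(o.accept_language), set()):
        failures.append("accept-language")
    if srflx_ips(o.ice_candidates) - {o.ip}:  # a public IP other than the proxy's
        failures.append("webrtc")
    if o.dns_country != o.ip_country:
        failures.append("dns")
    return failures


sao_paulo = Observation(
    ip="203.0.113.7", ip_country="BR", asn_type="residential",
    timezone="America/Sao_Paulo", accept_language="pt-BR,pt;q=0.9",
    ice_candidates=["candidate:1 1 udp 1677729535 203.0.113.7 46154 typ srflx"],
    dns_country="BR",
)
us_tokyo = Observation(
    ip="198.51.100.20", ip_country="US", asn_type="datacenter",
    timezone="Asia/Tokyo", accept_language="en-US,en;q=0.9",
    ice_candidates=["candidate:1 1 udp 1677729535 192.0.2.44 51000 typ srflx"],
    dns_country="US",
)
print(coherence_failures(sao_paulo))  # []
print(coherence_failures(us_tokyo))   # ['ip', 'timezone', 'webrtc']
```

In this simplified model an English Accept-Language is consistent with a US IP, so the US/Tokyo profile is caught by the datacenter ASN, the timezone and the WebRTC leak rather than by its language header.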
