Network signals (the first filter)
Before any JavaScript runs, the site already knows your IP's ASN, reputation history, and geographic plausibility. Datacenter IPs (AWS, GCP, DigitalOcean) get near-zero trust by default. Residential and mobile IPs start neutral. Repeat-offender IPs are blacklisted at the edge. This filter alone handles ~70% of low-effort scraping traffic — no fingerprinting needed.
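A minimal sketch of the "trust prior" idea: the IP is classified before any request content is read. The CIDR ranges, blocklist entry, and score values below are hypothetical placeholders (RFC 5737 documentation addresses). Real vendors map IPs to ASNs and reputation feeds rather than using a hard-coded list.

```python
import ipaddress

# Hypothetical datacenter ranges (RFC 5737 documentation blocks). Real systems
# map the IP to its ASN using ASN/reputation databases instead.
DATACENTER_NETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical edge blocklist of repeat-offender addresses.
BLOCKLIST = {ipaddress.ip_address("192.0.2.66")}


def ip_trust_prior(ip: str) -> float:
    """Return an illustrative starting trust score in [0, 1] for a client IP."""
    addr = ipaddress.ip_address(ip)
    if addr in BLOCKLIST:
        return 0.0  # repeat offender: rejected at the edge
    if any(addr in net for net in DATACENTER_NETS):
        return 0.05  # datacenter ASN: near-zero trust by default
    return 0.5  # everything else (residential, mobile): neutral start


for ip in ("203.0.113.7", "192.0.2.66", "192.0.2.10"):
    print(ip, ip_trust_prior(ip))
```

The point of a prior, rather than a verdict, is that later layers can still move the score. A datacenter IP simply starts so low that little else can rescue it.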
Transport signals (TLS and HTTP/2)
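Every TLS handshake exposes a JA3/JA4 fingerprint built from ClientHello fields such as cipher suites, extensions, and supported elliptic curves. JA3 hashes them in the exact order your client advertises them. JA4 sorts cipher suites and extensions first, so Chrome's per-connection extension shuffling does not change the hash. Python's requests library produces a TLS fingerprint that screams "not a browser." HTTP/2 adds further signals: SETTINGS frame values, window-update size, stream priorities, and header ordering. Real Chrome sends headers in a specific order; curl sends them differently. Anti-bot vendors maintain catalogs of known automation-tool fingerprints and block on a match.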
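The sketch below computes a JA3 string and hash from ClientHello fields: comma-separated fields, dash-separated decimal values, GREASE values dropped, then MD5. It adds a simplified "sort before hashing" function to show why JA4 is stable under extension shuffling. That second function is not the full JA4 algorithm. Real JA4 also excludes SNI and ALPN from the extension list and appends signature algorithms. The cipher and extension values are illustrative, not a captured Chrome handshake.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE placeholders (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(version: int, ciphers: list[int], extensions: list[int],
        curves: list[int], point_formats: list[int]) -> tuple[str, str]:
    """Build the JA3 string and its MD5 hash; advertised order is preserved."""
    def join(values: list[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )
    return ja3_string, hashlib.md5(ja3_string.encode()).hexdigest()


def sorted_extensions_hash(extensions: list[int]) -> str:
    """Simplified JA4-style idea: sort extensions (as 4-digit hex) before hashing.

    Real JA4 also drops SNI/ALPN here and appends signature algorithms.
    """
    items = sorted(f"{e:04x}" for e in extensions if not is_grease(e))
    return hashlib.sha256(",".join(items).encode()).hexdigest()[:12]


# Illustrative ClientHello fields (not a real Chrome capture).
ciphers = [0x0A0A, 4865, 4866, 4867]          # first entry is GREASE
exts_a = [0, 23, 65281, 10, 11, 35, 16]
exts_b = [16, 35, 0, 11, 65281, 23, 10]       # same set, shuffled order

a = ja3(771, ciphers, exts_a, [29, 23, 24], [0])
b = ja3(771, ciphers, exts_b, [29, 23, 24], [0])
print(a[0])          # 771,4865-4866-4867,0-23-65281-10-11-35-16,29-23-24,0
print(a[1] == b[1])  # False: JA3 changes when only the order changes
print(sorted_extensions_hash(exts_a) == sorted_extensions_hash(exts_b))  # True
```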
Browser signals (JS-collected)
If you survive the network and transport filters, the page runs JavaScript that probes your browser environment. It collects a hash of deterministic canvas rendering, the WebGL renderer string, an audio-context fingerprint, installed fonts, screen geometry, timezone, languages, the navigator.webdriver flag, and dozens more. Each is cheap to spoof in isolation; making them mutually consistent is the hard problem. A spoofed canvas paired with an unspoofed WebGL renderer is a stronger bot signal than either value alone, because the two disagree about the hardware doing the rendering.
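To make "mutually consistent" concrete, here is a rule-based sketch of cross-signal checks. The field names, rules, and renderer strings are hypothetical illustrations. Production systems are generally described as using learned models over many more signals, and nothing here reflects a specific vendor's logic.

```python
from dataclasses import dataclass


@dataclass
class BrowserSignals:
    # Hypothetical field names for values a fingerprinting script might report.
    user_agent: str
    webgl_renderer: str
    webdriver: bool
    browser_timezone: str
    ip_timezone: str  # derived server-side from IP geolocation


def coherence_flags(s: BrowserSignals) -> list[str]:
    """Return human-readable reasons the collected signals contradict each other."""
    flags = []
    ua, gpu = s.user_agent.lower(), s.webgl_renderer.lower()
    if s.webdriver:
        flags.append("navigator.webdriver is true")
    if "windows" in ua and "apple" in gpu:
        flags.append("Windows user agent with an Apple GPU")
    if "macintosh" in ua and "direct3d" in gpu:
        flags.append("macOS user agent with a Direct3D renderer")
    if "swiftshader" in gpu:
        flags.append("software WebGL renderer, typical of headless Chrome")
    if s.browser_timezone != s.ip_timezone:
        flags.append("browser timezone disagrees with IP geolocation")
    return flags


WIN_CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")

# A spoofed Windows UA left paired with the host's real (Apple) WebGL renderer.
spoofed = BrowserSignals(
    user_agent=WIN_CHROME_UA,
    webgl_renderer="ANGLE (Apple, ANGLE Metal Renderer: Apple M1, Unspecified Version)",
    webdriver=False,
    browser_timezone="America/New_York",
    ip_timezone="America/New_York",
)
print(coherence_flags(spoofed))  # ['Windows user agent with an Apple GPU']
```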
Behavioral signals (the last layer)
Once the page is loaded, the site records mouse movement, scroll patterns, dwell time before clicks, and form-fill cadence. Real users move in jittery non-linear arcs, scroll in bursts, and pause unpredictably. Scrapers either skip these interactions entirely (no mouse events ever fire) or emulate them in patterns that ML models classify with high confidence. This layer is what catches headless browsers that pass every static fingerprint check.
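Two simple trajectory features illustrate what "jittery non-linear arcs" and "pause unpredictably" mean numerically. Path straightness is the chord length divided by path length. Timing variability is the coefficient of variation of gaps between events. These are illustrative features chosen for this sketch, not a documented part of any vendor's model. A scripted straight-line move with fixed 16 ms steps scores 1.0 and 0.0 respectively.

```python
import math
import statistics


def straightness(points: list[tuple[float, float]]) -> float:
    """Chord length over path length: 1.0 means a perfectly straight move."""
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if path == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / path


def timing_cv(timestamps_ms: list[float]) -> float:
    """Coefficient of variation of gaps between events; 0.0 means metronomic."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if not gaps:
        return 0.0
    mean = statistics.mean(gaps)
    return statistics.pstdev(gaps) / mean if mean else 0.0


scripted_path = [(0, 0), (10, 10), (20, 20), (30, 30)]
human_like_path = [(0, 0), (12, 4), (18, 15), (30, 30)]
print(straightness(scripted_path), straightness(human_like_path))
print(timing_cv([0, 16, 32, 48]), timing_cv([0, 23, 61, 70, 140]))
```

A classifier would consume many such features per session. The value of any single one is that scripted input tends to sit at an extreme (perfectly straight, perfectly regular) that real hands rarely produce.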
A worked example — what a single request reveals
Consider one GET against an Akamai-protected site from a vanilla Python requests script:
| Layer | What's observed | Verdict |
|---|---|---|
| Network | AWS us-east-1 datacenter ASN | Bot |
| Transport (TLS) | JA4 hash matches Python urllib3, not Chrome | Bot |
| Transport (HTTP) | No HTTP/2; connection negotiates HTTP/1.1 | Bot |
| Headers | Accept-Encoding: gzip, deflate; no Accept-Language; User-Agent claims Chrome | Incoherent, so bot |
| JavaScript | No script execution; sensor.js never ran | Bot or non-browser |
Each layer independently classifies this as a bot. Akamai returns 412 with the Pardon Our Interruption body, the _abck cookie stays at `~-1~`, and protected XHR endpoints block on that cookie state. The request was flagged before any HTTP data was exchanged, by the datacenter ASN and the TLS handshake; every later layer only confirmed the verdict.
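The "Incoherent" verdict in the Headers row can be sketched as a set of rules checking that a Chrome User-Agent arrives with headers Chrome would actually send. The rules below are assumptions chosen for illustration. Chrome normally sends Accept-Language, advertises br in Accept-Encoding, and sends sec-ch-ua client hints over HTTPS. A real system would also check header order, which this sketch ignores.

```python
def header_flags(headers: dict[str, str]) -> list[str]:
    """Flag headers that contradict a Chrome User-Agent (illustrative rules only)."""
    h = {name.lower(): value for name, value in headers.items()}
    if "Chrome/" not in h.get("user-agent", ""):
        return []  # these rules only apply to clients claiming to be Chrome
    flags = []
    if "accept-language" not in h:
        flags.append("no Accept-Language")
    encodings = [e.strip() for e in h.get("accept-encoding", "").split(",")]
    if "br" not in encodings:
        flags.append("Accept-Encoding lacks br")
    if "sec-ch-ua" not in h:
        flags.append("no sec-ch-ua client hint")
    return flags


CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")

# Roughly what a requests script sends after overriding only User-Agent.
scripted = {"User-Agent": CHROME_UA, "Accept-Encoding": "gzip, deflate",
            "Accept": "*/*", "Connection": "keep-alive"}
print(header_flags(scripted))
# ['no Accept-Language', 'Accept-Encoding lacks br', 'no sec-ch-ua client hint']
```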
Now repeat with curl_cffi, Chrome impersonation, and an ISP residential proxy: JA4 matches, HTTP/2 works, headers are coherent, and the IP is residential. The same endpoint returns 200. No JavaScript ran in either case; every change was at the network, transport, or header layer.
How this is shifting in 2026
Three trends changing the detection model:
- JA4 has become the primary TLS fingerprint at major vendors, but they still compute JA3 alongside it, so a client tuned to match Chrome's JA3 but not its JA4 produces a "wrong-shape Chrome" signal. curl_cffi, utls, and tls-client can all reproduce a browser's full ClientHello, so their output matches on JA4 as well as JA3. There is no reason to target JA3 alone in 2026.
- WASM challenges are universal at the enterprise tier. DataDome's boring_challenge shipped in 2023; Akamai and PerimeterX added WASM probes through 2024. Defeating them at the JS layer is no longer possible (see the WASM fingerprinting entry); the bypass has moved into the browser-engine layer (Camoufox, CloakBrowser).
- Behavioral signals are scored per session, not per request. Vendors aggregate clicks, scrolls, and timing across a session and score the trajectory, so a session whose every request carries a perfect fingerprint can still be flagged behaviorally by request 50. The mitigation is realistic pacing and warm-up, not perfect single-request fingerprints.
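Per-session scoring can be sketched as accumulating evidence across requests. Here each request's bot probability is treated as independent evidence and its log-odds are summed, a naive-Bayes-style update. The independence assumption, the 0.5 prior, and the 0.95 threshold are all simplifications for illustration, not a description of any vendor's model. The point it shows is that requests which each look only mildly suspicious can still push a session over the line.

```python
import math
from typing import Optional


def _logit(p: float) -> float:
    """Log-odds of probability p."""
    return math.log(p / (1 - p))


def first_flagged_request(per_request_probs: list[float],
                          prior: float = 0.5,
                          threshold: float = 0.95) -> Optional[int]:
    """Return the 1-based request index at which the session is flagged, or None.

    Each probability is read as "chance of bot from this request alone, under a
    neutral prior", so its log-odds act as a log-likelihood ratio to be summed.
    """
    score = _logit(prior)
    cutoff = _logit(threshold)
    for i, p in enumerate(per_request_probs, start=1):
        score += _logit(p)
        if score >= cutoff:
            return i
    return None


# Each request looks only mildly suspicious on its own (p = 0.55)...
print(first_flagged_request([0.55] * 100))  # 15
# ...while a session that leans human never crosses the threshold.
print(first_flagged_request([0.45] * 100))  # None
```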
What hasn't changed: the relative cost ranking. Network-layer fixes are still the cheapest and behavioral fixes still the most expensive. Climb the layers only as the previous one stops working.
