What gets measured
Mouse movement. Human pointer paths are curved and noisy; they are commonly modelled as Bezier curves with Gaussian-jittered velocity. The hand decelerates as it approaches a target (consistent with Fitts's Law), overshoots slightly, then corrects. Scrapers that jump straight to a target with page.mouse.move(x, y) produce instantaneous or perfectly linear trajectories that fall far outside anything a human hand produces.
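To make the "linear trajectory" signal concrete, here is a minimal sketch of one feature a detector could compute from sampled pointer positions: the ratio of straight-line distance to distance actually travelled. The function names and sample paths are illustrative assumptions, not any vendor's actual model.

```python
import math

def path_length(points):
    """Total distance travelled along a sequence of (x, y) samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def straightness(points):
    """Straight-line distance divided by travelled distance.

    1.0 means a perfectly straight path; curved or corrected paths score lower.
    """
    travelled = path_length(points)
    if travelled == 0:
        return 1.0  # no movement at all: treat as a degenerate straight path
    return math.dist(points[0], points[-1]) / travelled

# A linearly interpolated move, as a naive script might produce.
scripted = [(i * 10.0, i * 5.0) for i in range(11)]

# A crude stand-in for a bowed human path (illustrative data, not a real trace).
curved = [(0.0, 0.0), (30.0, 25.0), (70.0, 45.0), (100.0, 50.0)]
```

The scripted path scores essentially 1.0 while the bowed path scores lower. A real model would combine many such features (velocity profile, overshoot, jitter) rather than rely on a single ratio.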
Timing patterns. Time between page load and first interaction. Scroll acceleration curves. Inter-keystroke variance. Navigation dwell time. ML models trained on millions of sessions flag anomalies in these timings at sub-millisecond resolution (finer still where WASM shared-buffer timers are available).
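As a rough illustration of why timing variance matters, the sketch below computes the coefficient of variation of the gaps between event timestamps. This is an assumed, simplified feature for illustration only; `interval_cv` is a hypothetical name, and production models use far richer inputs.

```python
import statistics

def interval_cv(timestamps):
    """Coefficient of variation (stdev / mean) of gaps between events.

    Values near 0 mean metronome-like timing; human input is much noisier.
    """
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three timestamps")
    mean = statistics.fmean(intervals)
    if mean == 0:
        return 0.0  # all events at the same instant
    return statistics.pstdev(intervals) / mean

# Events fired by a fixed two-second sleep: every gap is identical.
constant = [0.0, 2.0, 4.0, 6.0, 8.0]

# Irregular gaps, loosely resembling keystrokes (illustrative data, in seconds).
irregular = [0.0, 0.11, 0.31, 0.38, 0.70]
```

A constant-delay script yields a coefficient of exactly zero, which is trivially separable from the irregular sequence.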
Session shape. Do you load images and fonts? Do you visit the homepage first or land directly on a deep URL? Real users hesitate; bots do not. Real users load CSS and tracking pixels; HTTP scrapers usually do not.
Biometric micro-signals. Hand tremor in mouse paths. Click pressure on touch devices. The cadence with which a human alternates between mouse and keyboard. These are increasingly part of premium behavioural models.
Why it catches "perfect" scrapers
A scraper can have a Chrome 148 JA4, a residential ISP IP, a real canvas hash, perfect timezone alignment, and still fail behavioural scoring. The four identity layers say "this is a real Chrome user". The behaviour layer says "this real Chrome user moves the mouse like nobody who has ever used a computer".
The asymmetry is what makes behavioural detection so hard to bypass. You can patch identity at compile time (Camoufox C++ patches) or at request time (curl_cffi TLS). You cannot patch behaviour without modelling it. And modelling human input distributions accurately is much harder than it looks: every public stealth library that tried was beaten within months by ML models retrained on the new patterns.
What actually works
Three layered defences:
- Humanized mouse and scroll. Botasaurus + HumanCursor (Bezier with Gaussian jitter, Fitts's Law deceleration). Camoufox humanize=True. These produce trajectories within the human distribution rather than outside it.
- Warm-up navigation. Before hitting your target page, visit the homepage. Wait 2–3 seconds. Scroll. Click an internal link. Then navigate. This single change improves behavioural scores significantly on DataDome and Akamai because their multi-request models reward consistent, human-like session shape.
- Randomized delays, not constant ones. time.sleep(random.uniform(1.8, 4.3)) beats time.sleep(2) every time. Better still: model your delays after real session traces from the same target.
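The first and third items can be sketched in a few lines. The following is a simplified illustration of a Bezier path with Gaussian jitter and ease-out deceleration, plus a randomized delay. It is not the implementation used by HumanCursor, Botasaurus, or Camoufox; `human_path`, `human_delay`, and all parameter values are assumptions chosen for the example, and overshoot-and-correct is not modelled.

```python
import math
import random

def ease_out(t):
    """Map a time fraction in [0, 1] to path progress: fast start, slow finish."""
    return 1 - (1 - t) ** 3

def bezier_point(p0, p1, p2, p3, u):
    """Point on a cubic Bezier curve at parameter u in [0, 1]."""
    a, b = (1 - u) ** 3, 3 * (1 - u) ** 2 * u
    c, d = 3 * (1 - u) * u ** 2, u ** 3
    return (a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
            a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1])

def human_path(start, end, steps=40, jitter_px=0.8, rng=None):
    """Return steps + 1 points from start to end along a bowed, jittered curve."""
    rng = rng or random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    # Unit normal to the straight line; control points are pushed along it
    # so the path bows to one side instead of running straight.
    nx, ny = (-dy / dist, dx / dist) if dist else (0.0, 0.0)
    bow = rng.uniform(-0.25, 0.25) * dist
    c1 = (start[0] + dx * 0.3 + nx * bow, start[1] + dy * 0.3 + ny * bow)
    c2 = (start[0] + dx * 0.7 + nx * bow, start[1] + dy * 0.7 + ny * bow)
    points = []
    for i in range(steps + 1):
        # Equal time steps, unequal distance steps: the pointer slows near the end.
        x, y = bezier_point(start, c1, c2, end, ease_out(i / steps))
        if 0 < i < steps:  # leave the endpoints exact
            x += rng.gauss(0, jitter_px)
            y += rng.gauss(0, jitter_px)
        points.append((x, y))
    return points

def human_delay(low=1.8, high=4.3, rng=None):
    """Seconds to wait between actions, drawn uniformly from [low, high]."""
    rng = rng or random.Random()
    return rng.uniform(low, high)
```

Each returned point would be passed to a call such as page.mouse.move(x, y) at a fixed short interval, with time.sleep(human_delay()) between higher-level actions. A uniform distribution is the simplest choice; delays fitted to real session traces would be closer to the target's actual traffic.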
The honest limit: behavioural detection is probabilistic, and at very high request rates per IP, even a perfect humanization stack starts to fail because the session-level pattern stops looking human. The endgame is diversifying real-device traffic across many residential or mobile IPs, which is exactly what a residential proxy pool and the next generation of distributed-browser networks provide.
