What gets measured
Mouse movement. A human hand moves in smooth, slightly wobbly arcs, which tools usually model as Bezier curves with varying velocity. It slows down as it nears the target, usually overshoots a little, then corrects. The careful final approach ties in with Fitts's Law, which says the time needed to hit a target grows with its distance and shrinks with its size, so small targets take a slower, more deliberate approach. A scraper that moves the cursor with page.mouse.move(x, y) produces either one instantaneous jump or, with the steps option, evenly spaced points on a perfectly straight line, which is statistically implausible for a real hand.
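To see why a straight path stands out, here is a minimal sketch of one feature a detector might compute: path straightness, the straight-line distance between the first and last cursor samples divided by the total distance travelled. The function name and both sample paths are hypothetical. Real behavioural models combine many such features and do not publish their exact definitions.

```python
import math


def path_straightness(points):
    """Ratio of start-to-end distance to total path length (1.0 = perfectly straight).

    points: list of (x, y) cursor samples in the order they were recorded.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # the cursor never moved; treat it as trivially straight
    return math.dist(points[0], points[-1]) / travelled


# Evenly spaced points on one straight line, like a linear interpolation between two positions.
scripted = [(i * 10.0, i * 5.0) for i in range(11)]

# Hypothetical hand-like path: it bows away from the straight line and overshoots slightly.
curved = [(0, 0), (12, 9), (27, 16), (45, 21), (62, 23),
          (78, 24), (91, 25), (99, 26), (103, 25), (100, 25)]

print(round(path_straightness(scripted), 3))
print(round(path_straightness(curved), 3))
```

The scripted path scores 1.0, while the curved path with its bow and small overshoot scores noticeably lower.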
Timing patterns. How long passes between the page loading and your first action? How does your scrolling speed up and slow down? How evenly spaced are your keystrokes? How long do you stay on a page? Machine-learning models (software trained on examples to spot patterns), trained on millions of sessions, score these patterns from event timestamps recorded at sub-millisecond resolution, and shared-memory timers built on WASM and SharedArrayBuffer can resolve finer still.
Session shape. Do you load the images and fonts a browser normally would? Do you visit the homepage first, or jump straight to a deep URL? Real users pause between pages, and their browsers load CSS and tracking pixels as a matter of course; bots and plain HTTP scrapers usually do neither.
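The session-shape questions can be expressed as a few summary numbers over a request log. The sketch below assumes a hypothetical log format of (url, resource_type) pairs and a hypothetical set of subresource labels; it only illustrates the kind of features involved, not how any particular vendor computes them.

```python
from urllib.parse import urlparse

# Hypothetical labels for resources a real browser fetches alongside HTML documents.
SUBRESOURCE_TYPES = {"stylesheet", "script", "image", "font"}


def session_shape(requests):
    """Summarise one session's request log.

    requests: list of (url, resource_type) tuples in request order, where
    resource_type is an illustrative label such as "document" or "image".
    """
    documents = [url for url, kind in requests if kind == "document"]
    subresources = sum(1 for _, kind in requests if kind in SUBRESOURCE_TYPES)
    first_path = urlparse(documents[0]).path if documents else None
    return {
        "pages": len(documents),
        "subresources_per_page": subresources / len(documents) if documents else 0.0,
        "entered_via_homepage": first_path in ("", "/"),
    }


browser_like = [
    ("https://shop.example/", "document"),
    ("https://shop.example/main.css", "stylesheet"),
    ("https://shop.example/app.js", "script"),
    ("https://shop.example/logo.png", "image"),
    ("https://shop.example/category/shoes", "document"),
    ("https://shop.example/shoe.jpg", "image"),
]
plain_http = [
    ("https://shop.example/product/12345", "document"),
    ("https://shop.example/product/12346", "document"),
]

print(session_shape(browser_like))
print(session_shape(plain_http))
```

The browser-like session enters through the homepage and pulls two subresources per page; the plain HTTP session lands on a deep URL and fetches none.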
Biometric micro-signals. The faint tremor in a human's mouse path. Click pressure on touch devices. The rhythm of switching between mouse and keyboard. These are increasingly part of premium behavioural models.
Why it catches "perfect" scrapers
A scraper can have a Chrome 148 JA4 fingerprint, a home-broadband (residential) IP, a genuine canvas hash, a perfectly matched timezone — and still fail behavioural scoring. The four identity layers all say "this is a real Chrome user." The behaviour layer replies: "this real Chrome user moves the mouse like nobody who has ever touched a computer."
That gap is what makes behaviour so hard to fake. Identity signals can be configured ahead of time (Camoufox C++ patches) or at request time (curl_cffi TLS). Behaviour cannot be configured statically: it has to be generated as the session runs, and accurately modelling how humans move and type is much harder than it looks. Tools that approximated human input have historically been flagged within months, once ML models were retrained on the patterns those tools produce.
Inputs that behavioural models weigh
When working with sites you own or are authorized to automate, the inputs behavioural models weigh most heavily are well documented:
- Input generation. Tools such as Botasaurus with Humancursor (Bezier curves with random jitter and Fitts's Law deceleration), or Camoufox's humanize=True, generate pointer and scroll input that falls inside the statistical range of ordinary human input rather than the perfectly straight lines a naive script produces.
- Navigation path. Models score the shape of a whole session: landing on a homepage, dwelling, scrolling, and following internal links produces a different signal than jumping straight to a deep URL. This is one reason behavioural scoring looks across multiple requests.
- Timing distribution. A constant pause such as time.sleep(2) is itself a machine-like signal; varied timing like random.uniform(1.8, 4.3) sits closer to natural session timings.
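Why a constant pause is machine-like can be shown from the detector's side with a single statistic: the coefficient of variation (standard deviation divided by mean) of the gaps between events. This is a minimal sketch with hypothetical helper names; real models look at far richer timing features than one ratio.

```python
import random
import statistics


def interval_cv(timestamps):
    """Coefficient of variation (stdev / mean) of the gaps between successive events.

    timestamps: event times in seconds, sorted ascending.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")
    return statistics.stdev(gaps) / statistics.mean(gaps)


def event_times(pauses):
    """Turn a list of pauses into cumulative event timestamps starting at 0."""
    times = [0.0]
    for pause in pauses:
        times.append(times[-1] + pause)
    return times


fixed = event_times([2.0] * 20)  # like calling time.sleep(2) between every action
rng = random.Random(42)  # seeded so the example is reproducible
varied = event_times([rng.uniform(1.8, 4.3) for _ in range(20)])  # like random.uniform(1.8, 4.3)

print(interval_cv(fixed))
print(interval_cv(varied))
```

The fixed schedule gives a coefficient of variation of exactly zero, which no human session produces. Variance is only one feature, though; the overall shape of the timing distribution can also be compared against recorded human sessions.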
The key takeaway: behavioural detection is probabilistic (a likelihood score, not a yes/no) and evaluates the session as a whole. At very high request rates from one IP, the session-level pattern stops resembling a single human regardless of input quality. Distributing authorized, device-like traffic across many home or mobile IPs — what a residential proxy pool provides — keeps each IP's rate within what a single real person could plausibly produce.
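To illustrate what "probabilistic" means here, the hypothetical sketch below folds several session-level features into one score between 0 and 1 with a logistic function. The feature names, weights, and bias are invented for illustration; production systems typically use trained classifiers whose internals are not public.

```python
import math

# Hypothetical weights: positive values push the score towards "automated".
WEIGHTS = {
    "straight_line_moves": 6.0,   # fraction of cursor moves with straightness above 0.99
    "timing_regularity": 4.0,     # 1 - coefficient of variation of event gaps, clipped to [0, 1]
    "skipped_subresources": 2.5,  # 1.0 if pages were fetched without CSS, images, or fonts
}
BIAS = -6.0


def bot_likelihood(features):
    """Combine session features into a probability-like score in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


human_like = {"straight_line_moves": 0.05, "timing_regularity": 0.6, "skipped_subresources": 0.0}
scripted = {"straight_line_moves": 1.0, "timing_regularity": 1.0, "skipped_subresources": 1.0}

print(round(bot_likelihood(human_like), 3))
print(round(bot_likelihood(scripted), 3))
```

No single feature decides the outcome; the score reflects the session as a whole, so a session with flawless identity signals can still land near 1 if its behavioural features all look scripted.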
