The layered model
Real anti-scraping is not one product but a stack of checks, like a building with security at the gate, in the lobby, and on every floor. At the edge (the first thing your request hits): WAF rules (a Web Application Firewall filters traffic by pattern), rate limits, and ASN blocklists (an ASN, or Autonomous System Number, identifies the network your IP belongs to, so a whole hosting provider can be blocked at once). One layer in: TLS fingerprint validation, header consistency checks, and HTTP/2 frame analysis, all looking for tells that you are software, not a browser. Inside the page: JavaScript challenges (code the browser must run, such as a proof-of-work puzzle or fingerprint collection) and CAPTCHAs. After the page loads: behavioral analysis of your mouse movement, scrolling, and timing. A request that passes every layer is treated as human. A request that fails any one is scored down, and repeated failures escalate the next request to a harder challenge.
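The score-down-and-escalate behavior can be sketched from the defender's side. This is a minimal illustration, not any vendor's actual logic: the layer names, the challenge ladder, the 0.25 penalty, and the rule that escalation starts at the second failed request are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical layer names, in the order a request meets them.
LAYERS = ["edge", "transport", "in_page", "behavioral"]

# Hypothetical challenge ladder, easiest to hardest.
CHALLENGES = ["none", "js_challenge", "captcha", "block"]


@dataclass
class ClientState:
    score: float = 1.0   # 1.0 = looks human, 0.0 = looks automated
    failures: int = 0    # number of requests that failed at least one layer


def evaluate(state: ClientState, results: dict, penalty: float = 0.25) -> str:
    """Score one request; return the challenge to serve on the NEXT request.

    `results` maps each layer name to True (passed) or False (failed).
    A layer missing from `results` is treated as failed.
    """
    failed = [name for name in LAYERS if not results.get(name, False)]
    if failed:
        # Any failed layer lowers the score (assumed: a flat penalty per layer).
        state.score = max(0.0, state.score - penalty * len(failed))
        state.failures += 1
    # Assumed policy: the first failure only lowers the score;
    # each further failed request moves one rung up the ladder.
    rung = min(max(state.failures - 1, 0), len(CHALLENGES) - 1)
    return CHALLENGES[rung]


if __name__ == "__main__":
    state = ClientState()
    clean = {name: True for name in LAYERS}
    bad_tls = dict(clean, transport=False)
    for results in (clean, bad_tls, bad_tls, bad_tls):
        print(evaluate(state, results), state.score)
```

The point of the sketch is that state carries over between requests: one clean request does not reset a client that has already failed several times.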
How vendors compose
Anti-scraping is usually bought, not built. Cloudflare and Akamai handle the edge layers and JS challenges as a managed product you simply switch on. DataDome and Kasada specialize in the JS-VM and behavioral layers (a JS-VM is a custom interpreter, shipped as JavaScript, that runs obfuscated detection code in your browser). Shape Security (F5) builds custom JS virtual machines that re-obfuscate (scramble themselves) on every deployment, so each release looks new. Many sites stack two vendors: Cloudflare at the edge plus DataDome for bot management is a common pairing. Satisfying one vendor does not satisfy the other, because each vendor scores requests independently.
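Independent scoring means a request is admitted only when every vendor in the stack admits it. The sketch below illustrates that composition under invented assumptions: the two scorers, their signal names, and their thresholds are hypothetical and do not describe how any real vendor computes a score.

```python
from typing import Dict

# A request reduced to hypothetical signals, each in the range [0, 1].
Request = Dict[str, float]


def edge_score(req: Request) -> float:
    # Hypothetical edge vendor: sees only network-level signals.
    return min(req.get("ip_reputation", 0.0), req.get("tls_match", 0.0))


def bot_management_score(req: Request) -> float:
    # Hypothetical bot-management vendor: sees only in-page signals.
    return min(req.get("js_challenge", 0.0), req.get("behavior", 0.0))


# Each vendor keeps its own scorer and its own threshold (values assumed).
VENDORS = {
    "edge": (edge_score, 0.5),
    "bot_management": (bot_management_score, 0.7),
}


def verdicts(req: Request) -> Dict[str, bool]:
    """Return each vendor's pass/fail decision, computed independently."""
    return {
        name: scorer(req) >= threshold
        for name, (scorer, threshold) in VENDORS.items()
    }


def admitted(req: Request) -> bool:
    """A request gets through only if every vendor passes it."""
    return all(verdicts(req).values())
```

A request with strong network signals but a weak in-page result passes the first scorer and still fails overall, which is the sense in which the vendors do not share credit.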
Matching response to the stack
For authorized data collection on sites you own or are permitted to access, the first question is not "how do I get through this?" but "is the data even worth the engineering effort?" A simple rate limit costs hours of work to handle correctly. A stacked Cloudflare + DataDome + behavioral ML (machine-learning) system can cost weeks of engineering plus a recurring proxy bill in the thousands per month. Managed scraping APIs spread that cost across all their customers, so above a certain volume they are usually cheaper than building and maintaining the same infrastructure in-house.
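The build-versus-buy question comes down to arithmetic once the inputs are written down. The following is a rough sketch under stated assumptions: every field name and figure is a placeholder to be replaced with your own estimates, the in-house proxy bill is treated as a flat monthly amount, and the managed API is assumed to charge a flat price per 1,000 requests.

```python
from dataclasses import dataclass


@dataclass
class InHouse:
    engineering_weeks: float       # one-off build effort
    weekly_engineer_cost: float    # loaded cost of one engineer-week
    amortize_months: int           # period over which to spread the build cost
    monthly_proxy_bill: float      # recurring proxy spend (assumed flat)
    monthly_maintenance: float = 0.0

    def monthly_cost(self) -> float:
        build = self.engineering_weeks * self.weekly_engineer_cost / self.amortize_months
        return build + self.monthly_proxy_bill + self.monthly_maintenance


def managed_monthly_cost(requests_per_month: int, price_per_1k: float) -> float:
    # Assumed pricing model: a flat price per 1,000 requests.
    return requests_per_month / 1000 * price_per_1k


def cheaper_option(in_house: InHouse, requests_per_month: int, price_per_1k: float) -> str:
    """Return 'managed' or 'in_house', whichever costs less per month."""
    managed = managed_monthly_cost(requests_per_month, price_per_1k)
    return "managed" if managed <= in_house.monthly_cost() else "in_house"


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    stack = InHouse(engineering_weeks=6, weekly_engineer_cost=4000,
                    amortize_months=12, monthly_proxy_bill=3000)
    print(stack.monthly_cost())
    print(cheaper_option(stack, requests_per_month=1_000_000, price_per_1k=2.0))
```

Which side wins depends entirely on the inputs, so it is worth rerunning the comparison at several request volumes and with a pessimistic maintenance estimate before committing to either option.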
