Why 403s happen in scraping
How to diagnose a 403
Check the response headers for `cf-ray`, `x-amzn-waf-action`, `server: AkamaiGHost`, or `x-datadome` to see which vendor it is. Then check what you're sending: is the User-Agent realistic? Is Accept-Language present? Does your TLS fingerprint match the browser you claim to be? Finally, try the same URL through a residential proxy in the target's main country: if that works, the block was based on your IP; if it still fails, the block is more likely based on your fingerprint.
How to recover from a 403
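Recovery starts from knowing which vendor issued the block. As a rough sketch of the header check described in the diagnosis step, the snippet below maps the four headers named there to the vendors they usually indicate. `guess_vendor` and `VENDOR_SIGNATURES` are hypothetical names introduced for illustration, and a header match is only a hint, not proof.

```python
from typing import Mapping, Optional

# Response-header signatures for the four headers named in the text.
# Each entry is (header name, substring the value must contain or None).
# This is a heuristic: a site can sit behind more than one layer.
VENDOR_SIGNATURES = {
    "Cloudflare": ("cf-ray", None),
    "AWS WAF": ("x-amzn-waf-action", None),
    "Akamai": ("server", "akamaighost"),
    "DataDome": ("x-datadome", None),
}


def guess_vendor(headers: Mapping[str, str]) -> Optional[str]:
    """Return the first vendor whose signature appears in the headers."""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    for vendor, (header, expected) in VENDOR_SIGNATURES.items():
        if header not in lowered:
            continue
        if expected is None or expected in lowered[header]:
            return vendor
    return None
```

Any mapping of header names to values can be passed in, so the headers of a 403 response from an HTTP client can be checked directly, for example `guess_vendor(response.headers)`.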
Fixing a 403 in Python requests
Work through four layers in order — most 403s are solved by the first two:
- Send a realistic `User-Agent`. The default `python-requests/2.x` string is the single most common cause of a 403 that works fine in a browser.
- Send the full browser header set: `Accept`, `Accept-Language`, `Accept-Encoding`, and `Referer`. A request with only a `User-Agent` still looks nothing like a real browser.
- Persist cookies with a `requests.Session()`. Many sites set a cookie on first load and 403 any request that arrives without it.
- If 403s survive all of the above, the site is fingerprinting your TLS/HTTP2 handshake (typical of Cloudflare and DataDome). Plain `requests` can't change that; switch to curl_cffi with `impersonate="chrome"`, a headless browser, or a managed scraping API.
The code example below shows the headers-plus-session approach and the curl_cffi fallback side by side.
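What follows is a minimal sketch of both approaches, not a definitive implementation. The target URL is a placeholder, the header values are illustrative rather than copied from a specific browser build, and `build_browser_headers`, `fetch_with_requests`, and `fetch_with_curl_cffi` are hypothetical helper names that belong to neither library.

```python
from typing import Dict

URL = "https://example.com/"  # placeholder target (assumption)


def build_browser_headers() -> Dict[str, str]:
    """Layers 1 and 2: a realistic User-Agent plus the usual browser headers."""
    return {
        # Illustrative desktop Chrome string; keep it current in real use.
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        # "br" is left out so no extra decoder package is required.
        "Accept-Encoding": "gzip, deflate",
        "Referer": "https://www.google.com/",
    }


def fetch_with_requests(url: str) -> int:
    """Layer 3: reuse one Session so cookies from the first load persist."""
    import requests  # third-party; imported lazily so the helper above works alone

    with requests.Session() as session:
        session.headers.update(build_browser_headers())
        session.get(url, timeout=10)             # first load may set a cookie
        response = session.get(url, timeout=10)  # sent with that cookie
        return response.status_code


def fetch_with_curl_cffi(url: str) -> int:
    """Layer 4: let curl_cffi imitate Chrome's TLS/HTTP2 handshake."""
    from curl_cffi import requests as curl_requests  # third-party, optional

    response = curl_requests.get(url, impersonate="chrome", timeout=10)
    return response.status_code
```

One reasonable way to combine them is to call `fetch_with_requests(URL)` first and fall back to `fetch_with_curl_cffi(URL)` only when the result is still 403, which keeps the lighter dependency as the default path.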
