Fix 403 Forbidden When Scraping (Python)

By the Scrappey Research Team

To fix a 403 Forbidden while scraping a site you are permitted to access, make your HTTP client look like a normal browser, consistently, at every layer. Send a genuine User-Agent and a full header set, persist cookies with a session, match a browser's TLS handshake (for example with curl_cffi's impersonate), route through residential or rotating proxies, and slow your request rate. In scraping, a 403 is rarely about credentials; it usually means an anti-bot system decided your client looks automated. Fixing it is a process of elimination: change one signal at a time (headers, then cookies, then the TLS fingerprint, then the IP) to find which layer separates your client from a normal browser request.

Quick facts

  • What 403 means here: the server refused the request as automated, not as unauthenticated
  • First thing to change: replace the default python-requests User-Agent and add a full header set
  • If headers don't fix it: suspect a TLS/JA3 fingerprint mismatch; use curl_cffi's impersonate
  • If TLS doesn't fix it: suspect IP reputation; rotate residential or mobile proxies
  • Never do: retry a 403'd request unchanged; it hardens the block

Diagnose why you are getting 403

Before changing anything, find out which signal is triggering the block -- the fix for a header problem is useless against an IP problem. Read the response body and headers first, because they usually name the cause.

  • Read the body. A short, plain "Forbidden" or "Access Denied" page usually comes from a basic WAF (web application firewall) doing simple pattern-matching at the edge. A branded challenge page with a reference ID (such as Cloudflare's Ray ID) points to a dedicated anti-bot service.
  • Read the headers. Look for cf-ray (Cloudflare), x-amzn-waf-action (AWS WAF), server: AkamaiGHost (Akamai), or an x-datadome header or datadome cookie (DataDome). These tell you what you are up against.
  • Isolate the variable. Request the same URL three ways: with plain requests, with full browser headers, and through a proxy in the target's main country. If headers fix it, the cause was your header set. If only the proxy fixes it, the cause was IP reputation or geo. If nothing fixes it, the likely cause is your TLS fingerprint, a signal that plain requests cannot change.
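The header and body checks above can be rolled into a small helper. This is a minimal sketch, not a vendor-detection library: `identify_blocker` is a hypothetical name, the marker list mirrors the headers named above, and both the `datadome` cookie name and the 2,000-character cut-off for a "short" error page are assumptions you should adjust to what you actually observe.

```python
from typing import Mapping

# (header name, substring the value must contain or None, label).
# Mirrors the markers listed above; vendors can rename or omit them at any time.
VENDOR_MARKERS = (
    ("cf-ray", None, "Cloudflare"),
    ("x-amzn-waf-action", None, "AWS WAF"),
    ("server", "akamaighost", "Akamai"),
    ("x-datadome", None, "DataDome"),
)

SHORT_PAGE_CHARS = 2000  # assumption: arbitrary cut-off for a "short" plain error page


def identify_blocker(headers: Mapping[str, str], cookies: Mapping[str, str], body: str) -> str:
    """Return a best-guess label for whatever produced a 403 response."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    for name, fragment, label in VENDOR_MARKERS:
        if name in lowered and (fragment is None or fragment in lowered[name]):
            return label
    if any(cookie.lower() == "datadome" for cookie in cookies):  # assumed cookie name
        return "DataDome"
    text = body.lower()
    if len(text) < SHORT_PAGE_CHARS and ("forbidden" in text or "access denied" in text):
        return "basic WAF (plain error page)"
    return "unknown: read the body by hand"
```

With requests, you might call it as `identify_blocker(resp.headers, resp.cookies.get_dict(), resp.text)`. It returns the first match only, so a site behind both a CDN and a separate bot-management layer is reported under whichever marker is checked first.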

The 403 is only the symptom. The real decision was made one layer earlier, based on the cheapest signal the server could check, so identify that signal before you start editing code.
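The three-way isolation test from the diagnosis checklist is easy to script. The sketch below assumes a hypothetical `fetch(url, headers, proxy)` callable that returns a status code, so the decision logic can be tested without the network; `requests_fetch` is one possible adapter built on requests. Treating any status below 400 as "fixed" is a simplifying assumption.

```python
from typing import Callable, Dict, Optional

# Hypothetical interface: fetch(url, headers, proxy) -> HTTP status code.
Fetch = Callable[[str, Optional[Dict[str, str]], Optional[str]], int]


def isolate_403(fetch: Fetch, url: str, browser_headers: Dict[str, str], proxy: str) -> Dict[str, int]:
    """Request the same URL three ways, changing one signal per attempt."""
    return {
        "plain": fetch(url, None, None),               # library defaults, own IP
        "headers": fetch(url, browser_headers, None),  # full browser headers, own IP
        "proxy": fetch(url, None, proxy),              # library defaults, different IP
    }


def likely_cause(results: Dict[str, int]) -> str:
    """Translate the three outcomes into the cause suggested by the checklist."""
    def ok(status: int) -> bool:
        return status < 400  # assumption: anything below 400 counts as "not blocked"

    if ok(results["plain"]):
        return "not blocked"
    if ok(results["headers"]):
        return "header set"
    if ok(results["proxy"]):
        return "IP reputation or geo"
    return "likely TLS fingerprint"


def requests_fetch(url: str, headers: Optional[Dict[str, str]], proxy: Optional[str]) -> int:
    """One possible adapter using requests (imported here so the helpers above need no extras)."""
    import requests

    proxies = {"http": proxy, "https": proxy} if proxy else None
    return requests.get(url, headers=headers, proxies=proxies, timeout=30).status_code
```

A typical call is `likely_cause(isolate_403(requests_fetch, url, browser_headers, proxy_url))`, where `browser_headers` and `proxy_url` are your own values. If both headers and the proxy help, the function reports the header set first, matching the order in which the checklist suggests fixing things.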

Fix it step by step (headers, sessions, TLS)

Work through these layers in order. Many 403s on permitted sites are solved by the first two steps alone; if not, add cookie persistence, and only then reach for a real browser TLS fingerprint.

  1. Send a real User-Agent. The default python-requests/2.x string is one of the most common reasons a URL that loads fine in a browser returns 403 to a script. Copy a current Chrome or Firefox User-Agent and use it verbatim.
  2. Send the full browser header set. A request with only a User-Agent still looks nothing like a browser. Add Accept, Accept-Language, Accept-Encoding, and a plausible Referer. Chromium-based browsers also send Client Hints (sec-ch-ua and related headers), and modern browsers send Fetch Metadata headers (sec-fetch-*); matching those helps on stricter sites.
  3. Persist cookies with a session. Many sites set a cookie on first load and 403 any follow-up request that arrives without it. A requests.Session() (or curl_cffi.requests.Session()) carries those cookies forward automatically.
  4. Match the TLS handshake. If 403s survive perfect headers and cookies, the server is likely fingerprinting your TLS handshake (JA3) and HTTP/2 settings, which identify a non-browser client before the server reads a single header. Plain requests cannot change this. Switch to curl_cffi with impersonate="chrome", which reproduces a real browser's TLS ClientHello (and therefore its JA3 fingerprint) along with its HTTP/2 settings and frame order. Use impersonate="chrome124" to pin a specific version, or "chrome" to track the latest version your installed curl_cffi supports.
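Step 3 often comes down to loading the page a visitor would see first. A minimal sketch, assuming the cookie is set by the site's landing page (some sites set it elsewhere, or through JavaScript that neither requests nor curl_cffi runs): `fetch_with_warmup` is a hypothetical helper that works with any session object exposing `.get(url, headers=..., timeout=...)`, which both `requests.Session` and `curl_cffi.requests.Session` provide.

```python
def fetch_with_warmup(session, home_url: str, target_url: str, timeout: float = 30):
    """Visit the landing page first so any cookies it sets ride along on the real request."""
    # Set-Cookie headers from this response are stored in the session's cookie jar.
    session.get(home_url, timeout=timeout)
    # A browser arriving from the landing page would also send it as the Referer.
    return session.get(target_url, headers={"Referer": home_url}, timeout=timeout)
```

For example, `fetch_with_warmup(cffi.Session(impersonate="chrome"), "https://example.com/", "https://example.com/data")` combines steps 3 and 4. Passing the session in, rather than creating it inside, keeps the cookie jar available for later requests.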

When the fix is the IP: proxies and pacing

If headers, cookies, and TLS impersonation all check out but you still see 403, the block is about where the request comes from and how fast the requests arrive.

  • IP reputation. Datacenter IP ranges (cloud servers, hosting providers) are widely flagged because real users rarely browse from them. To confirm that IP reputation, rather than your client, is the cause, send the same request from a different network such as a residential proxy; if it succeeds there, the original IP was the problem. Pass proxies to curl_cffi as proxies={"https": "http://user:pass@host:port"}.
  • Geo restrictions. Some sites only serve specific countries. If the body mentions your region, choose a proxy located in the target's main market.
  • Rate and pattern. A burst of identical requests from one IP reads as automation. Add a delay between requests, randomize it slightly, and keep per-IP request volume low so your traffic stays at a polite rate for the site. Slowing down is often the cheapest fix of all.
  • Stop retrying blindly. Repeated 403s from the same identity reinforce the block. On a 403, change a signal (IP, headers, or fingerprint) before the next attempt; never loop on the same request.
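The pacing and "change a signal before retrying" rules above fit in one small loop. A minimal sketch: `polite_get` and `paced` are hypothetical helpers, `fetch(url, proxy)` is an assumed callable returning a status code (wrap requests or curl_cffi as you prefer), and `None` in the proxy list means a direct connection. Each retry uses a different proxy after a growing, jittered delay; the delay values are illustrative, not tuned for any site.

```python
import random
import time
from typing import Callable, Iterator, Optional, Sequence


def polite_get(
    fetch: Callable[[str, Optional[str]], int],
    url: str,
    proxies: Sequence[Optional[str]],
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Fetch url, switching to a new proxy after every 403 instead of repeating the request."""
    status: Optional[int] = None
    for attempt, proxy in enumerate(proxies[:max_attempts]):
        if attempt:
            # Growing, jittered pause before trying the next identity.
            sleep(base_delay * attempt + random.uniform(0, base_delay))
        status = fetch(url, proxy)
        if status != 403:
            return status
    return status  # still 403 (or None if no proxies given): stop and rethink, don't loop


def paced(
    urls: Sequence[str],
    low: float = 2.0,
    high: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield URLs one at a time with a randomized gap between them."""
    for i, url in enumerate(urls):
        if i:
            sleep(random.uniform(low, high))
        yield url
```

Used together, `for url in paced(urls): status = polite_get(my_fetch, url, proxy_list)` keeps the overall rate low and never sends the same blocked request twice from the same identity. The `sleep` parameter exists only so the timing can be tested without waiting.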

Stitching real browser headers, TLS impersonation, a residential proxy pool, retries, and pacing together by hand is a lot of moving parts. A managed web-data API such as Scrappey rolls the browser fingerprint, proxy rotation, and retry logic into a single request, which is one way to consolidate all of the layers above when you would rather not maintain them yourself.

Code example

```python
# Layered fix for a 403 on a site you are permitted to scrape.
# pip install requests curl_cffi brotli   (brotli lets requests decode "br" responses)
import random
import time

import requests
from curl_cffi import requests as cffi

# --- Layers 1-3: real headers + a session that persists cookies ---
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",
    "Upgrade-Insecure-Requests": "1",
}

url = "https://example.com/data"
session = requests.Session()
session.headers.update(BROWSER_HEADERS)
resp = session.get(url, timeout=30)
print("requests:", resp.status_code)  # often 200 once headers + cookies look real

# --- Layer 4: still 403? Match the TLS handshake with curl_cffi ---
# Plain requests cannot change its TLS/JA3 fingerprint; curl_cffi can.
if resp.status_code == 403:
    # Slow down before the next attempt; never fire an immediate identical retry.
    time.sleep(random.uniform(2, 5))

    # Optional: set a residential proxy URL for IP-based 403s; None means a direct connection.
    PROXY = None  # e.g. "http://user:pass@residential-host:port"
    proxies = {"https": PROXY} if PROXY else None

    # impersonate also sends browser-matching default headers, including the User-Agent.
    s = cffi.Session(impersonate="chrome")  # replays a real Chrome TLS/HTTP2 handshake
    s.headers.update({"Accept-Language": "en-US,en;q=0.9"})
    resp = s.get(url, proxies=proxies, timeout=30)
    print("curl_cffi:", resp.status_code)
```

Frequently asked questions

Will changing the User-Agent alone fix a 403 when scraping?

Sometimes, but only against the most basic rules. The default python-requests User-Agent triggers many simple blocks, so swapping it for a real browser string can clear a 403 on lightly protected sites. Modern detection, though, cross-checks the User-Agent against your TLS fingerprint, header order, and cookies, so a new User-Agent only helps when everything else is consistent with it.
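One practical consequence: if you override the User-Agent on an impersonating client, it should at least name the same browser family and major version as the TLS fingerprint. curl_cffi's impersonation is designed to send matching default headers, so this mainly matters when you set your own. The helpers below are hypothetical and check only family and major version with simple regular expressions, not header order or cookies; impersonation names are assumed to follow curl_cffi's pattern, such as "chrome124" or "safari17_0".

```python
import re
from typing import Optional, Tuple


def ua_browser(user_agent: str) -> Tuple[Optional[str], Optional[int]]:
    """Rough (family, major version) parse of a browser User-Agent string."""
    # Order matters: Edge UAs also contain "Chrome/", and Chrome UAs also contain "Safari/".
    for family, token in (("edge", "Edg/"), ("chrome", "Chrome/"), ("firefox", "Firefox/")):
        match = re.search(re.escape(token) + r"(\d+)", user_agent)
        if match:
            return family, int(match.group(1))
    match = re.search(r"Version/(\d+).*Safari/", user_agent)
    if match:
        return "safari", int(match.group(1))
    return None, None


def ua_matches_impersonation(user_agent: str, impersonate: str) -> bool:
    """True if a custom User-Agent agrees with an impersonation target name like 'chrome124'."""
    match = re.match(r"([a-z]+)(\d+)?", impersonate)
    if not match:
        return False
    family, version = match.group(1), match.group(2)
    ua_family, ua_major = ua_browser(user_agent)
    if ua_family != family:
        return False
    # An unversioned target such as "chrome" only constrains the family.
    return version is None or ua_major == int(version)
```

A Firefox User-Agent on a session created with impersonate="chrome124" fails this check, which is exactly the kind of mismatch that cross-checking detectors look for.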

How do I know if my 403 is caused by the IP rather than my headers?

Send the exact same request through a different IP, ideally a residential proxy in the target's main country. If the second attempt succeeds, the first IP was likely flagged or geo-restricted. If it still returns 403, the cause is something the IP cannot explain, usually your headers, cookies, or TLS fingerprint.

Why does curl_cffi work where Python requests gets a 403?

Plain requests cannot disguise its TLS handshake, so its JA3 fingerprint (and the absence of browser-like HTTP/2, since requests speaks HTTP/1.1) identifies it as a non-browser client before the server reads any header. curl_cffi uses curl-impersonate to reproduce a real browser's TLS handshake (and therefore its JA3 fingerprint) along with its HTTP/2 settings and frame order via the impersonate parameter, so the connection itself looks like Chrome or Firefox rather than a script.

Is it OK to keep retrying a request that returns 403?

No. Repeated 403s from the same identity reinforce the block and can escalate it to a longer ban. On a 403, change at least one signal -- the IP, the header set, or the TLS fingerprint -- before the next attempt, and add a delay between requests so your traffic does not read as an automated burst.

Last updated: 2026-06-16 · Facts last verified: 2026-06-16