Handle 429 Rate Limiting in Python

Pim · Scrappey Research

June 16, 2026 5 min read

Handling HTTP 429 in Python means catching the "Too Many Requests" response, reading the Retry-After header, then retrying with exponential backoff plus jitter instead of hammering the server. A 429 is a rate limit: the server is telling you that you have sent more requests in a time window than it allows, and the Retry-After header (when present) tells you exactly how long to pause. A robust client honors that header first, falls back to backoff with randomized delay when it is missing, caps how many requests run at once (concurrency), throttles per domain, and spreads load across rotating IPs so no single address carries the whole request volume, while keeping the overall pace polite. Done right, 429s become routine flow control rather than fatal errors.

Status code	429 Too Many Requests (4xx)
Key header	Retry-After (seconds or HTTP date)
Core fix	Exponential backoff + jitter
Bundled helper	urllib3.util.Retry (installed with requests; respect_retry_after_header=True)
Popular library	tenacity (decorator-based retries)

Read Retry-After first, then back off with jitter

Always parse the Retry-After header before doing anything else, because the server is telling you exactly how long to wait. Per the HTTP spec, the value is either a non-negative integer count of seconds (Retry-After: 120) or an HTTP date in IMF-fixdate format (Retry-After: Tue, 21 Apr 2026 07:28:00 GMT), so your code must handle both. In Python, email.utils.parsedate_to_datetime parses the date form, and a simple .isdigit() check catches the seconds form. Sleep for that duration, then retry.

When the header is absent, fall back to exponential backoff: wait a base delay, then double it each attempt (1s, 2s, 4s, 8s), capping at a few minutes. The critical addition is jitter - a random offset applied to each delay. Without it, many clients (or many workers in one scraper) that all got a 429 at the same instant will all retry at the same instant, re-creating the burst that triggered the limit in the first place. This is the classic "thundering herd" problem. Adding random.uniform(0, base) to each delay (additive jitter) desynchronizes retries so they spread out across the window; the stricter "full jitter" variant instead draws the whole delay from random.uniform(0, backoff). Cap the total number of attempts so a hard block does not loop forever.
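
The schedule described above can be sketched as a small helper. This is a minimal illustration, not a library API: the name backoff_delay and its parameters are assumptions made for this example. It shows additive jitter (the backoff plus a random offset) next to full jitter (the whole delay drawn at random).

```python
import random


def backoff_delay(attempt, base=1.0, cap=120.0, jitter="additive", rng=random):
    """Return the seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper for illustration; not part of any library.
    """
    # Exponential growth: base, 2*base, 4*base, ... but never above `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    if jitter == "full":
        # Full jitter: the whole delay is random, anywhere in [0, ceiling].
        return rng.uniform(0, ceiling)
    # Additive jitter: the full backoff plus a random offset in [0, base].
    return ceiling + rng.uniform(0, base)
```

Calling backoff_delay(attempt) for attempts 0 through 4 gives waits of roughly 1, 2, 4, 8, and 16 seconds, each stretched by up to one extra second. Full jitter trades a shorter average wait for a wider spread between workers.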

Use urllib3 Retry or tenacity instead of hand-rolling

You rarely need to write the retry loop yourself. The requests library is built on urllib3, whose urllib3.util.Retry object handles 429 backoff at the adapter level. Mount it on a requests.Session via HTTPAdapter(max_retries=...) and set status_forcelist=[429, 500, 502, 503, 504], a backoff_factor (urllib3 sleeps backoff_factor * (2 ** (retry_number - 1)) seconds), and optionally backoff_jitter. Crucially, respect_retry_after_header defaults to True and RETRY_AFTER_STATUS_CODES includes 413, 429, and 503, so urllib3 honors Retry-After automatically when 429 is in your force list. Note that by default allowed_methods only retries idempotent verbs (GET, HEAD, PUT, etc.), so add POST explicitly if you intend to retry it.
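
A minimal sketch of that setup follows, assuming requests and urllib3 are installed. The function name build_session and the chosen numbers are illustrative. backoff_jitter is left out on purpose because it needs urllib3 2.0 or newer.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry


def build_session(total=5, backoff_factor=1.0):
    """Return a requests.Session that retries 429/5xx responses via urllib3."""
    retry = Retry(
        total=total,                                 # overall retry budget
        backoff_factor=backoff_factor,               # scale of the exponential backoff
        status_forcelist=[429, 500, 502, 503, 504],  # statuses that trigger a retry
        respect_retry_after_header=True,             # the default, spelled out for clarity
        # allowed_methods keeps its default (idempotent verbs only),
        # so POST requests are not retried by this session.
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

With this in place, session.get(url, timeout=30) retries transparently. When the retry budget is exhausted on a status in the force list, requests should raise requests.exceptions.RetryError rather than return the last 429, so callers may want to catch that exception.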

For finer control - retrying on custom conditions, async code, or non-HTTP calls - the tenacity library gives you a clean decorator API: @retry(wait=wait_exponential_jitter(), stop=stop_after_attempt(5), retry=retry_if_result(...)). Both approaches beat a hand-written while loop, which is easy to get subtly wrong (forgetting jitter, not capping attempts, retrying non-idempotent writes). Pick urllib3 when you just want resilient requests calls, and tenacity when your retry predicate is more complex.

Cap concurrency, throttle per domain, and rotate IPs

Backoff alone is reactive; the real fix is sending fewer requests per IP in the first place. Three controls work together. First, limit concurrency: instead of launching unbounded threads or coroutines, gate them with a fixed-size pool - a concurrent.futures.ThreadPoolExecutor(max_workers=N), or in asyncio an asyncio.Semaphore(N) - so you never have more than N requests in flight at once. Second, throttle per domain: track the timestamp of the last request to each host and enforce a minimum gap (for example 0.5-2 seconds), since one global rate is too coarse when you crawl many sites with different limits. Scrapy implements this with DOWNLOAD_DELAY and, when AUTOTHROTTLE_ENABLED is set, the AutoThrottle extension, which adjusts the delay based on observed response latency and never lets non-200 responses such as 429s lower it.
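
The per-domain throttle can be sketched in a few lines of standard-library Python. The class name DomainThrottle and the min_gap parameter are assumptions for this example, not an existing API. Each host gets its own "earliest next request" timestamp, and a lock keeps the bookkeeping safe when several pool workers share one throttle.

```python
import threading
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Enforce a minimum gap between requests to the same host (illustrative)."""

    def __init__(self, min_gap=1.0):
        self.min_gap = min_gap
        self._next_allowed = {}        # host -> earliest monotonic time for the next request
        self._lock = threading.Lock()

    def wait(self, url):
        """Block until `url`'s host may be contacted; return the seconds slept."""
        host = urlsplit(url).netloc
        with self._lock:
            now = time.monotonic()
            # Reserve the next free slot for this host, then release the lock
            # so other hosts are not held up while this caller sleeps.
            start = max(now, self._next_allowed.get(host, now))
            self._next_allowed[host] = start + self.min_gap
        delay = start - now
        if delay > 0:
            time.sleep(delay)
        return delay
```

A worker would call throttle.wait(url) immediately before each request. Because slots are reserved under the lock, concurrent workers targeting the same host queue up min_gap apart, while requests to other hosts proceed without waiting.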

Third, rotate IPs to spread load horizontally. A single IP has one rate-limit budget; routing through a pool of proxies means each IP makes only a fraction of the requests and stays under the threshold - but keep each IP's pace human-like rather than treating rotation as license to fire faster. If you would rather not operate the proxy pool, backoff scheduler, and per-domain throttle yourself, a managed web-data API such as Scrappey handles proxy rotation, request pacing, and retries behind a single endpoint, so your code receives the parsed response instead of the 429.
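
For readers running their own pool, round-robin rotation can be sketched as below. The proxy addresses are placeholders, and the names PROXY_POOL and rotating_proxies are made up for this example. The generator yields mappings in the shape that the proxies argument of requests accepts.

```python
import itertools

# Hypothetical proxy endpoints; replace with addresses from your own pool.
PROXY_POOL = [
    "http://proxy-1.example:8080",
    "http://proxy-2.example:8080",
    "http://proxy-3.example:8080",
]


def rotating_proxies(pool):
    """Yield requests-style proxies mappings, cycling through the pool forever."""
    for address in itertools.cycle(pool):
        # The same proxy handles both plain and TLS traffic in this sketch.
        yield {"http": address, "https": address}
```

Usage would look like rotation = rotating_proxies(PROXY_POOL) followed by session.get(url, proxies=next(rotation), timeout=30). A generator is not safe to advance from several threads at once, so guard next(rotation) with a lock if workers share it, and keep a per-proxy pace limit so rotation does not turn into a faster overall burst.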

Code example

python

import time, random, requests
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone
from concurrent.futures import ThreadPoolExecutor

HEADERS = {  # a real browser UA; the default python-requests UA invites 429s
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/124.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

def retry_after_seconds(resp):
    """Parse Retry-After whether it is seconds or an HTTP date."""
    ra = resp.headers.get("Retry-After")
    if not ra:
        return None
    if ra.isdigit():
        return int(ra)
    try:
        when = parsedate_to_datetime(ra)
        return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
    except (TypeError, ValueError):
        return None

def get_with_backoff(session, url, max_retries=5, base=1.0, cap=120.0):
    for attempt in range(max_retries):
        resp = session.get(url, timeout=30)
        if resp.status_code != 429:
            return resp
        if attempt == max_retries - 1:
            break                             # out of attempts; skip a pointless sleep
        wait = retry_after_seconds(resp)
        if wait is None:                      # no header -> exponential backoff
            wait = min(cap, base * (2 ** attempt))
        wait += random.uniform(0, base)       # additive jitter, avoids herd retries
        print(f"429 on {url}; sleeping {wait:.1f}s (attempt {attempt + 1})")
        time.sleep(wait)
    raise RuntimeError(f"Still rate-limited after {max_retries} tries: {url}")

def crawl(urls, max_workers=4):
    # cap concurrency so we never exceed N requests in flight per IP
    session = requests.Session()
    session.headers.update(HEADERS)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: get_with_backoff(session, u), urls))

if __name__ == "__main__":
    pages = [f"https://example.com/api?page={i}" for i in range(1, 20)]
    for r in crawl(pages):
        print(r.status_code, len(r.text))

Frequently asked questions

Should I always trust the Retry-After header?

Yes, honor it whenever it is present: the server is stating exactly how long it wants you to wait, and retrying sooner is the fastest way to make the rate limiting stricter. Parse both forms (an integer count of seconds and an HTTP date), and fall back to exponential backoff only when the header is missing entirely.

Why do I need jitter if I already have exponential backoff?

Exponential backoff without jitter is dangerous when multiple requests or multiple workers get a 429 at the same moment, because they will all compute the same delay and retry in lockstep, recreating the burst that caused the limit. Adding a random offset to each delay spreads retries across the window so they do not synchronize into a thundering herd.

Does rotating proxies fix 429 errors on its own?

It helps when the limit is per-IP, because spreading requests across many IPs keeps each one under the threshold, but it is not a license to send requests faster. If the limit is tied to your account, API key, or a connection fingerprint rather than the IP, rotating proxies alone will not help, so confirm which kind of limit you are hitting before adding infrastructure.

Should I use urllib3 Retry or the tenacity library?

Use urllib3.util.Retry mounted on a requests.Session when you just want resilient HTTP calls, since it honors Retry-After automatically and needs only a few lines. Reach for tenacity when your retry condition is more complex, you are writing async code, or you want to retry non-HTTP operations, because its decorator API handles those cases cleanly.

Last updated: 2026-06-16 · Facts last verified: 2026-06-16