Read Retry-After first, then back off with jitter
Always parse the Retry-After header before doing anything else, because the server is telling you exactly how long to wait. Per the HTTP spec, the value is either a non-negative integer count of seconds (Retry-After: 120) or an HTTP date in IMF-fixdate format (Retry-After: Tue, 21 Apr 2026 07:28:00 GMT), so your code must handle both. In Python, a simple .isdigit() check catches the seconds form, and email.utils.parsedate_to_datetime parses the date form; for a date, the wait is the difference between that time and the current UTC time, floored at zero. Sleep for that duration, then retry.
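A minimal sketch of a parser for both header forms, using only the standard library. The function name retry_after_seconds and its now parameter (there to make the date arithmetic testable) are illustrative assumptions, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds requested by a Retry-After value, or None."""
    if value is None:
        return None
    value = value.strip()
    # Seconds form: ASCII digits only, e.g. "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Date form, e.g. "Tue, 21 Apr 2026 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable: the caller should fall back to backoff
    if when.tzinfo is None:
        # The header carried no usable zone; assume UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", so never return a negative wait.
    return max(0.0, (when - now).total_seconds())
```

Returning None for a missing or malformed header lets the caller treat both cases the same way and switch to its own backoff schedule.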
When the header is absent, fall back to exponential backoff: wait a base delay, then double it each attempt (1s, 2s, 4s, 8s), capping at a few minutes. The critical addition is jitter: randomness applied to each delay. Without it, many clients (or many workers in one scraper) that all got a 429 at the same instant will all retry at the same instant, re-creating the burst that triggered the limit in the first place. This is the classic "thundering herd" problem. The simplest form adds random.uniform(0, base) to each delay; "full jitter" goes further and draws the whole delay from random.uniform(0, min(cap, base * 2 ** attempt)). Either way, retries desynchronize and spread out across the window. Cap the total number of attempts so a hard block does not loop forever.
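The fallback can be sketched as follows. This version draws each delay uniformly between zero and the capped exponential value, the variant usually called full jitter. The names full_jitter_delay and call_with_backoff are hypothetical, and fetch is an assumed callable that returns a (status, retry_after_seconds_or_None) pair rather than a real HTTP client.

```python
import random
import time
from typing import Callable, Optional, Tuple


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Delay before retry `attempt` (0-based): uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def call_with_backoff(
    fetch: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it returns a status other than 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, retry_after = fetch()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts: do not sleep before giving up
        if retry_after is not None:
            delay = retry_after  # the server's instruction wins
        else:
            delay = full_jitter_delay(attempt, base, cap)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter is a design choice for testing: a test can record the delays instead of actually waiting.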
Use urllib3 Retry or tenacity instead of hand-rolling
You rarely need to write the retry loop yourself. The requests library is built on urllib3, whose urllib3.util.Retry object handles 429 backoff at the adapter level. Mount it on a requests.Session via HTTPAdapter(max_retries=...) and set status_forcelist=[429, 500, 502, 503, 504], a backoff_factor (urllib3 sleeps backoff_factor * (2 ** (retry_number - 1)) seconds), and optionally backoff_jitter (available from urllib3 2.0). Crucially, respect_retry_after_header defaults to True and RETRY_AFTER_STATUS_CODES includes 413, 429, and 503, so urllib3 honors Retry-After automatically on those responses; keeping 429 in status_forcelist also covers a 429 that arrives without the header, which then gets the exponential backoff. Note that the default allowed_methods covers only idempotent verbs (GET, HEAD, PUT, DELETE, OPTIONS, TRACE), so add POST explicitly if you intend to retry it, and only when repeating the request is safe.
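A minimal sketch of that setup, assuming requests and urllib3 are installed. The helper name make_session and the values total=5 and backoff_factor=1.0 are illustrative choices, not recommendations from either library.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session() -> requests.Session:
    """Build a Session whose adapters retry 429 and common 5xx responses."""
    retry = Retry(
        total=5,  # upper bound on retries per request
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=1.0,
        # Already the default; spelled out so the behavior is visible.
        respect_retry_after_header=True,
        # allowed_methods is left at its default, so POST is not retried.
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to both schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Calls such as make_session().get(url, timeout=10) then go through the retry policy without any loop in the calling code.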
For finer control - retrying on custom conditions, async code, or non-HTTP calls - the tenacity library gives you a clean decorator API: @retry(wait=wait_exponential_jitter(), stop=stop_after_attempt(5), retry=retry_if_result(...)). Both approaches beat a hand-written while loop, which is easy to get subtly wrong (forgetting jitter, not capping attempts, retrying non-idempotent writes). Pick urllib3 when you just want resilient requests calls, and tenacity when your retry predicate is more complex.
Cap concurrency, throttle per domain, and rotate IPs
Backoff alone is reactive; the real fix is sending fewer requests per IP in the first place. Three controls work together. First, limit concurrency: instead of launching unbounded threads or coroutines, gate them with a fixed-size pool - a concurrent.futures.ThreadPoolExecutor(max_workers=N), or in asyncio an asyncio.Semaphore(N) - so you never have more than N requests in flight at once. Second, throttle per domain: track the timestamp of the last request to each host and enforce a minimum gap (for example 0.5-2 seconds), since one global rate is too coarse when you crawl many sites with different limits. Scrapy provides both pieces: DOWNLOAD_DELAY sets a fixed gap between requests to the same site, and AUTOTHROTTLE_ENABLED turns on an extension that adjusts that delay from observed response latency and never lets a non-200 response such as a 429 shorten it.
Third, rotate IPs to spread load horizontally. A single IP has one rate-limit budget; routing through a pool of proxies means each IP makes only a fraction of the requests and stays under the threshold - but keep each IP's pace human-like rather than treating rotation as license to fire faster. If you would rather not operate the proxy pool, backoff scheduler, and per-domain throttle yourself, a managed web-data API such as Scrappey handles proxy rotation, request pacing, and retries behind a single endpoint, so your code receives the parsed response instead of the 429.
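The first two controls, a concurrency cap and a per-domain gap, can be sketched with asyncio and the standard library alone. PoliteScheduler is a hypothetical class, and fetch stands for whatever coroutine actually performs the request; proxy rotation is left out of this sketch.

```python
import asyncio
import time
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict
from urllib.parse import urlsplit


class PoliteScheduler:
    """Cap in-flight requests globally and space out requests per host.

    Create it inside a running event loop (for example inside the coroutine
    passed to asyncio.run) so its primitives bind to that loop.
    """

    def __init__(self, max_in_flight: int = 5, min_gap: float = 1.0) -> None:
        self._slots = asyncio.Semaphore(max_in_flight)
        self._min_gap = min_gap
        self._host_locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._next_allowed: DefaultDict[str, float] = defaultdict(float)

    async def _wait_for_host(self, host: str) -> None:
        # The per-host lock makes waiters for one host queue up in order.
        async with self._host_locks[host]:
            wait = self._next_allowed[host] - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
            self._next_allowed[host] = time.monotonic() + self._min_gap

    async def run(self, url: str, fetch: Callable[[str], Awaitable[Any]]) -> Any:
        host = urlsplit(url).hostname or ""
        # Wait out the host gap first so a sleeping task holds no global slot.
        await self._wait_for_host(host)
        async with self._slots:
            return await fetch(url)
```

A crawl would then gather scheduler.run(url, fetch) over its URL list; requests to different hosts proceed in parallel up to the cap, while requests to one host are spaced by at least min_gap.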
