What 520 actually indicates
520 is Cloudflare's catch-all when it cannot map an origin failure to a more specific 5xx. The origin replied, but the reply violated HTTP in some way: oversized headers, a connection reset before the response body finished, malformed status line, empty body where a length was promised. It is not a block aimed at your scraper — Cloudflare returns 520 to all clients when the origin misbehaves.
How to handle 520 in a scraper
Retry with exponential backoff (1s, 2s, 4s, 8s, max 5 attempts). Most 520s clear within a minute as the origin recovers. If a specific URL returns 520 consistently for an hour, the origin is likely broken — log it, alert, and move on rather than wasting crawl budget. Do not rotate proxies on a 520; the issue is server-side, so a new IP changes nothing.
When 520 hides a block
Rare but real: a few sites configure their origin firewall to drop scraper traffic at the TCP level, which surfaces to Cloudflare as a 520 instead of a proper 403. The tell is that 520s correlate with your traffic specifically — a real browser request from the same IP succeeds. In that case treat it as a soft block: improve TLS fingerprint and headers, then retry.
