What 520 actually indicates
520 is Cloudflare's catch-all: it uses this code when an origin failure does not match any of its more specific 5xx codes. The origin replied, but the reply violated HTTP in some way — oversized headers, a connection reset before the response body finished, a malformed status line, or an empty body where a length was promised. It is not a block aimed at your scraper. Cloudflare returns 520 to all clients when the origin misbehaves.
How to handle 520 in a scraper
Retry with exponential backoff — wait a little longer before each attempt (1s, 2s, 4s, 8s, max 5 attempts). Most 520s clear within a minute as the origin recovers. If a specific URL returns 520 consistently for an hour, the origin is likely broken — log it, alert, and move on rather than wasting crawl budget. Do not rotate proxies on a 520; the issue is server-side, so a new IP changes nothing.
When 520 hides a block
Rare but real: a few sites configure their origin firewall to drop scraper traffic at the TCP level (the connection layer, below HTTP), which reaches Cloudflare as a 520 instead of a proper 403. The tell is that the 520s correlate with your traffic specifically — a real browser request from the same IP succeeds. In that case treat it as a soft block: improve your TLS fingerprint (the signature of your client's encrypted-connection setup) and headers, then retry.
