What Is a 404 Not Found Error?

HTTP 404 Not Found is the server's way of saying "I understood your request, but there is nothing at this address." The server is working fine - it just has no page, file, or data at the URL you asked for. On the normal web a 404 is straightforward: the page is gone or never existed. In scraping it is trickier: some anti-bot systems (tools that detect and block automated traffic) send a fake 404 to hide the fact that they are blocking you, and JavaScript-heavy sites can return a 404-looking HTML shell even though the page renders normally once the browser runs its scripts.

Quick facts

Status family: 4xx (client error)
Honest meaning: The URL does not exist on this server
Suspicious meaning: An anti-bot system is returning 404 instead of 403 to obscure the block
Retry safe? Usually no, but worth trying with a different IP or fingerprint if you suspect cloaking
Detection trick: Compare the response in a real browser with the one your scraper gets; if the browser works, it is a block

When 404 is honest

Most 404s are real: a typo in the URL, a product that has been delisted, an old article taken down, or a path that never existed. When this happens, record the 404, mark the URL as dead in your work queue, and move on. Repeatedly hitting a dead URL wastes requests and makes it more likely that the target site's rate limiter (the system that throttles clients sending too many requests) flags your IP.
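
The bookkeeping described above can be sketched as a small in-memory queue. The class name CrawlQueue and its methods are hypothetical, chosen only for illustration; a real crawler would probably persist the dead set to disk or a database rather than keep it in memory.

```python
from collections import deque


class CrawlQueue:
    """Minimal in-memory work queue that remembers dead URLs (hypothetical helper)."""

    def __init__(self, urls=()):
        self.pending = deque()
        self.dead = set()
        for url in urls:
            self.add(url)

    def add(self, url):
        # Never re-queue a URL that is already confirmed dead or already waiting.
        if url not in self.dead and url not in self.pending:
            self.pending.append(url)

    def next_url(self):
        # Return the next URL to fetch, or None when the queue is empty.
        return self.pending.popleft() if self.pending else None

    def record_status(self, url, status_code):
        # A 404 marks the URL dead so later add() calls skip it.
        if status_code == 404:
            self.dead.add(url)
```

Once a URL has been recorded as a 404, rediscovering the same link elsewhere in the crawl no longer costs a request.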

When 404 is a block

Some anti-bot stacks deliberately return 404 to scrapers instead of 403, on the theory that "page not found" is less useful to you than "you are blocked" - it gives you less to react to. Sites protected by Cloudflare, DataDome, or an in-house system can be set up to do this. The giveaway: the page loads fine in a real browser on your machine but consistently 404s from your scraper. The fix is the same as for any block - a cleaner IP reputation, a more realistic browser fingerprint (the set of signals that make your traffic look like a normal browser), and a slower request rate.

When 404 is a rendering problem

Single-page apps (sites that load one HTML page and then build every view with JavaScript) often serve the same 404-shaped HTML shell for every URL, with the real content filled in by the browser after a follow-up fetch. If you scrape the raw HTML you see "404" or an empty body; if you actually run the JavaScript, the page loads normally. The clue is a mismatched content-type or a near-empty response body - switch to a JS-rendering API (one that runs the page's scripts for you) or grab the underlying XHR endpoint (the background data request the page makes) directly.
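
The "near-empty response body" clue can be approximated with a rough heuristic: the HTML contains script tags but almost no visible text. The sketch below uses only the standard library; the function name looks_like_spa_shell and the 200-character threshold are assumptions for illustration, and the threshold would need tuning per site.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text outside <script>/<style> and counts <script> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.script_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore text that sits inside script or style elements.
        if not self._skip_depth:
            self.chunks.append(data.strip())


def looks_like_spa_shell(body, min_text_chars=200):
    """Guess whether raw HTML is an unrendered SPA shell (heuristic only)."""
    parser = _VisibleText()
    parser.feed(body)
    visible = " ".join(chunk for chunk in parser.chunks if chunk)
    # Scripts present but almost no visible text: the content is probably
    # filled in client-side, so the raw response is not the real page.
    return parser.script_count > 0 and len(visible) < min_text_chars
```

A True result is a hint to try JS rendering or look for the XHR endpoint before treating the URL as dead; it does not prove the page exists.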

Code example

python
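import requests

def diagnose_404(url, timeout=10):
    """Compare a browser-like request with a bare one to spot a cloaked block."""
    # Placeholder UA: substitute the full User-Agent string of a real browser.
    # A UA header alone will not get past fingerprint-based blocking.
    headers = {'User-Agent': 'Mozilla/5.0 (real browser UA)'}
    r1 = requests.get(url, headers=headers, timeout=timeout)  # browser-like
    r2 = requests.get(url, timeout=timeout)                   # bare client
    # Browser-like request succeeds where the bare client 404s: cloaked block
    if r1.status_code == 200 and r2.status_code == 404:
        return 'cloaked_block'
    # Both requests 404: most likely a genuinely missing page
    if r1.status_code == 404 and r2.status_code == 404:
        return 'real_404'
    return 'inconclusive'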

Frequently asked questions

Should I retry 404s in a crawl?

Usually no - mark the URL dead and move on. It is worth one retry through a different IP or with a real browser fingerprint if you suspect the site is disguising blocks as 404s.
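
A minimal sketch of that retry-once policy, assuming two hypothetical callables: fetch sends the request through your normal route and fetch_alternate sends it through a different IP or a real browser. Both take a URL and return an HTTP status code; neither belongs to any library.

```python
def check_404_once(url, fetch, fetch_alternate, suspect_cloaking=False):
    """Return (status_code, verdict) using at most one alternate retry."""
    status = fetch(url)
    if status != 404:
        return status, "not_404"
    if not suspect_cloaking:
        # No reason to doubt the 404: mark the URL dead without retrying.
        return 404, "dead"
    # Exactly one retry through the alternate route (hypothetical callable).
    retry_status = fetch_alternate(url)
    if retry_status == 404:
        return 404, "dead"
    return retry_status, "suspected_cloak"
```

Capping the retry at one keeps the cost of a wrong guess to a single extra request per URL.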

Why would a site return 404 instead of 403?

To hide that they are blocking you. A 403 tells the scraper "you are detected, try harder." A 404 tells it "nothing here, give up." It is a deliberate tactic, not a bug.

How do I crawl an SPA that returns 404 for the raw HTML?

Either render the JavaScript (with Playwright or a JS-rendering scraping API) or figure out the XHR endpoint the SPA calls to load its data and request that directly - usually cheaper and faster than full rendering.

Last updated: 2026-05-31