HTTP Errors

What Is a 404 Not Found Error?

HTTP 404 Not Found means the server could not find a resource at the requested URL, or is unwilling to disclose that one exists. In normal web traffic a 404 is usually honest: the page is gone or never existed. In scraping it is more ambiguous: some anti-bot systems return 404 for blocked requests to hide the fact that they are blocking you, and JS-rendered SPAs can serve a 404 shell that turns into a working page once you execute the JavaScript.

Quick facts

Status family: 4xx (client error)
Honest meaning: URL does not exist on this server
Suspicious meaning: Anti-bot system returning 404 instead of 403 to obscure the block
Retry safe: Usually no, but worth trying with a different IP or fingerprint if you suspect cloaking
Detection trick: Compare the response in a browser with the one your scraper gets; if the browser works, it is likely a block

When 404 is honest

Most 404s are real: a typo in the URL, a product that has been delisted, an old article taken down, or a path that never existed. Cache the 404, mark the URL dead in your work queue, and move on. Hammering a dead URL just wastes requests and trains the target site's rate limiter on your IP.
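
One way to "mark the URL dead" is to keep a set of dead URLs next to the work queue and refuse to re-queue anything in it. The sketch below is a minimal in-memory illustration, not a production crawler: the `CrawlQueue` class, its method names, and the example URL are all hypothetical, and a real crawl would likely persist the dead set to disk or a database.

```python
from collections import deque


class CrawlQueue:
    """FIFO work queue that refuses to re-queue URLs already marked dead."""

    def __init__(self):
        self._pending = deque()
        self._dead = set()  # URLs that returned an honest 404

    def add(self, url):
        """Queue a URL; return False if it is known dead or already waiting."""
        if url in self._dead or url in self._pending:
            return False
        self._pending.append(url)
        return True

    def next_url(self):
        """Pop the next URL to fetch, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def record_status(self, url, status_code):
        """Remember a 404 so later add() calls for this URL are skipped."""
        if status_code == 404:
            self._dead.add(url)


queue = CrawlQueue()
queue.add("https://example.com/old-article")  # hypothetical URL
url = queue.next_url()
queue.record_status(url, 404)                 # pretend the fetch returned 404
requeued = queue.add(url)                     # False: the URL is not retried
```

Keeping the dead set separate from the pending queue means a link that is rediscovered later in the crawl is dropped cheaply instead of triggering another request.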

When 404 is a block

Some anti-bot stacks deliberately return 404 to scrapers instead of 403, on the theory that "page not found" is less actionable than "you are blocked." Stacks such as Cloudflare and DataDome, along with a handful of in-house systems, can behave this way. The tell: the page works in a real browser from your machine but consistently returns 404 to your scraper. The fix is the same as for any block: better IP reputation, a more realistic fingerprint, and a slower request rate.

When 404 is a rendering problem

Single-page apps often serve the same HTML shell for every URL, sometimes with a 404 status, and render the real content client-side after a fetch call. If you scrape the raw HTML you see "404" or an empty body; if you execute the JS, the page loads normally. The signal is a content-type that does not match what you expected, or a response body with almost no visible content. In that case, switch to a JS-rendering API or capture the underlying XHR endpoint directly.
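
The "near-zero body" signal can be approximated with a rough heuristic: a 404 whose HTML contains script tags but almost no visible text is a candidate SPA shell. The sketch below uses only the standard library; the function name `looks_like_spa_shell` and the 200-character threshold are assumptions chosen for illustration, and should be tuned against the sites you actually crawl.

```python
from html.parser import HTMLParser


class _ShellProbe(HTMLParser):
    """Counts <script> tags and visible text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_like_spa_shell(status_code, content_type, body, max_text_chars=200):
    """Heuristic: a 404 HTML response with scripts but little visible text."""
    if status_code != 404:
        return False
    if "html" not in (content_type or "").lower():
        return False
    probe = _ShellProbe()
    probe.feed(body)
    return probe.script_count > 0 and probe.text_chars <= max_text_chars
```

A `True` result is only a hint that rendering is worth trying; a genuine 404 page that happens to be short and script-heavy will also match.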

Code example

python
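import requests

# Placeholder: substitute a full, current browser User-Agent string.
BROWSER_UA = 'Mozilla/5.0 (real browser UA)'

def diagnose_404(url, timeout=10):
    """Compare a browser-like request with a bare one to spot a cloaked block."""
    # Browser UA succeeds where the bare client gets 404 -> likely cloaked block
    with_ua = requests.get(url, headers={'User-Agent': BROWSER_UA}, timeout=timeout)
    bare = requests.get(url, timeout=timeout)
    if with_ua.status_code == 200 and bare.status_code == 404:
        return 'cloaked_block'
    # Both requests 404 -> most likely a genuine missing page
    if with_ua.status_code == 404 and bare.status_code == 404:
        return 'real_404'
    # Anything else (including blocks keyed on IP or TLS rather than UA) is undecided
    return 'inconclusive'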

Frequently asked questions

Should I retry 404s in a crawl?

Generally no — mark the URL dead and move on. Worth one retry through a different IP or with a real browser fingerprint if you suspect the site is cloaking blocks as 404s.
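
The "one retry, then give up" policy can be sketched as a small function. To keep it independent of any particular HTTP client or proxy setup, the two fetchers are passed in as callables that take a URL and return a status code; `fetch`, `fetch_alt`, and the returned labels are hypothetical names for this illustration.

```python
def fetch_with_one_retry(url, fetch, fetch_alt):
    """Try `fetch`; on 404, retry exactly once through `fetch_alt`.

    Both arguments are callables mapping a URL to an HTTP status code.
    `fetch_alt` stands for a request sent through a different IP or with
    a real browser fingerprint. Returns (status_code, label).
    """
    first = fetch(url)
    if first != 404:
        return first, "not_404"
    second = fetch_alt(url)
    if second == 404:
        # Two independent 404s: treat the URL as dead and stop retrying.
        return 404, "dead"
    # The alternate route got something else, so the first 404 is suspect.
    return second, "suspected_cloak"
```

Capping the policy at a single alternate attempt keeps the cost of a wrong guess to one extra request per dead URL.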

Why would a site return 404 instead of 403?

To obscure the fact that it is blocking. A 403 tells the scraper's operator "you are detected, try harder." A 404 says "nothing here, give up." It is a tactic, not a bug.

How do I crawl an SPA that returns 404 for the raw HTML?

Either render JavaScript (Playwright, a JS-rendering scraping API) or reverse-engineer the XHR endpoint the SPA uses to fetch its data and call that directly — usually cheaper and faster than rendering.

Last updated: 2026-05-26