What Is Crawl Depth Limit?

Crawl depth limit is the maximum number of link hops a crawler will follow from a seed URL. Depth 0 fetches only the seed; depth 1 fetches the seed plus everything linked from it; depth 2 follows links from those pages, and so on. Combined with a budget, depth shapes which parts of a site get reached. Most content lives within 2-4 hops of the homepage; beyond that you mostly find pagination, filters, and tag pages.

Quick facts

Depth 0: Seed page only
Depth 1: Seed + direct links, usually category pages
Depth 2-3: Most content on well-structured sites
Depth 4+: Diminishing returns; mostly filters and tags
Combined with: Crawl budget (pages cap) and scope (domain/path filter)

Where content lives

For most sites, the homepage links to category pages (depth 1), category pages link to listings (depth 2), and listings link to detail pages (depth 3). Going deeper rarely reveals new content — you hit pagination, sort variants, and tag clouds. Sizing depth at 3-4 captures the bulk of meaningful URLs without blowing budget on combinatorial junk.
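To check where content sits on a particular site, one option is to count pages per depth over a known link graph. The sketch below assumes the graph is already available as a dictionary; `depth_histogram`, the `site` dictionary, and its URLs are hypothetical and only mirror the homepage, category, listing, detail shape described above.

```python
from collections import Counter, deque

def depth_histogram(graph, seed):
    """Return {depth: page_count} via breadth-first search over a link graph."""
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        for link in graph.get(url, []):
            if link not in depths:              # first visit gives the shortest hop count
                depths[link] = depths[url] + 1
                queue.append(link)
    return dict(Counter(depths.values()))

# Hypothetical toy site: 2 categories, 1 listing each, 2 items per listing.
site = {
    "/": ["/cat/a", "/cat/b"],
    "/cat/a": ["/cat/a/list"],
    "/cat/b": ["/cat/b/list"],
    "/cat/a/list": ["/item/1", "/item/2"],
    "/cat/b/list": ["/item/3", "/item/4"],
}
hist = depth_histogram(site, "/")
```

Here `hist` maps each depth to the number of pages first reached at that depth, which makes it easy to see the hop count beyond which few new pages appear.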

Depth vs budget interaction

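Depth and budget interact: a high depth limit on a small budget cuts the crawl off mid-traversal, while a low depth limit on a large budget leaves capacity unused. Rule of thumb: set depth to the natural shape of the site (3-4 hops for most), then size the budget to the sum of the pages expected at each level (each level is roughly the previous level's page count times the average fanout), plus a safety margin. A site with 20 categories and 200 items each has 1 + 20 + 4,000 = 4,021 pages before pagination, so it fits in ~5,000 pages at depth 3.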
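The sizing arithmetic for the 20-category example can be sketched as follows. This is a rough estimate, assuming the average fanout at each level is known in advance; `estimate_budget` and the 25% default margin are illustrative choices, not a standard formula.

```python
def estimate_budget(fanouts, margin=0.25):
    """Estimate reachable pages from per-level fanout, plus a safety margin.

    fanouts[i] is the assumed average number of new pages linked from each
    page at depth i, so len(fanouts) is the depth being sized for.
    """
    total, level = 1, 1            # the seed page at depth 0
    for fanout in fanouts:
        level *= fanout            # pages expected at the next depth
        total += level
    return total, round(total * (1 + margin))

# 20 categories on the homepage, 200 items per category.
pages, budget = estimate_budget([20, 200])
```

With these inputs `pages` is 1 + 20 + 4,000 = 4,021, and the margin brings the suggested budget to roughly 5,000.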

Per-pattern depth

Advanced crawlers vary depth by URL pattern. Detail pages get depth 0 (do not follow their links). Category pages get a high depth, because that is where the items are discovered. Pagination chains get capped at 50-100 pages to avoid traps such as infinite calendars. This is more work than a single global depth limit, but it can improve budget efficiency considerably on large sites.
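A minimal sketch of per-pattern depth, assuming paths like `/item/...` and `/category/...` and a `page=` query parameter. The patterns, the hop values, and the `hops_allowed` helper are all hypothetical; a real site needs its own rules.

```python
import re

# Hypothetical rules: (pattern, link hops to follow from a matching page).
DEPTH_RULES = [
    (re.compile(r"^/item/"), 0),        # detail pages: do not follow links
    (re.compile(r"^/category/"), 5),    # category pages: follow deeply to find items
]
DEFAULT_HOPS = 2                        # fallback for pages that match no rule
MAX_PAGE = 50                           # pagination cap
PAGE_PARAM = re.compile(r"[?&]page=(\d+)")

def hops_allowed(url):
    """Return how many further link hops to follow from `url`."""
    page = PAGE_PARAM.search(url)
    if page and int(page.group(1)) > MAX_PAGE:
        return 0                        # past the pagination cap: stop here
    for pattern, hops in DEPTH_RULES:
        if pattern.search(url):
            return hops
    return DEFAULT_HOPS
```

A crawler using this would consult `hops_allowed` when it dequeues a page and enqueue that page's links only if the result is greater than zero.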

Code example

python
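from collections import deque

def crawl_with_depth(seed, extract_links, max_depth=3):
    """Breadth-first crawl; return every URL within max_depth hops of seed.

    extract_links(url) is a caller-supplied function that fetches a page
    and returns the URLs it links to.
    """
    frontier = deque([(seed, 0)])
    seen = {seed}
    while frontier:
        url, depth = frontier.popleft()
        links = extract_links(url)  # fetch the page and collect its outgoing links
        if depth == max_depth:
            continue  # at the limit: the page is fetched, its links are not followed
        for link in links:
            if link not in seen:
                seen.add(link)  # mark at enqueue time so no URL is queued twice
                frontier.append((link, depth + 1))
    return seen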

Frequently asked questions

What depth should I start with?

Start with 3 for content sites and 2 for e-commerce listings (homepage → category → item is exactly 2 hops). Adjust after reviewing which pages the first run reached.

Does depth limit help with infinite-link traps?

Partially. A depth limit caps how long any single chain of links can run, but a single bad pattern (for example, a calendar whose pages each link to many other dates) can still consume the budget within a few hops of the seed. Combine depth limits with URL pattern exclusions for real protection.
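One way to express such exclusions is a filter applied before a URL is enqueued. The sketch below is illustrative only: the `EXCLUDE` patterns, the `allowed` helper, and the `example.com` URLs in the usage note are assumptions, and the patterns need tuning per site.

```python
import re
from urllib.parse import urlparse

# Hypothetical trap patterns; tune these per site.
EXCLUDE = [
    re.compile(r"/calendar/\d{4}/\d{2}"),   # endless month-by-month pages
    re.compile(r"[?&](sort|filter)="),      # sort and filter variants
]

def allowed(url, domain):
    """Return True if `url` is on `domain` and matches no exclusion pattern."""
    parsed = urlparse(url)
    if parsed.netloc != domain:             # scope check: stay on one domain
        return False
    target = parsed.path + ("?" + parsed.query if parsed.query else "")
    return not any(p.search(target) for p in EXCLUDE)
```

For instance, `allowed("https://example.com/shoes?sort=price", "example.com")` is rejected by the sort pattern, while the plain `/shoes` page passes.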

What is the difference between depth and crawl budget?

Depth limits how far you walk from each seed; budget limits the total amount of crawling. You need both.
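A short sketch of the two limits working together, assuming a caller-supplied `get_links(url)` function that returns a page's outgoing links. The names `crawl`, `get_links`, and `max_pages` are illustrative.

```python
from collections import deque

def crawl(seed, get_links, max_depth=3, max_pages=5000):
    """Breadth-first crawl that stops at a depth limit or a page budget."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    fetched = []
    while frontier and len(fetched) < max_pages:    # budget: total pages fetched
        url, depth = frontier.popleft()
        fetched.append(url)
        if depth == max_depth:                      # depth: do not expand further
            continue
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return fetched
```

Whichever limit is hit first ends the crawl: a tight budget stops it partway through a level, and a tight depth stops it even when budget remains.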

Last updated: 2026-05-26