Breadth-First vs Depth-First Crawling

By the Scrappey Research Team

Breadth-first crawling (BFS) visits every link at the current depth before going any deeper; depth-first crawling (DFS) follows one chain of links as far as it goes, then backtracks. A crawler is a program that follows links from page to page to collect data, and these two strategies just decide the order it visits them. BFS surfaces the broad shape of a site quickly and is the default for general-purpose crawlers. DFS reaches deep content faster but risks getting lost in one section of a large site. Most real crawlers use BFS with a depth limit — the simplicity wins.

BFS shape	Wide and shallow first; depth grows over time
DFS shape	Narrow and deep; one section explored completely before the next
Data structure	BFS = queue; DFS = stack
Memory	BFS uses more memory for the frontier on wide sites
Most crawlers use	BFS with depth cap — the safe default

Why BFS is the default

BFS works level by level: it gives you the homepage, then every category page, then every listing page, before drilling into individual items. So for a site whose content sits three clicks deep (depth-3), BFS surfaces a meaningful map of the site within the first few hundred requests. DFS, by contrast, might burn the first thousand requests inside one tag's pagination before it ever touches another category. Most crawl goals — "get a sample of every section" — are a better fit for BFS.
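
The difference in visiting order is easy to see on a tiny example. The sketch below is only an illustration: `SITE` is a made-up in-memory link graph standing in for real fetching, and `visit_order` and `budget` are hypothetical names, not part of any library. It stops each crawl after three "requests" and records which pages were reached.

```python
from collections import deque

# Hypothetical toy site: each key is a page, each value lists the pages it links to.
SITE = {
    "/": ["/shoes", "/hats"],
    "/shoes": ["/shoes/p1", "/shoes/p2"],
    "/hats": ["/hats/p1", "/hats/p2"],
    "/shoes/p1": [], "/shoes/p2": [], "/hats/p1": [], "/hats/p2": [],
}

def visit_order(seed, breadth_first, budget):
    """Return the first `budget` pages visited, in visiting order."""
    frontier = deque([seed])
    seen = set()
    order = []
    while frontier and len(order) < budget:
        # BFS takes the oldest link (left end); DFS takes the newest (right end).
        url = frontier.popleft() if breadth_first else frontier.pop()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        links = SITE.get(url, [])
        # For DFS, push links reversed so the first link on the page is explored first.
        frontier.extend(links if breadth_first else reversed(links))
    return order

bfs_first3 = visit_order("/", breadth_first=True, budget=3)
dfs_first3 = visit_order("/", breadth_first=False, budget=3)
print(bfs_first3)  # ['/', '/shoes', '/hats']
print(dfs_first3)  # ['/', '/shoes', '/shoes/p1']
```

With the same three-request budget, the BFS run has touched both categories, while the DFS run is already inside one of them and has not seen the other.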

When DFS makes sense

DFS wins when you have one specific deep target and broad coverage does not matter: scraping every product in a single category, say, or every page in one documentation section. It also uses less memory on very wide sites. The crawler holds a waiting list of links it has found but not yet visited. DFS keeps that list as a stack (it always takes the newest link next), so the list only holds the unvisited siblings along the current path and grows roughly with fanout × depth, which stays small. BFS keeps it as a queue (oldest link next), so the list has to hold an entire level at once and can grow toward fanout^depth, which gets huge on wide sites.
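
To get a feel for the memory difference, the following sketch crawls a synthetic, perfectly uniform tree and records the largest waiting list each strategy ever holds. The tree, the function name `peak_frontier`, and the chosen fanout and depth are all assumptions for illustration; real sites are far less regular.

```python
from collections import deque

def peak_frontier(fanout, depth, breadth_first):
    """Crawl a synthetic uniform tree and return the largest frontier size seen."""
    frontier = deque([(0, 0)])  # (node id, depth); ids exist only for illustration
    next_id = 1
    peak = 1
    while frontier:
        # Queue behaviour for BFS, stack behaviour for DFS.
        _, d = frontier.popleft() if breadth_first else frontier.pop()
        if d < depth:
            # Every page above the bottom level links to `fanout` new pages.
            for _ in range(fanout):
                frontier.append((next_id, d + 1))
                next_id += 1
        peak = max(peak, len(frontier))
    return peak

bfs_peak = peak_frontier(fanout=10, depth=3, breadth_first=True)
dfs_peak = peak_frontier(fanout=10, depth=3, breadth_first=False)
print(bfs_peak)  # 1000: the whole bottom level is queued at once
print(dfs_peak)  # 28: only unvisited siblings along one path
```

On this tree the BFS frontier peaks at the size of the widest level, while the DFS stack stays close to fanout × depth.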

The hybrid that wins in practice

A strategy that works well in practice is a mix: BFS with a depth limit, then targeted DFS for extraction. The first pass discovers the site's structure (BFS, depth 3). The second pass dives into the specific subtrees you identified (DFS, with no depth limit but kept within that scope). This gives you both a broad lay of the land and the deep coverage your pipeline needs.
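
A minimal sketch of the two-pass idea, under stated assumptions: `SITE` is a made-up link graph in place of real HTTP requests, scope is approximated by a simple URL prefix check, and the function names are hypothetical. The survey pass uses depth 1 rather than 3 only because the toy site is so small.

```python
from collections import deque

# Hypothetical toy site used only for illustration.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/a"],
    "/blog": ["/blog/1"],
    "/docs/a": ["/docs/a/b"],
    "/docs/a/b": ["/docs/a/b/c", "/blog"],
    "/docs/a/b/c": [],
    "/blog/1": [],
}

def bfs_discover(seed, max_depth):
    """Pass 1: breadth-first, depth-capped survey of the site."""
    frontier = deque([(seed, 0)])
    seen = set()
    while frontier:
        url, d = frontier.popleft()
        if url in seen or d > max_depth:
            continue
        seen.add(url)
        frontier.extend((link, d + 1) for link in SITE.get(url, []))
    return seen

def dfs_extract(root, scope_prefix):
    """Pass 2: depth-first with no depth cap, restricted to one URL prefix."""
    stack = [root]
    visited = set()
    order = []
    while stack:
        url = stack.pop()
        # Links that leave the chosen section are dropped instead of followed.
        if url in visited or not url.startswith(scope_prefix):
            continue
        visited.add(url)
        order.append(url)
        stack.extend(reversed(SITE.get(url, [])))
    return order

surveyed = bfs_discover("/", max_depth=1)
docs_pages = dfs_extract("/docs", scope_prefix="/docs")
print(sorted(surveyed))  # ['/', '/blog', '/docs']
print(docs_pages)        # ['/docs', '/docs/a', '/docs/a/b', '/docs/a/b/c']
```

The scope check is what makes the uncapped second pass safe: the link from `/docs/a/b` back to `/blog` is ignored, so the dive cannot wander into the rest of the site.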

Code example

python

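from collections import deque

def bfs_crawl(seed, max_depth, get_links):
    # get_links(url) is a caller-supplied function (an assumption here) that
    # fetches a page and returns the URLs it links to.
    frontier = deque([(seed, 0)])  # queue → BFS
    seen = set()
    while frontier:
        url, d = frontier.popleft()  # oldest link first (FIFO)
        if url in seen or d > max_depth:
            continue
        seen.add(url)
        for link in get_links(url):
            frontier.append((link, d + 1))  # children sit one hop deeper
    return seen

def dfs_crawl(seed, max_depth, get_links):
    stack = [(seed, 0)]  # stack → DFS
    seen = set()
    while stack:
        url, d = stack.pop()  # newest link first (LIFO)
        if url in seen or d > max_depth:
            continue
        seen.add(url)
        for link in get_links(url):
            stack.append((link, d + 1))  # children sit one hop deeper
    return seen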

Related terms

What Is a Web Crawler?

A web crawler is a program that finds and downloads web pages on its own by following links - it starts from a few given pages (called seed …

What Is Crawl Budget?

Crawl budget is the upper limit on how much of a site a crawler will fetch in a single run - measured in pages, requests, or wall-clock time…

What Is Crawl Depth Limit?

Crawl depth limit is the maximum number of link hops a crawler will follow from a seed URL. A "hop" is one click along a link. The page you …

What Is Polite Crawling?

Polite crawling means running your crawler at a speed and rhythm that won't strain the websites it visits. In practice that means obeying ro…

What Is Link Extraction?

Link extraction is the crawling step where you pull every URL out of a page you have just downloaded, so you can decide which ones to visit …

What Is Throttling?

Throttling means deliberately slowing down how fast requests are sent or handled. A website throttles incoming traffic so it doesn't get ove…

Frequently asked questions

Which is faster?

Neither is faster by nature — total wall-clock time depends on the network, not the strategy. The two simply reach pages in a different order. That ordering matters if you plan to stop early once you have enough data, but not if you intend to crawl every page anyway.

What about priority queues?

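Both BFS and DFS can be generalized by swapping the queue or stack for a priority queue: instead of strict oldest-first (FIFO) or newest-first (LIFO) ordering, you visit pages in order of an importance score such as sitemap priority, link count, or freshness. This is called "best-first" crawling, and large search-engine crawlers are widely believed to schedule pages in a similar, priority-driven way, although their exact scheduling is not public.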
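
As a rough sketch of best-first ordering, the example below uses Python's standard `heapq` module as the priority queue. The link graph `SITE` and the numbers in `SCORE` are invented for illustration; in a real crawler the score would come from whatever signal you trust, and pages would be fetched rather than looked up in a dict.

```python
import heapq

# Hypothetical link graph and importance scores (higher means more important).
SITE = {
    "/": ["/about", "/products", "/news"],
    "/products": ["/products/top-seller"],
    "/about": [], "/news": [], "/products/top-seller": [],
}
SCORE = {"/": 1.0, "/about": 0.1, "/products": 0.9, "/news": 0.5,
         "/products/top-seller": 0.8}

def best_first_crawl(seed):
    """Always visit the highest-scoring page discovered so far."""
    # heapq is a min-heap, so scores are negated; the counter breaks ties
    # in discovery order and keeps tuple comparison away from the URLs.
    counter = 0
    frontier = [(-SCORE[seed], counter, seed)]
    seen = set()
    order = []
    while frontier:
        _, _, url = heapq.heappop(frontier)
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        for link in SITE.get(url, []):
            counter += 1
            heapq.heappush(frontier, (-SCORE.get(link, 0.0), counter, link))
    return order

order = best_first_crawl("/")
print(order)  # ['/', '/products', '/products/top-seller', '/news', '/about']
```

Note that the order is neither pure BFS nor pure DFS: the crawler goes deep into `/products` because its child scores well, then returns to the remaining top-level pages.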

Does it matter for a single-section crawl?

Less so — inside a tight scope the two strategies end up visiting roughly the same pages in roughly the same time. The choice matters most for general-purpose crawls across an unfamiliar site, where the visiting order shapes how quickly you understand its structure.

Last updated: 2026-05-31