Why BFS is the default
BFS works level by level: it gives you the homepage, then every category page, then every listing page, before drilling into individual items. So for a site whose content sits three clicks deep (depth-3), BFS surfaces a meaningful map of the site within the first few hundred requests. DFS, by contrast, might burn the first thousand requests inside one tag's pagination before it ever touches another category. Most crawl goals — "get a sample of every section" — are a better fit for BFS.
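To make the level-by-level ordering concrete, here is a minimal sketch of a depth-limited BFS. It runs over a small in-memory link graph instead of real HTTP requests. The `SITE` dictionary, its URLs, and the `bfs_order` helper are all hypothetical names invented for illustration; a real crawler would replace `get_links` with a fetch-and-parse step.

```python
from collections import deque

# Hypothetical toy site: each URL maps to the links found on that page.
SITE = {
    "/": ["/cat/a", "/cat/b"],
    "/cat/a": ["/cat/a/list"],
    "/cat/b": ["/cat/b/list"],
    "/cat/a/list": ["/item/1", "/item/2"],
    "/cat/b/list": ["/item/3"],
}


def bfs_order(start, get_links, max_depth):
    """Return URLs in the order a breadth-first crawl would visit them."""
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth) pairs, oldest first
    order = []
    while queue:
        url, depth = queue.popleft()  # take the oldest link next
        order.append(url)
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


order = bfs_order("/", lambda url: SITE.get(url, []), max_depth=3)
print(order)
```

On this toy graph the homepage comes first, then both category pages, then both listing pages, and only then the individual items.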
When DFS makes sense
DFS wins when you have one specific deep target and broad coverage does not matter: scraping every product in a single category, say, or every page in one documentation section. It also uses less memory on very wide sites. The crawler holds a waiting list (the frontier) of links it has found but not yet visited. DFS keeps that list as a stack (it always takes the newest link next), so it only holds the unvisited siblings along the current path, and the list grows roughly with fanout × depth. BFS keeps it as a queue (oldest link next), so it has to store every link at the next level before visiting any of them, and the list grows roughly with fanout^depth, which can get huge. With 10 links per page and a depth of 3, that is about 30 waiting links for DFS versus 1,000 for BFS.
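One way to check how the waiting list scales is to simulate both strategies on an idealized site and record the peak list size. The sketch below assumes a uniform tree in which every page links to exactly `fanout` new pages, down to `depth` levels, with no duplicate links. The `peak_frontier` function is a hypothetical helper written for this illustration, and real sites will be messier.

```python
from collections import deque


def peak_frontier(fanout, depth, strategy):
    """Peak size of the waiting list when crawling a uniform tree.

    Only each page's depth is stored, because in this idealized tree
    every page at a given depth behaves identically.
    """
    frontier = deque([0])  # the root page sits at depth 0
    peak = 1
    while frontier:
        if strategy == "dfs":
            level = frontier.pop()      # stack: newest link next
        else:
            level = frontier.popleft()  # queue: oldest link next
        if level < depth:
            # The page yields `fanout` links, all one level deeper.
            frontier.extend([level + 1] * fanout)
        peak = max(peak, len(frontier))
    return peak


bfs_peak = peak_frontier(fanout=10, depth=3, strategy="bfs")
dfs_peak = peak_frontier(fanout=10, depth=3, strategy="dfs")
print(bfs_peak, dfs_peak)
```

With a fanout of 10 and a depth of 3, the BFS queue peaks at 1,000 entries (the whole bottom level), while the DFS stack peaks at 28 (the unvisited siblings along one path).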
The hybrid that wins in practice
The strategy many production crawlers actually use is a mix: BFS with a depth limit, then targeted DFS for extraction. The first pass discovers the site's structure (BFS, depth 3). The second pass dives into the specific subtrees you identified (DFS, with no depth limit, but following only links that stay inside the chosen subtree). This gives you both a broad lay of the land and the deep coverage your pipeline needs.
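The two passes can be sketched as follows, again on an in-memory link graph instead of live requests. Everything named here is an assumption made for illustration: the `SITE` graph, the `bfs_discover` and `dfs_extract` helpers, and the naive URL-prefix check used to keep the second pass in scope.

```python
from collections import deque

# Hypothetical toy site: each URL maps to the links found on that page.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/docs/guide"],
    "/blog": ["/blog/post-1"],
    "/docs/api": ["/docs/api/v1"],
    "/docs/api/v1": ["/docs/api/v1/users", "/blog"],
    "/docs/api/v1/users": ["/docs/api/v1/users/create"],
}


def get_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return SITE.get(url, [])


def bfs_discover(start, max_depth):
    """Pass 1: breadth-first discovery, stopping at max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # links on pages at the limit are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen


def dfs_extract(root, scope_prefix):
    """Pass 2: depth-first crawl with no depth limit, kept inside scope_prefix."""
    seen = {root}
    stack = [root]
    order = []
    while stack:
        url = stack.pop()  # newest link next
        order.append(url)
        # Push in reverse so links are visited in page order.
        for link in reversed(get_links(url)):
            # Naive scope check: follow only URLs under the chosen prefix.
            if link.startswith(scope_prefix) and link not in seen:
                seen.add(link)
                stack.append(link)
    return order


discovered = bfs_discover("/", max_depth=3)
deep = dfs_extract("/docs/api", scope_prefix="/docs/api")
print(sorted(discovered))
print(deep)
```

The first pass finds seven pages and stops before `/docs/api/v1/users`. The second pass reaches that page and the one below it, and it skips the out-of-scope link back to `/blog`. A plain prefix match would also accept a sibling path such as `/docs/api-old`, so a real crawler would probably want a stricter scope test.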
