Where content lives
Most sites follow a simple layering. The homepage links to category pages (depth 1), category pages link to listings (depth 2), and listings link to the detail pages you actually want (depth 3). Going deeper rarely reveals anything new: you start hitting pagination, sort variants, and tag clouds instead. Setting the depth limit to 3-4 captures the bulk of meaningful URLs without spending your budget on combinatorial junk, the explosion of near-duplicate URLs created by filters and sort options.
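As a rough illustration of how a depth limit cuts off the junk layer, here is a minimal sketch of a breadth-first crawl with a depth cap. It walks an in-memory link graph instead of fetching real pages; the function name `crawl_by_depth`, the `site` dictionary, and the example URLs are all hypothetical stand-ins.

```python
from collections import deque


def crawl_by_depth(links, start, max_depth):
    """Breadth-first traversal of a link graph, stopping at max_depth.

    `links` maps a URL to the URLs it links to (a stand-in for
    fetching and parsing a page). Returns a dict of URL -> depth at
    which the URL was first discovered.
    """
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth_of[url] >= max_depth:
            continue  # page is kept, but its links are not followed
        for target in links.get(url, []):
            if target not in depth_of:  # skip already-seen URLs
                depth_of[target] = depth_of[url] + 1
                queue.append(target)
    return depth_of


# Toy site: homepage -> category -> listing -> detail -> sort variant
site = {
    "/": ["/shoes"],
    "/shoes": ["/shoes/running"],
    "/shoes/running": ["/item/42"],
    "/item/42": ["/item/42?sort=price"],
}

found = crawl_by_depth(site, "/", max_depth=3)
```

With a limit of 3 the detail page `/item/42` is discovered, while the sort variant one hop further is never queued.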
Depth vs budget interaction
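Depth and budget constrain each other. A high depth limit with a small budget exhausts the budget and stops mid-traversal; a low depth limit with a large budget leaves capacity unused. The rule of thumb: set depth to match the natural shape of the site (3-4 hops for most), then size the budget to roughly the number of pages reachable within that depth, plus a safety margin. Page counts multiply from one level to the next rather than add, so estimate each level as the previous level's page count times its average fanout (how many new links a typical page at that level has), then sum the levels. For example, a site with 20 categories and 200 items each has about 20 × 200 = 4,000 item pages, so with category and listing pages plus a margin it fits in about 5,000 pages at depth 3.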
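One way to put numbers on this is to count pages level by level and add a margin. The sketch below is only an estimate under stated assumptions: the function name `estimate_budget`, the per-level fanout list, and the 20% default margin are illustrative choices, and intermediate listing pages are left out for simplicity.

```python
import math


def estimate_budget(fanout_per_level, margin=0.2):
    """Estimate a page budget: pages reachable from the homepage, plus a margin.

    fanout_per_level[i] is the assumed average number of new pages
    that each page at depth i links to.
    """
    total = 1            # the homepage itself
    pages_at_level = 1
    for fanout in fanout_per_level:
        pages_at_level *= fanout   # page counts multiply level by level
        total += pages_at_level
    return math.ceil(total * (1 + margin))


# 20 categories, 200 items per category (listing pages ignored)
budget = estimate_budget([20, 200])
```

For the 20-category, 200-item example this comes out a little under 5,000 pages; counting listing pages or using a larger margin would push it closer to that figure.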
Per-pattern depth
Advanced crawlers set depth per URL type rather than using one number everywhere. Detail pages get depth 0: fetch them, but do not follow their links. Category pages get a high depth, since that is where you discover items. Pagination chains (sequences of "next page" links) get capped at 50-100 pages to avoid infinite traps, such as a calendar whose "next month" link goes on forever. This takes more setup than a single global limit, but on large sites it spends the budget far more efficiently.
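A per-pattern policy like this could be sketched as a small rule table plus one decision function. Everything here is an assumption for illustration: the URL patterns, the constant values, and the names `classify` and `should_follow_links` are hypothetical, and a real crawler would supply its own patterns and track pagination hops itself.

```python
import re

# Hypothetical URL patterns; the first matching rule wins.
RULES = [
    (re.compile(r"^/item/\d+$"), "detail"),
    (re.compile(r"[?&]page=\d+"), "pagination"),
    (re.compile(r"^/category/"), "category"),
]

MAX_PAGINATION_HOPS = 50  # assumed: low end of the 50-100 range
CATEGORY_MAX_DEPTH = 10   # assumed value for a "high" depth
DEFAULT_MAX_DEPTH = 3     # assumed global fallback


def classify(url):
    """Return the page type for a URL, or "other" if no rule matches."""
    for pattern, kind in RULES:
        if pattern.search(url):
            return kind
    return "other"


def should_follow_links(url, depth, pagination_hops):
    """Decide whether links found on `url` should be queued.

    `depth` is the page's distance from the start URL;
    `pagination_hops` is how many "next page" links led here.
    """
    kind = classify(url)
    if kind == "detail":
        return False  # depth 0: fetch the page, follow nothing
    if kind == "pagination":
        return pagination_hops < MAX_PAGINATION_HOPS
    if kind == "category":
        return depth < CATEGORY_MAX_DEPTH
    return depth < DEFAULT_MAX_DEPTH
```

Rule order matters in this sketch: a paginated category URL such as `/category/shoes?page=3` matches the pagination rule before the category rule, so the hop cap applies to it.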
