Why budgets exist
Real sites have a near-endless supply of URLs: pagination, sorting, filtering, search results, calendar pages. A naive crawl follows every combination and hits millions of low-value pages before it ever reaches the content you came for. A budget forces you to set priorities: which URL patterns are worth crawling, in what order, and where to stop.
Spending the budget well
The standard playbook: grab the sitemap first to get the canonical list of content URLs, then crawl section by section in priority order. Limit depth to 3 to 5 hops (link clicks) from each seed URL, and skip patterns that explode into endless combinations: faceted filters, sort variants, and session IDs (per-visit identifiers stuck in the URL). When you hit the budget, log what you reached and what you missed, so the next run can pick up where this one stopped.
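As a rough illustration of that playbook, here is a minimal Python sketch. It assumes the seed URLs are already known (for example, taken from the sitemap) and listed in priority order, and that a caller-supplied get_links function returns the links found on a page. The names crawl, is_low_value, get_links, and the parameter names in SKIP_PARAMS are all hypothetical, chosen for this example only; a real crawler would tune the skip list to the site and also handle session IDs embedded in the path.

```python
from collections import deque
from urllib.parse import urlsplit, parse_qsl

# Hypothetical query parameters that tend to multiply URLs without adding
# content. This list is an assumption for illustration; adjust it per site.
SKIP_PARAMS = {"sort", "order", "filter", "color", "size", "sessionid", "sid"}


def is_low_value(url):
    """Return True if the URL carries any query parameter named in SKIP_PARAMS."""
    query = urlsplit(url).query
    pairs = parse_qsl(query, keep_blank_values=True)
    return any(key.lower() in SKIP_PARAMS for key, _ in pairs)


def crawl(seeds, get_links, budget, max_depth=3):
    """Crawl each seed's section in turn, sharing one page budget.

    seeds:     URLs in priority order (highest priority first).
    get_links: callable taking a URL and returning the URLs it links to.
    budget:    maximum number of pages to fetch across all sections.
    max_depth: maximum number of hops from each seed.

    Returns (fetched, missed). `missed` holds URLs that were discovered
    but not fetched because the budget ran out, so a later run can
    start from them.
    """
    seen = set()
    fetched, missed = [], []
    for seed in seeds:
        if seed in seen:
            continue
        seen.add(seed)
        queue = deque([(seed, 0)])
        while queue:
            url, depth = queue.popleft()
            if len(fetched) >= budget:
                # Budget exhausted: record the URL instead of fetching it.
                missed.append(url)
                continue
            fetched.append(url)
            if depth >= max_depth:
                # Do not follow links beyond the depth limit.
                continue
            for link in get_links(url):
                if link in seen or is_low_value(link):
                    continue
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched, missed
```

Finishing one seed's section before starting the next means that when the budget runs out, the pages left in missed come from the lowest-priority sections rather than being spread evenly across the site.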
SEO crawl budget
In SEO, "crawl budget" means the set of URLs Googlebot can and wants to crawl on your site. Google describes it as shaped by two factors: crawl capacity (how fast and reliably your server responds) and crawl demand (how popular your pages are and how often they change). You spend it wisely by exposing fast, canonical (single official version) URLs and not wasting it on duplicate content. The same principle applies to a custom crawler: spend the budget on URLs that matter, and avoid wasting it on URLs that do not.
