When to batch
Batch is the right pattern when (a) you have a large URL list, (b) you do not need results in real time, and (c) you want the API to handle retries and concurrency for you. Building a batch processor in-house takes months to do well: it needs concurrency control, retry logic, dead-letter handling, idempotency, and progress reporting. A managed batch endpoint amortizes that engineering cost across all of its customers.
How to size batches
Most batch APIs accept up to ~1M URLs per job, but the right size is smaller: 1,000-10,000 URLs per batch. Smaller batches give you faster feedback (a bad configuration surfaces in minutes, not hours), running several batches in parallel under separate job IDs lets you balance load, and recovering from a single bad batch does not require re-running everything. For a 1M-URL crawl, that means 100-200 batches of 5,000-10,000 URLs each.
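The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not any particular API's client: the function name `split_into_batches` and the default `batch_size` of 5,000 are assumptions chosen to fall inside the range recommended above.

```python
from typing import Iterator, List, Sequence


def split_into_batches(urls: Sequence[str], batch_size: int = 5_000) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `batch_size` long.

    `batch_size` is an assumed default within the 1,000-10,000 range;
    tune it to your own feedback and recovery needs.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(urls), batch_size):
        # The final slice may be shorter than batch_size.
        yield list(urls[start:start + batch_size])


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    urls = [f"https://example.com/page/{i}" for i in range(12_500)]
    for index, batch in enumerate(split_into_batches(urls)):
        print(f"batch {index}: {len(batch)} URLs")
```

With the default of 5,000, a 1M-URL list yields 200 batches. Each batch would then be submitted as its own job, so that a failure in one can be retried without touching the others.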
Synchronous fallback
Sometimes you have a batch-sized job but need a few results in real time. For example, a content monitoring pipeline might get 99% of its data from a nightly batch but need to check a breaking-news URL immediately. Most scraping APIs offer both batch and per-request endpoints; route the real-time work to the sync endpoint and everything else to batch. Do not call the sync endpoint in a tight loop expecting batch throughput: you will get rate-limited.
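One way to express this routing is a small dispatcher that queues ordinary URLs for the next batch job and sends only urgent ones to the sync endpoint, spacing those calls out. The sketch below is hypothetical throughout: `UrlRouter`, `fetch_sync`, and the `min_interval_s` spacing are illustrative names and values, not part of any real scraping API, and the one-second default is an assumption rather than a documented rate limit.

```python
import time
from typing import Callable, List, Optional


class UrlRouter:
    """Send urgent URLs to a sync fetcher; queue the rest for batch."""

    def __init__(self, fetch_sync: Callable[[str], str], min_interval_s: float = 1.0):
        self._fetch_sync = fetch_sync          # hypothetical per-request client call
        self._min_interval_s = min_interval_s  # assumed spacing between sync calls
        self._last_sync_call = float("-inf")   # no sync call made yet
        self.pending_batch: List[str] = []     # URLs to submit in the next batch job

    def submit(self, url: str, urgent: bool = False) -> Optional[str]:
        """Return the fetched body for urgent URLs, or None if the URL was queued."""
        if not urgent:
            self.pending_batch.append(url)
            return None
        # Space out sync calls so a burst of urgent URLs does not become a tight loop.
        wait = self._min_interval_s - (time.monotonic() - self._last_sync_call)
        if wait > 0:
            time.sleep(wait)
        self._last_sync_call = time.monotonic()
        return self._fetch_sync(url)


if __name__ == "__main__":
    # Stand-in fetcher; a real one would call the API's per-request endpoint.
    router = UrlRouter(fetch_sync=lambda u: f"<html>{u}</html>")
    router.submit("https://example.com/archive/1")
    print(router.submit("https://example.com/breaking", urgent=True))
    print(len(router.pending_batch), "URL(s) queued for batch")
```

Injecting the fetcher as a callable keeps the routing decision separate from the HTTP client, which also makes the dispatcher easy to test without network access.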
