How to Scrape Infinite-Scroll Pages

Infinite scroll is a page design in which new content loads automatically as you scroll down (like a social feed that never ends). To scrape one, your code has to trigger the same scroll events, wait for the new content to render, collect it, and work out when the feed has actually stopped. Scrolling to the bottom once is not enough, because the bottom keeps moving as more content loads. The reliable pattern is a loop: scroll, wait, collect, check whether anything new appeared, and repeat, with a cap so that a truly endless feed doesn't trap you.

Quick facts

Required: A real browser (Playwright, Puppeteer, or a rendering API)
Loop pattern: Scroll → wait for new items → collect → check delta → repeat
End-of-feed signal: Two consecutive iterations with no new items, or a stable scroll height
Cap: Hard maximum on iterations or items to avoid a runaway loop
Alternative: Find the XHR endpoint behind the scroll and call it directly

The XHR shortcut

Before you bother scrolling, open your browser's network tab (the DevTools panel that lists every request the page makes). Infinite scroll is almost always powered by a paginated JSON endpoint that the page calls in the background (an XHR — a JavaScript request that fetches data without reloading the page) as you scroll. That endpoint takes a cursor or page parameter and returns the next batch of items. Calling it directly is far faster than driving a browser — no rendering, no scroll loops, just JSON you page through. If the endpoint is open, or only needs a CSRF token (a small anti-forgery value) grabbed from the first page, this is the best route.
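
As a rough illustration of paging through such an endpoint, here is a minimal sketch using only the standard library. The URL, the `cursor` query parameter, and the `items` and `next_cursor` response fields are all hypothetical; substitute whatever names the network tab shows for the site you are working with.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Hypothetical endpoint; read the real one from the browser's network tab.
FEED_URL = "https://feed.example.com/api/items"


def fetch_page(cursor: Optional[str]) -> dict:
    """Fetch one batch as JSON. Assumes the endpoint takes a `cursor` query parameter."""
    query = urllib.parse.urlencode({"cursor": cursor} if cursor else {})
    url = f"{FEED_URL}?{query}" if query else FEED_URL
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_items(
    fetch: Callable[[Optional[str]], dict], max_pages: int = 200
) -> Iterator[dict]:
    """Yield items page by page until the cursor runs out or the cap is hit."""
    cursor = None
    for _ in range(max_pages):
        page = fetch(cursor)
        # `items` and `next_cursor` are assumed field names.
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Calling `list(iter_items(fetch_page))` would walk the whole feed. Passing the fetch function in as an argument keeps the paging logic separate from the HTTP details, so adding a CSRF header or switching to a page-number parameter only touches `fetch_page`.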

When you have to scroll

If that endpoint is signed, encrypted, or hands back ready-made HTML fragments instead of clean data, you need a real browser. The loop goes: read the current scroll height, scroll to the bottom (or down by one screen at a time), wait until either a new item appears or the network goes quiet, collect the new items, then compare against the previous round. If the item count hasn't changed for two rounds in a row, the feed is done. Cap it at a sensible maximum (for example 200 rounds) so Twitter-style feeds that never truly end don't run forever.
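
The loop described above can be sketched as follows. This is one possible shape, not a definitive implementation: `scroll_until_stable` is a hypothetical helper that takes the scrolling and collecting steps as callables, and `scrape_with_playwright` shows how it might be wired to Playwright's sync API. The one-second pause and the `data-id` attribute on each item are assumptions.

```python
from typing import Callable, Iterable, Set


def scroll_until_stable(
    scroll_once: Callable[[], None],
    visible_ids: Callable[[], Iterable[str]],
    max_rounds: int = 200,
    patience: int = 2,
) -> Set[str]:
    """Scroll, collect, and stop after `patience` rounds in a row with nothing new."""
    seen: Set[str] = set(visible_ids())  # items rendered before any scrolling
    idle_rounds = 0
    for _ in range(max_rounds):  # hard cap for feeds that never end
        scroll_once()  # expected to scroll and then wait for content to load
        before = len(seen)
        seen.update(visible_ids())
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
        if idle_rounds >= patience:
            break
    return seen


def scrape_with_playwright(url: str, item_selector: str) -> Set[str]:
    """Wire the loop to Playwright (needs the `playwright` package and a browser installed)."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)

        def scroll_once() -> None:
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            page.wait_for_timeout(1000)  # fixed pause; tune per site

        def visible_ids() -> list:
            # Assumes each item element carries a data-id attribute.
            return page.eval_on_selector_all(
                item_selector, "els => els.map(e => e.dataset.id)"
            )

        try:
            return scroll_until_stable(scroll_once, visible_ids)
        finally:
            browser.close()
```

Tracking a set of item IDs rather than a raw element count means the stop condition still works when the page removes old elements from the DOM as it adds new ones.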

Pitfalls

Virtualized lists (libraries like react-window or react-virtual) remove off-screen items from the page's HTML as you scroll, so by the time you reach the bottom, the top items are already gone. The fix is to collect after each scroll step, not just at the end, and to track items by a stable identifier rather than by element count, because the number of rendered elements stays roughly constant. Some pages also wait until the user has paused for a moment before loading more, so add a pause of 500 ms to 2 s after each scroll. Finally, anti-bot systems flag mechanical scrolling (identical screen-sized jumps with no variation), so randomize how far you scroll and how long you pause.
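
Two of these fixes are easy to show in isolation. The helpers below are a minimal sketch with hypothetical names: `jittered_step` picks a randomized scroll distance and pause, and `merge_visible` accumulates whatever is currently rendered, keyed by ID, so that items dropped by a virtualized list are not lost. The 60% to 100% distance range and the `id` field are assumptions.

```python
import random
from typing import Dict, Iterable, Tuple


def jittered_step(viewport_height: int, rng: random.Random) -> Tuple[int, float]:
    """Return a (scroll distance in px, pause in seconds) pair with variation."""
    # Between 60% and 100% of one screen, so consecutive views overlap.
    distance = int(viewport_height * rng.uniform(0.6, 1.0))
    pause = rng.uniform(0.5, 2.0)  # the 500 ms to 2 s range mentioned above
    return distance, pause


def merge_visible(collected: Dict[str, dict], visible: Iterable[dict]) -> int:
    """Add currently rendered items to `collected`, keyed by id; return how many were new."""
    new = 0
    for item in visible:
        if item["id"] not in collected:
            collected[item["id"]] = item
            new += 1
    return new
```

In a scroll loop, you would call `jittered_step` before each scroll and `merge_visible` right after each wait. Scrolling by less than a full screen keeps some overlap between steps, which lowers the chance that a virtualized list discards an item before it has been read.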

Code example

python
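import requests

# Ask the rendering API to load the page and, going by the parameter names,
# keep scrolling until the end-of-feed marker appears or 50 scrolls are used up.
resp = requests.post('https://publisher.scrappey.com/api/v1', json={
    'cmd': 'request.get',
    'url': 'https://feed.example.com',
    'browser_actions': [
        {'type': 'scroll_until', 'selector': '.end-of-feed', 'max_scrolls': 50}
    ]
}, headers={'Authorization': 'YOUR_API_KEY'}, timeout=120)
resp.raise_for_status()  # fail loudly on HTTP errors

# Rendered HTML of the page after scrolling
html = resp.json()['solution']['response']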

Frequently asked questions

Is the XHR shortcut always faster?

Almost always, when it is available. A signed or encrypted endpoint can rule it out, but checking the network tab for that hidden JSON endpoint is the right first move on any infinite-scroll page.

How do I detect the end of the feed?

Watch for an end-of-feed marker element, or compare the item count across rounds — two rounds in a row with the same count means nothing new is loading, so you've reached the end.

What about virtualized lists?

Collect items after each scroll step, not at the end. Once an item scrolls off-screen it gets removed from the page's HTML, so you can't read it back later.

Last updated: 2026-05-31