How to Scrape Infinite-Scroll Pages

Infinite scroll is a page design in which new content keeps loading on its own as you scroll down (like a social feed that never ends). To scrape one, your code has to trigger those same scroll events, wait for the new content to render, collect it, and figure out when the feed has actually stopped. The naive approach of scrolling to the bottom once fails, because the bottom keeps moving as more content loads. The reliable pattern is a loop: scroll, wait, collect, check whether anything new appeared, and repeat, with a cap so a truly endless feed doesn't trap you.

Required	A real browser (Playwright, Puppeteer, or a rendering API)
Loop pattern	Scroll → wait for new items → collect → check delta → repeat
End-of-feed signal	Two consecutive iterations with no new items, or a stable scroll height
Cap	Hard maximum on iterations or items to avoid a runaway loop
Alternative	Find the XHR endpoint behind the scroll and call it directly

The XHR shortcut

Before you bother scrolling, open your browser's network tab (the DevTools panel that lists every request the page makes). Infinite scroll is almost always powered by a paginated JSON endpoint that the page calls in the background (an XHR — a JavaScript request that fetches data without reloading the page) as you scroll. That endpoint takes a cursor or page parameter and returns the next batch of items. Calling it directly is far faster than driving a browser — no rendering, no scroll loops, just JSON you page through. If the endpoint is open, or only needs a CSRF token (a small anti-forgery value) grabbed from the first page, this is the best route.
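
The paging described above can be sketched as follows. This is a minimal illustration, not a real site's API: the URL `https://feed.example.com/api/items`, the `cursor` query parameter, and the `items` and `next_cursor` response fields are all assumptions, so substitute whatever the network tab actually shows. The loop takes the fetch function as an argument so it can be tried against a stub before pointing it at a live endpoint.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional


def fetch_page(cursor: Optional[str]) -> dict:
    """Fetch one batch from a hypothetical cursor-paginated JSON endpoint."""
    # Assumed URL and parameter name; copy the real ones from the network tab.
    base_url = "https://feed.example.com/api/items"
    query = urllib.parse.urlencode({"cursor": cursor} if cursor else {})
    url = f"{base_url}?{query}" if query else base_url
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect_all(fetch: Callable[[Optional[str]], dict], max_pages: int = 100) -> list:
    """Follow the cursor until it is missing, a batch is empty, or the cap is hit."""
    items: list = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch(cursor)
        batch = page.get("items", [])      # assumed response field
        if not batch:
            break
        items.extend(batch)
        cursor = page.get("next_cursor")   # assumed response field
        if not cursor:
            break
    return items
```

Calling `collect_all(fetch_page)` would walk the feed one batch at a time. The `max_pages` cap plays the same role as the iteration cap in a scroll loop.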

When you have to scroll

If that endpoint is signed, encrypted, or hands back ready-made HTML fragments instead of clean data, you need a real browser. The loop goes: read the current scroll height, scroll to the bottom (or down by one screen at a time), wait until either a new item appears or the network goes quiet, collect the new items, then compare against the previous round. If the item count hasn't changed for two rounds in a row, the feed is done. Cap it at a sensible maximum (for example 200 rounds) so Twitter-style feeds that never truly end don't run forever.
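
As a rough sketch of that loop, the function below assumes a Playwright (Python, sync API) `page` object that has already navigated to the feed. The function name, the `item_selector` argument, and the fixed pause are illustrative choices. The fixed pause is a simple stand-in for waiting on a new item or a quiet network.

```python
def scroll_until_stable(page, item_selector: str, max_rounds: int = 200,
                        stable_rounds: int = 2, pause_ms: int = 1000) -> int:
    """Scroll a Playwright page until the item count stops growing.

    Returns the final number of elements matching item_selector.
    """
    last_count = page.locator(item_selector).count()
    unchanged = 0
    for _ in range(max_rounds):  # hard cap so an endless feed cannot trap the loop
        # Jump to the current bottom; the bottom moves as more content loads.
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        # Give the page time to fetch and render the next batch.
        page.wait_for_timeout(pause_ms)
        count = page.locator(item_selector).count()
        if count == last_count:
            unchanged += 1
            if unchanged >= stable_rounds:
                break  # no new items for two rounds in a row: treat the feed as done
        else:
            unchanged = 0
            last_count = count
    return last_count
```

With a live page, a call might look like `scroll_until_stable(page, ".item")`, where `.item` is a placeholder for whatever selector matches one feed entry.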

Pitfalls

Virtualized lists (libraries like react-window or react-virtual) remove off-screen items from the page's HTML as you scroll, so by the time you reach the bottom, the top items are already gone. The fix is to collect after each scroll step, not just at the end. Some pages also wait until the user has paused for a moment before loading more, so add a pause of 500 ms to 2 s after each scroll. Finally, anti-bot systems flag mechanical scrolling (identical screen-sized jumps with no variation), so randomize how far you scroll and how long you pause.
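
One way to combine these fixes is sketched below, again assuming a Playwright (Python, sync API) `page`. It harvests a key from every mounted item after each step, de-duplicates by that key, and randomizes both scroll distance and pause. The `data-id` attribute and the 400-900 pixel range are assumptions for illustration; use whatever stable identifier the real items carry.

```python
import random


def collect_virtualized(page, item_selector: str, key_attr: str = "data-id",
                        max_rounds: int = 200, stable_rounds: int = 2) -> list:
    """Harvest item keys after every scroll step so unmounted rows are not lost."""
    seen: dict = {}  # a dict keeps first-seen order and drops duplicates
    unchanged = 0
    # JavaScript run in the page: read key_attr from each currently mounted item.
    js = f"els => els.map(e => e.getAttribute('{key_attr}'))"
    for _ in range(max_rounds):
        before = len(seen)
        for key in page.eval_on_selector_all(item_selector, js):
            if key is not None:
                seen.setdefault(key, True)
        if len(seen) == before:
            unchanged += 1
            if unchanged >= stable_rounds:
                break  # two rounds with nothing new: assume the end of the feed
        else:
            unchanged = 0
        # Vary distance and pause so consecutive steps are not identical.
        page.mouse.wheel(0, random.randint(400, 900))
        page.wait_for_timeout(random.randint(500, 2000))
    return list(seen)
```

Collecting keys rather than whole elements keeps the example short. In practice the same loop could store the text or HTML of each item under its key.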

Code example

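```python
import requests

# Ask the rendering API to scroll until the end-of-feed marker appears,
# giving up after 50 scrolls.
resp = requests.post('https://publisher.scrappey.com/api/v1', json={
    'cmd': 'request.get',
    'url': 'https://feed.example.com',
    'browser_actions': [
        {'type': 'scroll_until', 'selector': '.end-of-feed', 'max_scrolls': 50}
    ]
}, headers={'Authorization': 'YOUR_API_KEY'}, timeout=120)
resp.raise_for_status()  # stop on HTTP errors instead of parsing an error body

# Rendered HTML of the page after scrolling
html = resp.json()['solution']['response']
```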

Frequently asked questions

Is the XHR shortcut always faster?

Almost always, when the endpoint is usable. A signed or encrypted endpoint can rule it out, but checking the network tab for that hidden JSON endpoint is the right first move on any infinite-scroll page.

How do I detect the end of the feed?

Watch for an end-of-feed marker element, or compare the item count across rounds — two rounds in a row with the same count means nothing new is loading, so you've reached the end.

What about virtualized lists?

Collect items after each scroll step, not at the end. Once an item scrolls off-screen it gets removed from the page's HTML, so you can't read it back later.

Last updated: 2026-05-31