How to scrape dynamic JavaScript content? (2026 Guide)

Dynamic content is anything a page loads after the initial HTML arrives — usually pulled in by JavaScript running in your browser. Because the data is not in the first response, a plain HTTP fetch comes back half-empty. This guide shows how to scrape it in 2026.

Quick facts

Problem: Data loads after initial HTML
Option 1: Headless browser renders the JS
Option 2: Call the hidden JSON/XHR API
Fastest: API endpoint if you can find it
Find it: Browser DevTools network tab

Understanding Dynamic Content

"Dynamic" means the content shows up only after JavaScript runs in the browser, not in the raw HTML the server first sends. It arrives through one of these common patterns:

1. Types of Dynamic Loading

  • AJAX requests (background calls that fetch data without reloading the page)
  • Infinite scroll (more items load as you scroll down)
  • Lazy loading (content loads only when it scrolls into view)
  • WebSocket updates (a live connection that streams new data)
  • React/Vue.js state changes (the framework re-renders the page in place)

Solution Approaches

The most general fix is to run a real browser that executes the JavaScript, then read the page once it has rendered. The first two approaches below do that; the third captures the API responses the page fetches in the background instead.

1. Using Selenium

Selenium drives a real Chrome browser. Here it scrolls to the bottom repeatedly until the page stops growing, which is how you exhaust an infinite-scroll feed before reading the items.

import time

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

class DynamicScraper:
    def __init__(self):
        self.driver = webdriver.Chrome()
        self.wait = WebDriverWait(self.driver, 10)
    
    def scrape_infinite_scroll(self, url, scroll_pause=2):
        self.driver.get(url)
        last_height = self.driver.execute_script('return document.body.scrollHeight')
        
        while True:
            # Scroll down
            self.driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
            
            # Wait for new content
            time.sleep(scroll_pause)
            
            # Calculate new scroll height
            new_height = self.driver.execute_script('return document.body.scrollHeight')
            
            # Break if no more content
            if new_height == last_height:
                break
                
            last_height = new_height
        
        # Extract content
        elements = self.driver.find_elements(By.CSS_SELECTOR, '.content-item')
        return [elem.text for elem in elements]

2. Using Playwright

Playwright is a newer, faster browser-automation tool. The key trick is waiting: networkidle means "wait until network traffic settles" and wait_for_selector means "wait until this element actually exists" — both ensure the dynamic content has arrived before you read it. A SPA (single-page app) is a site like a React/Vue app that renders everything with JavaScript.

from playwright.sync_api import sync_playwright

class ModernScraper:
    def __init__(self):
        # Start Playwright's synchronous API and launch headless Chromium
        self.playwright = sync_playwright().start()
        self.browser = self.playwright.chromium.launch()

    def scrape_spa(self, url):
        # Sync API: calls block until done, so no async/await is needed
        page = self.browser.new_page()
        try:
            # Navigate and wait for network idle
            page.goto(url, wait_until='networkidle')

            # Wait for specific content
            page.wait_for_selector('.dynamic-content')

            # Extract data; ?. returns null instead of throwing when a child is missing
            return page.evaluate('''
                () => {
                    const items = document.querySelectorAll('.item');
                    return Array.from(items).map(item => ({
                        title: item.querySelector('.title')?.innerText ?? null,
                        description: item.querySelector('.desc')?.innerText ?? null
                    }));
                }
            ''')
        finally:
            page.close()

    def close(self):
        # Release the browser and the Playwright driver process
        self.browser.close()
        self.playwright.stop()

3. Intercepting AJAX Requests

Instead of reading rendered HTML, you can capture the raw API responses the page fetches in the background. Here a proxy (mitmproxy, an HTTP proxy that sits between browser and server) watches traffic and saves any JSON coming back from an api URL — often the cleanest source of the data.

# interceptor.py: run with `mitmdump -s interceptor.py` (listens on port 8080 by default)
import json

class AjaxInterceptor:
    def __init__(self):
        self.data = []

    def request(self, flow):
        # Add custom headers
        flow.request.headers['X-Requested-With'] = 'XMLHttpRequest'

    def response(self, flow):
        # Capture API responses
        if 'api' in flow.request.pretty_url:
            try:
                self.data.append(json.loads(flow.response.content))
            except ValueError:
                # Not JSON (JSONDecodeError) or not decodable text; skip it
                pass

# mitmproxy discovers addons through this module-level list
addons = [AjaxInterceptor()]

# Usage with Selenium (a separate script, run while mitmdump is listening)
# HTTPS traffic is only readable if Chrome trusts mitmproxy's CA certificate
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=localhost:8080')
driver = webdriver.Chrome(options=options)

Best Practices

1. Handling Loading States

The most common bug is reading the page too early. Wait for the network to go quiet, wait for the loading spinner to disappear, then confirm the real content has appeared — in that order.

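# Playwright raises its own TimeoutError, not Python's built-in one
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

class LoadingHandler:
    def wait_for_load(self, page):
        # Wait for network idle
        page.wait_for_load_state('networkidle')

        # Check loading indicators
        try:
            page.wait_for_selector('.loading-spinner', state='hidden')
        except PlaywrightTimeoutError:
            # Spinner still visible after the timeout; fall through to the content check
            pass

        # Ensure content is ready
        page.wait_for_selector('.content-loaded')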

2. Error Recovery

Dynamic pages are flaky, so expect failures. Return None instead of crashing when an element never shows up, and retry transient errors with exponential backoff: each wait is double the previous one (1 second, then 2 seconds with the default of three attempts), so you do not hammer the site.

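import asyncio
import logging

# Playwright raises its own TimeoutError, not Python's built-in one
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

logger = logging.getLogger(__name__)

class ResilientScraper:
    def safe_extract(self, page, selector, timeout=5000):
        # page is a sync-API Playwright page; timeout is in milliseconds
        try:
            element = page.wait_for_selector(selector, timeout=timeout)
            return element.text_content()
        except PlaywrightTimeoutError:
            logger.warning(f'Element {selector} not found')
            return None

    async def retry_action(self, action, max_retries=3):
        # action is a zero-argument coroutine function (for async code paths)
        for attempt in range(max_retries):
            try:
                return await action()
            except Exception:
                # Out of retries: re-raise the last error to the caller
                if attempt == max_retries - 1:
                    raise
                # Back off 1s, 2s, 4s, ... before the next attempt
                await asyncio.sleep(2 ** attempt)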

Remember: Dynamic content scraping requires patience and proper waiting mechanisms. Always respect the website's resources and implement appropriate delays.

Frequently asked questions

How do I know if content is dynamic?

View the page source (Ctrl+U) or fetch it with curl — both show the raw HTML before any JavaScript runs. If the data is missing there but appears in the rendered page you see in the browser, it is injected by JavaScript after load, which means it is dynamic.
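
The same check can be scripted. The sketch below is one way to do it, assuming you already know a piece of text (the marker) that is visible in the rendered page; the function names are illustrative, not from any library.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Return the server's first response body, with no JavaScript executed."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def appears_in_raw_html(raw_html: str, marker: str) -> bool:
    """True if text you can see in the rendered page is already in the raw HTML."""
    return marker.casefold() in raw_html.casefold()
```

If `appears_in_raw_html(fetch_raw_html(url), "some visible text")` is False while the text shows up in your browser, the content is most likely injected by JavaScript. A plain substring match can miss text that is split across tags or HTML-escaped, so treat a negative result as a hint rather than proof.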

Is rendering with a browser always required?

No. Often the page fetches its data from a JSON API you can call directly — far faster and lighter than spinning up a browser. Open your browser's network tab first and look for the request that returns the data; if you find one, call it yourself and skip the browser entirely.
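
As a rough illustration of calling such an endpoint yourself, the sketch below pages through a JSON API with the standard library. The `page` query parameter and the `items` / `has_more` response fields are assumptions made for the example; copy the real parameter names and response shape from the request you find in the network tab.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and parse the response body as JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def iter_items(base_url: str,
               fetch: Callable[[str], dict] = fetch_json,
               max_pages: int = 50) -> Iterator[dict]:
    """Yield items page by page until the API reports there are no more.

    Assumes a hypothetical response shape: {"items": [...], "has_more": bool}.
    """
    for page in range(1, max_pages + 1):
        payload = fetch(f"{base_url}?{urlencode({'page': page})}")
        yield from payload.get("items", [])
        if not payload.get("has_more"):
            break
```

The `fetch` parameter is injectable so the paging logic can be exercised without touching the network, and `max_pages` caps the loop in case the stop flag never arrives.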

Why is my headless browser slow?

Rendering full pages is resource-heavy — every image, font, and script costs time and memory. Block the images and fonts you do not need, reuse browser contexts instead of launching a fresh browser each time, and prefer the underlying API when one exists.
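
A minimal sketch of the first two suggestions, assuming Playwright's sync API: one browser context is reused for every URL, and a route handler aborts image and font requests. The helper names and the choice to read only the page title are illustrative.

```python
BLOCKED_RESOURCE_TYPES = {"image", "font"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request of this resource type is worth skipping."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def block_heavy_resources(route) -> None:
    """Route handler: abort blocked resource types, let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def scrape_titles(urls: list[str]) -> dict[str, str]:
    """Visit each URL in one shared context and return its page title."""
    # Imported here so the helpers above work without Playwright installed
    from playwright.sync_api import sync_playwright

    titles = {}
    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        # Apply the handler to every request made from this context
        context.route("**/*", block_heavy_resources)
        page = context.new_page()
        for url in urls:
            page.goto(url, wait_until="domcontentloaded")
            titles[url] = page.title()
        browser.close()
    return titles
```

Blocking images can break pages that measure image sizes before rendering content, so check the extracted data against an unblocked run before relying on it.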

Last updated: 2026-05-31