How to Scrape JavaScript-Rendered Pages With Python

By the Scrappey Research Team




To scrape a JavaScript-rendered page in Python you need something that executes the page’s JavaScript before you read the HTML, or a way around rendering altogether. A plain requests.get() only returns the initial HTML the server sends, which on a modern single-page app is an almost empty shell; the real content is injected later by JavaScript running in a browser. There are three reliable fixes: drive a real browser with Playwright, drive one with Selenium, or skip the browser entirely and call the JSON API the page itself calls.

Why it happens	Content is rendered client-side; the server returns an empty HTML shell
How to detect it	View source shows no data, but the rendered page (DevTools Elements) does
Best tool (2026)	Playwright — auto-waiting, modern API, harder to detect than Selenium
Fastest method	Call the underlying JSON/XHR API directly (no browser needed)
Avoid	requests-html and Pyppeteer — both effectively unmaintained

Why Requests + BeautifulSoup returns an empty page

When you scrape a JavaScript-heavy site with the usual stack, you often get nothing back:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://quotes.toscrape.com/js/')
soup = BeautifulSoup(r.text, 'lxml')
print(soup.select('.quote'))   # => []  (empty!)

The list is empty even though the page clearly shows quotes in your browser. The reason: requests downloads only the HTML the server sends, and on a client-side-rendered page that HTML is a near-empty skeleton plus JavaScript. The quotes only appear after that JavaScript runs in a browser and inserts them into the page (on many sites it first fetches the data from an API). requests never runs JavaScript, so it never sees them.

Quick test: right-click → View Page Source (the raw HTML requests sees). If your data is missing there but present in the Elements tab of DevTools (the rendered DOM), the page is JavaScript-rendered and you need one of the methods below.
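The View Page Source check can be scripted too. The sketch below is a minimal version using only the standard library's html.parser: count_class counts elements carrying a given class in the HTML exactly as the server sent it. Like BeautifulSoup, the parser treats the contents of a <script> tag as text, so elements that JavaScript would only create at runtime are not counted. The function and class names are hypothetical, and 'quote' stands in for whatever class your target data uses.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute includes one class name."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get('class') or '').split()
        if self.class_name in classes:
            self.count += 1


def count_class(raw_html: str, class_name: str) -> int:
    """Count elements carrying class_name in HTML as the server sent it.

    Text inside <script> is not parsed as markup, so elements that
    JavaScript would only create at runtime are not counted.
    """
    counter = ClassCounter(class_name)
    counter.feed(raw_html)
    counter.close()
    return counter.count


def is_probably_js_rendered(url: str, class_name: str) -> bool:
    """Fetch url without running JavaScript and check for class_name."""
    import requests  # imported here so count_class works without requests

    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return count_class(resp.text, class_name) == 0
```

Calling is_probably_js_rendered('https://quotes.toscrape.com/js/', 'quote') should return True, mirroring the empty soup.select('.quote') result above. A zero count only tells you the raw HTML lacks that class, so confirm in DevTools before reaching for a browser.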

Method 1: Playwright (recommended in 2026)

Playwright drives a real Chromium, Firefox, or WebKit browser, so the JavaScript runs exactly as it would for a human visitor. It has auto-waiting built in (no manual sleep() calls) and a cleaner API than Selenium. Install it with pip install playwright, then download a browser build with playwright install chromium.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://quotes.toscrape.com/js/')

    # Auto-waits for the selector to appear after JS renders it.
    page.wait_for_selector('.quote')

    # Triple quotes let the JavaScript expression span several lines.
    quotes = page.eval_on_selector_all(
        '.quote',
        '''els => els.map(e => ({
            text: e.querySelector(".text").innerText,
            author: e.querySelector(".author").innerText
        }))'''
    )
    for q in quotes:
        print(q['author'], '—', q['text'])

    browser.close()

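Playwright also exposes the rendered HTML via page.content() if you prefer to hand it to BeautifulSoup. Wait on a condition rather than a fixed delay: page.wait_for_selector() returns as soon as the element appears, which is both faster and more reliable than sleep(). page.wait_for_load_state('networkidle') waits until there has been no network activity for 500 ms; it is a fallback when there is no single element to wait for, but it can time out on pages that poll or stream continuously.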
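Conceptually, auto-waiting and explicit waits replace a fixed delay with a condition that is checked until it holds or a timeout expires. The stdlib-only sketch below illustrates that idea; it is not Playwright's or Selenium's actual code, and the names wait_until, timeout, and poll_interval are hypothetical. Selenium's WebDriverWait follows the same pattern and polls every 0.5 seconds by default.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Poll condition() until it returns a truthy value or timeout expires.

    A simplified illustration of an explicit wait, not a library's code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Ready: return at once instead of sleeping for a fixed time.
            return result
        if time.monotonic() >= deadline:
            # Fail loudly rather than scraping a half-rendered page.
            raise TimeoutError(f'condition not met within {timeout} s')
        time.sleep(poll_interval)
```

With a fixed sleep(5) you always pay five seconds and can still be too early. Here, a page that becomes ready after 300 ms is picked up at the first poll after that, and a page that never renders raises TimeoutError instead of silently returning nothing.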

Method 2: Selenium

Selenium is the older, most widely documented option. Since Selenium 4.6, Selenium Manager downloads the matching browser driver automatically — you no longer manage chromedriver by hand. Install with pip install selenium.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

opts = Options()
opts.add_argument('--headless=new')
driver = webdriver.Chrome(options=opts)   # driver auto-managed

try:
    driver.get('https://quotes.toscrape.com/js/')
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '.quote'))
    )
    for el in driver.find_elements(By.CSS_SELECTOR, '.quote'):
        text = el.find_element(By.CSS_SELECTOR, '.text').text
        author = el.find_element(By.CSS_SELECTOR, '.author').text
        print(author, '—', text)
finally:
    driver.quit()

Selenium works, but it is heavier and easier for anti-bot systems to detect (it leaks navigator.webdriver and other automation signals). For new projects, Playwright is the better default; keep Selenium for code that already depends on it.

Method 3: Call the hidden JSON API directly (fastest)

Here is the trick most tutorials skip. A JavaScript page does not invent its data — it fetches it from a backend API, usually as JSON. If you call that endpoint directly, you get clean structured data with no browser at all: far faster and lighter than Playwright or Selenium.

Open DevTools → Network tab → filter by Fetch/XHR → reload the page and watch the requests. Find the one returning your data and copy its URL:

import requests

# A JSON endpoint this site's own JavaScript calls (found in the Network tab).
API = 'https://quotes.toscrape.com/api/quotes'

page = 1
while True:
    resp = requests.get(API, params={'page': page}, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    for q in data['quotes']:
        print(q['author']['name'], '-', q['text'])

    # Pagination is usually just a query parameter; stop when the API
    # reports there is no next page.
    if not data.get('has_next'):
        break
    page += 1

When it works, this is usually the best option: no rendering overhead, structured JSON, trivial pagination. Watch for endpoints that require headers, a token, or a signature; copy those from the Network request too. If the API is locked behind anti-bot protection, fall back to the browser methods or a scraping API.

Which method to use — and what about blocking

Method	Speed	Runs JS	Best for
Hidden JSON API	Fastest	No (not needed)	When you can find the endpoint
Playwright	Medium	Yes	Modern SPAs, the default browser choice
Selenium	Slow	Yes	Legacy projects already on Selenium

All three break the same way: the site blocks you. Headless browsers are detectable, and when Cloudflare or DataDome flags one it serves a challenge page instead of the content you wanted. Hidden APIs are often guarded by the same fingerprinting. Rendering the JavaScript is only half the battle; passing the anti-bot check is the other half.

A managed scraping API like Scrappey renders the JavaScript and handles proxies, fingerprinting, and CAPTCHAs in one call, returning the fully rendered HTML — no browser to run or detect:

import requests
from bs4 import BeautifulSoup

# Render JS + pass anti-bot in one request. The API runs a real browser
# server-side and returns the fully rendered HTML.
resp = requests.post(
    'https://publisher.scrappey.com/api/v1?key=YOUR_API_KEY',
    json={
        'cmd': 'request.get',
        'url': 'https://quotes.toscrape.com/js/',
    },
    timeout=120,
)
resp.raise_for_status()

# The rendered page HTML is returned under solution.response.
html = resp.json()['solution']['response']

soup = BeautifulSoup(html, 'lxml')
for q in soup.select('.quote'):
    print(q.select_one('.author').text, '-', q.select_one('.text').text)

Frequently asked questions

Why does requests return an empty page for some sites?

Because those pages are rendered client-side. The server sends a near-empty HTML shell plus JavaScript, and the actual content is only added after that JavaScript runs in a browser. requests never executes JavaScript, so it only ever sees the empty shell. You need a browser engine (Playwright or Selenium) or you can call the JSON API the page fetches its data from.

Is Playwright or Selenium better for JavaScript-rendered pages?

For new projects in 2026, Playwright is the better default: it has built-in auto-waiting, a cleaner API, supports Chromium/Firefox/WebKit, and is somewhat harder to detect. Selenium is still fine if you already have a codebase built on it, and since Selenium 4.6 it auto-manages the browser driver. Avoid requests-html and Pyppeteer — both are effectively unmaintained.

How do I find the hidden API a JavaScript page uses?

Open your browser DevTools, go to the Network tab, filter by Fetch/XHR, and reload the page. Look for the request that returns your data (usually JSON). Copy its URL, method, and any required headers or tokens, then replicate it with requests. This is the fastest method because it skips browser rendering entirely and returns structured data.

My headless browser still gets blocked — what now?

Rendering JavaScript does not, on its own, satisfy anti-bot detection. Headless browsers leak automation signals (navigator.webdriver, fingerprint mismatches) that Cloudflare, DataDome, and Akamai flag, returning a challenge page with no real content. You need realistic fingerprints and residential proxies — or route the request through a scraping API that does all of that server-side and returns the rendered HTML.

Last updated: 2026-06-08