Why Requests + BeautifulSoup returns an empty page
When you scrape a JavaScript-heavy site with the usual stack, you often get nothing back:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://quotes.toscrape.com/js/')
soup = BeautifulSoup(r.text, 'lxml')
print(soup.select('.quote')) # => [] (empty!)
The list is empty even though the page clearly shows quotes in your browser. The reason: requests downloads only the HTML the server sends, and on a client-side-rendered page that HTML is a near-empty skeleton plus a bundle of JavaScript. The quotes only appear after that JavaScript runs in a browser and fetches the data. requests never runs JavaScript, so it never sees them.
Quick test: right-click → View Page Source (the raw HTML requests sees). If your data is missing there but present in the Elements tab of DevTools (the rendered DOM), the page is JavaScript-rendered and you need one of the methods below.
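The same check can be automated. Below is a minimal sketch, using only the standard library, that counts how many elements in a raw HTML string carry a given class. The names `ClassCounter` and `count_in_raw_html` and both sample strings are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_in_raw_html(raw_html, class_name):
    """Return how many elements in raw_html carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(raw_html)
    parser.close()
    return parser.count


# Hypothetical samples: a client-side-rendered skeleton vs. server-rendered HTML.
skeleton = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
server_rendered = '<html><body><div class="quote big">A</div><div class="quote">B</div></body></html>'

print(count_in_raw_html(skeleton, "quote"))         # 0
print(count_in_raw_html(server_rendered, "quote"))  # 2
```

In practice you would pass `requests.get(url).text` as `raw_html`. A count of zero for a class you can see in the Elements tab suggests the page is rendered client-side.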
Method 1: Playwright (recommended in 2026)
Playwright drives a real Chromium/Firefox/WebKit browser, so the JavaScript runs exactly as it would for a human. It has auto-waiting built in (no manual sleep() calls) and a cleaner API than Selenium. Install it once with pip install playwright then playwright install chromium.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://quotes.toscrape.com/js/')
    # Block until the first .quote element exists in the rendered DOM.
    page.wait_for_selector('.quote')
    # Triple quotes are required: the JavaScript callback spans several lines.
    quotes = page.eval_on_selector_all(
        '.quote',
        '''els => els.map(e => ({
            text: e.querySelector(".text").innerText,
            author: e.querySelector(".author").innerText
        }))'''
    )
    for q in quotes:
        print(q['author'], '-', q['text'])
    browser.close()
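Playwright also exposes the rendered HTML via page.content() if you prefer to hand it to BeautifulSoup. Wait on a condition rather than a fixed delay so the script is both faster and more reliable: page.wait_for_selector() is the most precise choice, and page.wait_for_load_state('networkidle') is a fallback that Playwright's own documentation discourages relying on, since pages that poll in the background may never go idle.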
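To see why waiting on a condition beats a fixed delay, here is a small standard-library sketch of the general idea: poll a predicate until it succeeds or a deadline passes. It illustrates the principle only and is not how Playwright implements its waits. `wait_until`, `content_ready`, and the 0.2 s delay are hypothetical.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.05):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Hypothetical stand-in for a page whose content shows up about 0.2 s after load.
ready_at = time.monotonic() + 0.2


def content_ready():
    return time.monotonic() >= ready_at


start = time.monotonic()
wait_until(content_ready, timeout=5.0)
elapsed = time.monotonic() - start
print(f"waited {elapsed:.2f} s; a fixed sleep(5) would always cost 5 s")
```

The wait returns as soon as the condition holds, so a fast page costs little time, while a slow page still gets the full timeout before the script gives up with an explicit error.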
Method 2: Selenium
Selenium is the older and more widely documented option. Since Selenium 4.6, Selenium Manager downloads the matching browser driver automatically, so you no longer manage chromedriver by hand. Install with pip install selenium.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

opts = Options()
opts.add_argument('--headless=new')
driver = webdriver.Chrome(options=opts)  # driver auto-managed
try:
    driver.get('https://quotes.toscrape.com/js/')
    # Wait up to 10 s for the first .quote element to be present in the DOM.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '.quote'))
    )
    for el in driver.find_elements(By.CSS_SELECTOR, '.quote'):
        text = el.find_element(By.CSS_SELECTOR, '.text').text
        author = el.find_element(By.CSS_SELECTOR, '.author').text
        print(author, '-', text)
finally:
    # Always release the browser, even if the wait times out.
    driver.quit()
Selenium works, but it is heavier, and a stock session is easy for anti-bot systems to detect: it sets navigator.webdriver to true and exposes other automation signals. For new projects, Playwright is the better default; keep Selenium for code that already depends on it.
Which method to use — and what about blocking
| Method | Speed | Runs JS | Best for |
|---|---|---|---|
| Hidden JSON API | Fastest | No (not needed) | When you can find the endpoint |
| Playwright | Medium | Yes | Modern SPAs, the default browser choice |
| Selenium | Slow | Yes | Legacy projects already on Selenium |
All three break the same way: the site blocks you. Headless browsers are detectable, and once detected you receive a Cloudflare or DataDome challenge page that contains no useful HTML. Hidden APIs are often guarded by the same fingerprinting. Rendering the JavaScript is only half the battle; passing the anti-bot check is the other half.
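Because a block often arrives as an ordinary-looking response, it helps to check for it explicitly before parsing. The sketch below is a rough heuristic, not a reliable detector: the status codes and marker strings are assumptions based on commonly seen challenge pages and can change at any time, and `looks_blocked` is a hypothetical helper.

```python
# Status codes that anti-bot systems commonly return (assumed, not exhaustive).
BLOCK_STATUS_CODES = {403, 429, 503}

# Assumed marker substrings; challenge pages change over time, so treat as examples.
CHALLENGE_MARKERS = (
    "just a moment",         # title commonly seen on Cloudflare challenge pages
    "captcha-delivery.com",  # domain commonly seen on DataDome challenge pages
)


def looks_blocked(status_code, html, expected_marker):
    """Heuristic: True if the response looks like an anti-bot page, not real content.

    expected_marker is a substring that real pages are known to contain,
    for example 'class="quote"'.
    """
    # If the content we came for is present, the request got through.
    if expected_marker in html:
        return False
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


print(looks_blocked(403, "<title>Just a moment...</title>", 'class="quote"'))  # True
print(looks_blocked(200, '<div class="quote">x</div>', 'class="quote"'))       # False
```

Checking for the expected content first keeps the heuristic from flagging a page that merely mentions one of the marker strings.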
A managed scraping API like Scrappey renders the JavaScript and handles proxies, fingerprinting, and CAPTCHAs in one call, returning the fully rendered HTML — no browser to run or detect: