Detecting when JavaScript rendering is actually needed
Before reaching for a browser, confirm you need one: many sites that feel dynamic still ship usable data in the first response. The quickest test is to compare what your HTTP client sees against what the browser shows. Fetch the URL with requests or curl and search the raw response for a value you can see on the page (a product name, a price). If it is there, no rendering is needed and you can parse the HTML directly. If the response is a near-empty <div id="root"></div> shell with a bundle of scripts, the content is built client-side.
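As a rough sketch of that first test, the helpers below fetch the raw response and check it for a value you can see in the browser. They use the standard library's urllib so the block has no dependencies; requests would work the same way. The function names, the User-Agent string, and the 200-character threshold are illustrative assumptions, not fixed rules.

```python
import re
import urllib.request


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Return the body exactly as the server sent it; no scripts are executed."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def value_in_raw_html(html: str, visible_value: str) -> bool:
    """Case-insensitive search for a value you can see on the rendered page."""
    return visible_value.casefold() in html.casefold()


def looks_like_empty_shell(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: almost no visible text is left once scripts and tags are removed.

    The threshold is an arbitrary assumption; tune it for the sites you target.
    """
    # Drop script/style/noscript elements together with their contents.
    without_code = re.sub(r"(?is)<(script|style|noscript)\b.*?</\1\s*>", " ", html)
    # Drop the remaining tags, keeping only the text between them.
    text = re.sub(r"(?s)<[^>]+>", " ", without_code)
    return len(" ".join(text.split())) < min_text_chars
```

If value_in_raw_html(fetch_raw_html(url), "Blue Widget") is true, the raw HTML is probably enough to parse. If it is false and looks_like_empty_shell also reports true, the page is likely assembled client-side.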
A second confirmation: open DevTools, disable JavaScript (in Chrome, open the Command Menu with Ctrl+Shift+P or Cmd+Shift+P and run "Disable JavaScript"), and reload. If the page goes blank or shows a "Please enable JavaScript" notice, the content is JS-rendered. Single-page apps built with React, Vue, Angular, or Svelte are the common case: they serve a thin shell and populate it after the bundle executes. "View Page Source" shows the original server HTML, while "Inspect" shows the live DOM after scripts run; a large gap between the two is the clearest signal that rendering happens in the browser.
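The same check can be scripted. The sketch below assumes Playwright's Python package is installed and loads a page twice, once with JavaScript disabled and once with it enabled, then compares how much visible text each pass produced. The helper names and the idea of reducing the gap to a single ratio are illustrative assumptions, and waiting only for the load event is a rough cut-off that can undercount content fetched later.

```python
def js_dependence(text_len_js_off: int, text_len_js_on: int) -> float:
    """Share of visible text that only appears when scripts run (0.0 to 1.0)."""
    if text_len_js_on <= 0:
        return 0.0
    return max(0.0, 1 - text_len_js_off / text_len_js_on)


def measure_js_dependence(url: str, timeout_ms: int = 30_000) -> float:
    """Load `url` with JavaScript off, then on, and compare visible text length."""
    # Imported here so the pure helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            lengths = []
            for js_enabled in (False, True):
                # java_script_enabled=False mirrors "Disable JavaScript" in DevTools.
                context = browser.new_context(java_script_enabled=js_enabled)
                page = context.new_page()
                page.goto(url, wait_until="load", timeout=timeout_ms)
                lengths.append(len(page.inner_text("body").strip()))
                context.close()
            return js_dependence(lengths[0], lengths[1])
        finally:
            browser.close()
```

A result near 1.0 means almost nothing is visible without scripts; a result near 0.0 means the server already sends the content.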
When you must render: headless browsers and waiting strategies
If the data endpoint behind the page is signed, short-lived, hidden behind WebSockets, or only returns pre-rendered HTML fragments, render the page in a headless browser. Playwright is a strong default: it drives Chromium, Firefox, and WebKit from one API, with mature Python, Node, Java, and .NET bindings. Puppeteer (Chrome-focused, Node) and Selenium (the widest language and legacy-browser support) are reasonable alternatives depending on your stack.
The part people get wrong is waiting. A fixed sleep(5) is both slow and flaky: it wastes time when the page is ready early and fails when the page is slow. Prefer event-driven waits. page.wait_for_selector('.product-card') blocks until the specific element you need exists, while page.wait_for_load_state('networkidle') waits until background requests settle. The latter is useful for AJAX-driven pages, but it can hang on sites that poll continuously, so always give it an explicit timeout. For interactive content, trigger the action (click "Load more", scroll a feed) and then wait for the new node. Rendering is resource-heavy at scale. When running your own headless fleet is more than you want to maintain, a managed web-data API such as Scrappey can render the page and return the final HTML or a screenshot in a single call, handling the browser, proxies, and retries for you.
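The event-driven waits described above might look like the following with Playwright's synchronous Python API. This is a minimal sketch, assuming Playwright is installed; the helper names, the selectors passed in, and the 15-second timeout are hypothetical choices for illustration.

```python
def wait_for_content(page, selector: str, timeout_ms: int = 15_000) -> None:
    """Block until `selector` exists, instead of sleeping for a fixed time."""
    page.wait_for_selector(selector, timeout=timeout_ms)


def click_and_wait_for_more(page, button_selector: str, item_selector: str,
                            timeout_ms: int = 15_000) -> int:
    """Click a "Load more" control, then wait until the item count grows."""
    before = page.locator(item_selector).count()
    page.click(button_selector)
    # Poll in the browser until more items match than before the click.
    page.wait_for_function(
        "([sel, n]) => document.querySelectorAll(sel).length > n",
        arg=[item_selector, before],
        timeout=timeout_ms,
    )
    return page.locator(item_selector).count()


def render(url: str, selector: str) -> str:
    """Render `url` in headless Chromium and return the final HTML."""
    # Imported here so the helpers above can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=30_000)
            wait_for_content(page, selector)
            return page.content()
        finally:
            browser.close()
```

Waiting on the item count rather than on a generic load state keeps the wait tied to the data you actually need, and the explicit timeout turns a page that never updates into an error instead of a hang.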
