Web Scraping APIs

What Is Puppeteer?

Puppeteer is Google's Node.js library for controlling Chromium via the Chrome DevTools Protocol (CDP). Released in 2017, it predates Playwright by three years and was the de facto standard for Chrome automation until Playwright's multi-browser support reframed the category. It remains the right pick when the project is Node-only, Chromium-only, and benefits from the larger Puppeteer-specific stealth-plugin ecosystem (puppeteer-extra-plugin-stealth, puppeteer-real-browser).
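
To make the CDP relationship concrete, here is a minimal sketch that opens a raw CDP session from a Puppeteer page and sends a single protocol command. Runtime.evaluate is a standard CDP method; the helper name evaluateOverCdp is hypothetical, and the function assumes it is handed a page from an already launched browser.

```javascript
// Minimal sketch: send one raw CDP command through Puppeteer.
// `evaluateOverCdp` is a hypothetical helper name, not part of Puppeteer.
async function evaluateOverCdp(page, expression) {
  // Puppeteer's high-level calls are translated into CDP messages like this one.
  const client = await page.createCDPSession();
  try {
    const { result, exceptionDetails } = await client.send('Runtime.evaluate', {
      expression,
      returnByValue: true, // ask for a plain JSON value instead of a remote handle
    });
    if (exceptionDetails) {
      throw new Error(`Evaluation failed: ${exceptionDetails.text}`);
    }
    return result.value;
  } finally {
    await client.detach(); // release the session when done
  }
}
```

With a real page, a call such as evaluateOverCdp(page, 'navigator.userAgent') would return the user agent the page reports. In everyday scripts page.evaluate() does the same job; the raw session is mainly useful for CDP commands Puppeteer does not wrap.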

Quick facts

Vendor: Google (open-source, Apache 2.0)
Language: Node.js / TypeScript only
Browser: Chromium / Chrome (Firefox is supported via WebDriver BiDi since v23)
Protocol: Chrome DevTools Protocol (CDP)
Stealth ecosystem: puppeteer-extra-plugin-stealth, puppeteer-real-browser

Puppeteer vs Playwright in practice

The two APIs are largely the same: both expose page, frame, request, and response abstractions with similar method names. Differences that matter for scraping:

  • Auto-waiting: Playwright waits for elements to be actionable by default. Puppeteer's classic page methods (page.click, page.$) wait only when explicitly told, so Puppeteer scripts carry more waitForSelector calls. Puppeteer's newer Locator API (page.locator) narrows this gap.
  • Parallel contexts: Playwright's browserContext abstraction is cleaner for running multiple isolated sessions in one browser. Puppeteer supports isolated contexts too, but the API is older.
  • Languages: Puppeteer is Node-only. If your stack is Python, Playwright is the only one of the two with official bindings.
  • Stealth plugins: Puppeteer's stealth ecosystem is older and more mature. puppeteer-extra-plugin-stealth has more patches than its Playwright equivalent, though both lose to Function.toString() inspection equally.
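
The first two differences can be seen in one short sketch: one isolated browser context per URL, each with an explicit wait before reading the page. The helper name scrapeText and its parameters are hypothetical; it assumes a browser that has already been launched with Puppeteer v22 or later.

```javascript
// Sketch: one isolated context per URL, run in parallel, with explicit waits.
// `scrapeText` is a hypothetical helper; pass it a launched Puppeteer browser.
async function scrapeText(browser, urls, selector, timeoutMs = 10000) {
  return Promise.all(urls.map(async (url) => {
    // Each context has its own cookies and storage, like a fresh profile.
    // (Puppeteer releases before v22 named this createIncognitoBrowserContext.)
    const context = await browser.createBrowserContext();
    try {
      const page = await context.newPage();
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      // The classic Puppeteer API does not auto-wait, so wait explicitly
      // for the element before reading it.
      await page.waitForSelector(selector, { timeout: timeoutMs });
      const text = await page.$eval(selector, (el) => el.textContent.trim());
      return { url, text };
    } finally {
      await context.close(); // closes the context together with its pages
    }
  }));
}
```

A call might look like scrapeText(browser, ['https://example.com'], 'h1'), where the URL and selector are placeholders. Because Promise.all rejects on the first failure, a production version would likely catch errors per URL and cap concurrency.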

For a greenfield scraping project in Node, default to Playwright unless your team has existing Puppeteer code. Puppeteer is not deprecated, but the active feature investment has shifted to Playwright.

puppeteer-extra and the stealth plugin

The puppeteer-extra plugin system, with puppeteer-extra-plugin-stealth, is the most-cited stealth approach for Puppeteer. It runs ~17 individual patches: hides navigator.webdriver, fixes the plugin array, patches WebGL parameters, normalises the User-Agent, masks the chrome.runtime object, and so on.

It defeats every detection a 2019 anti-bot system used. It does not defeat 2024+ vendors that check via Function.toString() (see that entry) or that look for CDP runtime artifacts. Each of the 17 patches is a JS function whose source is visible to toString(); Kasada, recent Akamai, and PerimeterX flag this stack on first request.
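
The toString() weakness is easy to reproduce in plain Node. The sketch below is a simplified illustration, not the stealth plugin's source and not any vendor's actual detection code; fakeNavigator and looksNative are hypothetical names standing in for the browser's navigator object and a detector's check.

```javascript
// Simplified illustration of a toString()-based tamper check.

// Built-in functions stringify to a body of "[native code]".
function looksNative(fn) {
  const source = Function.prototype.toString.call(fn);
  return /\{\s*\[native code\]\s*\}$/.test(source);
}

// Stand-in for a browser object; in a real page this would be navigator.
const fakeNavigator = {};

// A naive runtime patch in the style of "hide navigator.webdriver".
Object.defineProperty(fakeNavigator, 'webdriver', {
  get: () => undefined,
  configurable: true,
});

const patchedGetter =
  Object.getOwnPropertyDescriptor(fakeNavigator, 'webdriver').get;

console.log(fakeNavigator.webdriver);    // undefined: the value check passes
console.log(looksNative(Math.max));      // true: a genuine built-in
console.log(looksNative(patchedGetter)); // false: the patch source is exposed
```

The value-level check succeeds while the function-level check fails, which is why patching more properties at runtime does not help against this class of detection.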

For Puppeteer in production against hard targets, the modern approach is puppeteer-real-browser (driving a real Chrome rather than headless Chromium) or switching to a C++-patched variant like CloakBrowser.

When to actually use Puppeteer

Three scenarios where Puppeteer is the right pick over Playwright:

  • The codebase is already on Puppeteer and switching is gratuitous.
  • The project depends on a Puppeteer-only library (puppeteer-cluster, puppeteer-screen-recorder) that doesn't have a Playwright equivalent.
  • The team specifically wants the older, smaller API surface — Puppeteer's scope is narrower than Playwright's, which some teams find easier to reason about.

For everything else, the answer is Playwright. CDP, the Chromium binary, and the detection surface are identical; the practical differences are API ergonomics and language reach.

Code example

javascript
// Puppeteer with the stealth plugin and an authenticated residential proxy.
// The proxy host, port, credentials, and target URL are placeholders.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({
    headless: false,                  // don't advertise headless
    args: ['--proxy-server=http://residential:port'],
  });
  try {
    const page = await browser.newPage();
    // Answer the proxy's HTTP auth challenge.
    await page.authenticate({ username: 'user', password: 'pass' });
    // Keep this string in step with the Chrome version actually launched.
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
    );
    await page.goto('https://target.com', { waitUntil: 'domcontentloaded' });
    const html = await page.content();
    console.log(html.length);
  } finally {
    await browser.close();            // release the browser even if a step throws
  }
})().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
// Stealth plugin defeats simple checks; loses to Function.toString() inspection at Kasada.

Frequently asked questions

Is Puppeteer dead?

No. It's still actively maintained by Google, with releases that track each Chrome major version. The momentum has shifted to Playwright (more languages, multi-browser), but Puppeteer is a reasonable Node-only choice and its stealth-plugin ecosystem is larger.

Can I use Puppeteer with Python?

There's pyppeteer (a community Python port), but it has been unmaintained for years. For Python, use Playwright instead.

Why does the stealth plugin not work against Kasada?

Kasada calls Function.prototype.toString() on the methods the stealth plugin patches. Real native methods return "[native code]"; the plugin's JS replacements return the patch source code. The plugin patches ~17 methods, every one of which fails this check. PatchRight (patches Playwright source rather than runtime) is the equivalent fix on the Playwright side.

Last updated: 2026-05-27