What Is Puppeteer?

Puppeteer is Google's Node.js library for driving a Chromium browser from code, over the Chrome DevTools Protocol (CDP) - the same channel Chrome's own DevTools use to talk to the browser. Released in 2017, it predates Playwright by three years and was the de facto standard for Chrome automation until Playwright's multi-browser support reframed the category. It is still the right pick when the project is Node-only, Chromium-only, and benefits from the larger Puppeteer-specific stealth-plugin ecosystem (puppeteer-extra-plugin-stealth, puppeteer-real-browser).

Quick facts

Vendor: Google (open-source, Apache 2.0)
Language: Node.js / TypeScript only
Browser: Chromium / Chrome (Firefox is supported through WebDriver BiDi since Puppeteer v23)
Protocol: Chrome DevTools Protocol (CDP); WebDriver BiDi when driving Firefox
Stealth ecosystem: puppeteer-extra-plugin-stealth, puppeteer-real-browser

Puppeteer vs Playwright in practice

The two APIs overlap heavily - both give you the same building blocks (page, frame, request, response) under similar method names. The differences that matter for scraping:

  • Auto-waiting: Playwright waits for an element to be ready to act on before clicking or typing. Puppeteer's classic page.click() and page.type() act immediately and fail if the element is not there yet, so Puppeteer scripts end up with more explicit waitForSelector calls. The newer page.locator() API narrows this gap.
  • Parallel contexts: A context is an isolated session (its own cookies and storage) inside one browser. Playwright's browserContext is cleaner for running several of these at once. Puppeteer supports it too, but the API is older.
  • Languages: Puppeteer is Node-only. If your stack is Python, Playwright is the only one of the two with official bindings.
  • Stealth plugins: Puppeteer's stealth ecosystem is older and more mature. puppeteer-extra-plugin-stealth has more patches than its Playwright equivalent, though both lose to Function.toString() inspection equally.
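
The auto-waiting and parallel-context points above can be sketched as two small helpers. This is a minimal sketch, not an official pattern: clickWhenVisible and withIsolatedContext are hypothetical names, and both receive the Puppeteer page or browser as a parameter. It assumes Puppeteer v22 or later, where browser.createBrowserContext() is available (earlier releases called it createIncognitoBrowserContext()).

```javascript
// Hypothetical helper: the explicit wait that classic Puppeteer scripts need
// before acting on an element.
async function clickWhenVisible(page, selector, timeoutMs = 10000) {
  // waitForSelector resolves with an ElementHandle once the element is visible,
  // and rejects if the timeout elapses first.
  const handle = await page.waitForSelector(selector, {
    visible: true,
    timeout: timeoutMs,
  });
  await handle.click();
  return handle;
}

// Hypothetical helper: run one task in its own browser context, so its
// cookies and storage are isolated from every other task.
async function withIsolatedContext(browser, task) {
  const context = await browser.createBrowserContext();
  try {
    const page = await context.newPage();
    return await task(page);
  } finally {
    // Closing the context also closes the pages opened inside it.
    await context.close();
  }
}

// Usage with a real browser (assumes `npm install puppeteer`):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   await withIsolatedContext(browser, async (page) => {
//     await page.goto('https://example.com');
//     await clickWhenVisible(page, 'a');
//   });
//   await browser.close();
```

Passing the page and browser in as arguments keeps the helpers independent of how the browser was launched, so the same code works with plain puppeteer or puppeteer-extra.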

For a brand-new scraping project in Node, default to Playwright unless your team already has Puppeteer code. Puppeteer is not deprecated, but the active feature investment has shifted to Playwright.

puppeteer-extra and the stealth plugin

The puppeteer-extra plugin system, paired with puppeteer-extra-plugin-stealth, is the most-cited anti-detection plugin ecosystem for Puppeteer. The stealth plugin runs ~17 separate patches: it hides navigator.webdriver (the flag that openly says "a script is driving this browser"), fixes the plugin array, patches WebGL parameters, normalises the User-Agent, masks the chrome.runtime object, and so on.
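
To make "patch" concrete, here is a simplified illustration of what hiding navigator.webdriver amounts to. It is not the plugin's actual code: fakeNavigator is a stand-in object so the sketch runs in plain Node, and hideWebdriver is a hypothetical name.

```javascript
// Stand-in for the browser's navigator (assumption: in an automated Chromium,
// `webdriver` is a getter on the prototype that returns true).
const navigatorProto = {
  get webdriver() {
    return true;
  },
};
const fakeNavigator = Object.create(navigatorProto);

// Hypothetical patch in the spirit of a stealth evasion: redefine the getter
// on the prototype so page scripts read `false`, as in a normal browser.
function hideWebdriver(nav) {
  const proto = Object.getPrototypeOf(nav);
  Object.defineProperty(proto, 'webdriver', {
    get: () => false,
    configurable: true,
    enumerable: true,
  });
}

console.log(fakeNavigator.webdriver); // true (unpatched)
hideWebdriver(fakeNavigator);
console.log(fakeNavigator.webdriver); // false (patched)
```

In a real Puppeteer script, code like this would have to run before any page script, for example through page.evaluateOnNewDocument(). Note that the replacement getter is ordinary JavaScript, which is what makes it inspectable.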

It covers the checks a typical 2019-era anti-bot system ran. It does not hold up against 2024+ vendors that inspect patched functions with Function.toString() (see that entry), a trick that reads back a function's source code, or that look for CDP runtime artifacts (traces left by the DevTools connection itself). Each of the 17 patches is a JS function whose source is visible to toString(); Kasada, recent Akamai, and PerimeterX flag this stack on first request.
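
The toString() check is easy to reproduce in plain Node. The sketch below is an illustration of the idea only, not any vendor's real detection code; looksNative and the replacement function are hypothetical.

```javascript
// Returns true when the function's source text is the "[native code]"
// placeholder that engines print for built-in functions.
function looksNative(fn) {
  // Call the prototype method directly so an overridden fn.toString is ignored.
  const source = Function.prototype.toString.call(fn);
  return /\{\s*\[native code\]\s*\}/.test(source);
}

// A genuine built-in.
const builtIn = Math.max;

// A JavaScript replacement with the same name, standing in for a runtime patch.
const replacement = function max(...values) {
  return values.reduce((a, b) => (a > b ? a : b));
};

console.log(looksNative(builtIn));     // true
console.log(looksNative(replacement)); // false
```

A detector runs the same comparison against the browser methods it expects a stealth layer to have replaced.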

For Puppeteer in production against hard targets, the modern approach is puppeteer-real-browser (driving a real Chrome rather than headless Chromium) or switching to a C++-patched variant like CloakBrowser.

When to actually use Puppeteer

Three situations where Puppeteer is the right pick over Playwright:

  • The codebase is already on Puppeteer and switching would be busywork.
  • The project depends on a Puppeteer-only library (puppeteer-cluster, puppeteer-screen-recorder) that has no Playwright equivalent.
  • The team specifically wants the older, smaller API - Puppeteer does less than Playwright, which some teams find easier to reason about.

For everything else, the answer is Playwright. When both drive Chromium, the protocol (CDP), the browser binary, and the detection surface are identical - the real difference is how the API feels to use and which languages it reaches.

Code example

javascript
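// Puppeteer with the stealth plugin and an authenticated residential proxy.
// Requires: npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({
    headless: false,                  // don't advertise headless
    // Placeholder: replace with your proxy's real host and port.
    args: ['--proxy-server=http://residential:port'],
  });
  try {
    const page = await browser.newPage();
    // Answers the proxy's auth challenge (placeholder credentials).
    await page.authenticate({ username: 'user', password: 'pass' });
    // Keep this string in step with the Chrome version actually launched.
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
    );
    await page.goto('https://target.com', { waitUntil: 'domcontentloaded' });
    const html = await page.content();
    console.log(html.length);
  } finally {
    await browser.close();            // release the browser even if a step throws
  }
})().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
// The stealth plugin handles simple checks; it stays exposed to Function.toString() inspection at vendors such as Kasada.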

Frequently asked questions

Is Puppeteer dead?

No - it's still actively maintained by Google, and its releases track new Chrome versions. The momentum has shifted to Playwright (more languages, multi-browser), but Puppeteer remains a reasonable Node-only choice, and its stealth-plugin ecosystem is larger.

Can I use Puppeteer with Python?

There's pyppeteer (a community Python port), but it has been unmaintained for years. For Python, use Playwright instead.

Why does the stealth plugin not work against Kasada?

Kasada calls Function.prototype.toString() on the methods the stealth plugin patches. A real, built-in browser method returns "[native code]"; the plugin's JavaScript replacements return their own patch source code instead - a dead giveaway. The plugin applies ~17 patches, and every one fails this check. Patchright (which patches Playwright's source rather than the runtime) is the equivalent fix on the Playwright side.

Last updated: 2026-05-31