What Is Puppeteer?

By the Scrappey Research Team

Puppeteer is Google's Node.js library for driving a Chromium browser from code, over the Chrome DevTools Protocol (CDP) - the same channel Chrome's own DevTools use to talk to the browser. Released in 2017, it predates Playwright by three years and was the de facto standard for Chrome automation until Playwright's multi-browser support reframed the category. It is still the right pick when the project is Node-only, Chromium-only, and benefits from the larger Puppeteer-specific stealth-plugin ecosystem (puppeteer-extra-plugin-stealth, puppeteer-real-browser).

Vendor	Google (open-source, Apache 2.0)
Language	Node.js / TypeScript only
Browser	Chromium / Chrome; Firefox via WebDriver BiDi (officially supported since Puppeteer v23)
Protocol	Chrome DevTools Protocol (CDP); WebDriver BiDi for Firefox
Stealth ecosystem	puppeteer-extra-plugin-stealth, puppeteer-real-browser

Puppeteer vs Playwright in practice

The two APIs are about 85% the same - both give you the same building blocks (page, frame, request, response) under similar method names. The differences that matter for scraping:

Auto-waiting — Playwright waits for an element to be ready to act on before clicking or typing; Puppeteer only waits when you tell it to. So Puppeteer scripts end up with more explicit waitForSelector calls.
Parallel contexts — A context is an isolated session (its own cookies and storage) inside one browser. Playwright's browserContext is cleaner for running several of these at once. Puppeteer supports it too, but the API is older.
Languages — Puppeteer is Node-only. If your stack is Python, Playwright is the only choice.
Stealth plugins — Puppeteer's stealth ecosystem is older and more mature. puppeteer-extra-plugin-stealth has more patches than its Playwright equivalent, though both lose to Function.toString() inspection equally.
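The first two differences can be sketched in a few lines. This is a minimal illustration rather than a recommended pattern: `clickWhenReady` and `titlesInIsolatedContexts` are hypothetical helper names, and `page` and `browser` are assumed to be ordinary Puppeteer Page and Browser objects created elsewhere.

```javascript
// Sketch only: `page` and `browser` are assumed to be Puppeteer Page/Browser objects.

// Puppeteer does not auto-wait, so wait for the element explicitly before acting.
async function clickWhenReady(page, selector, timeoutMs = 10000) {
  // waitForSelector resolves with an element handle once the selector matches
  // a visible element, and rejects if that has not happened within timeoutMs.
  const handle = await page.waitForSelector(selector, {
    visible: true,
    timeout: timeoutMs,
  });
  await handle.click();
}

// Load each URL in its own browser context so cookies and storage stay separate.
async function titlesInIsolatedContexts(browser, urls) {
  return Promise.all(urls.map(async (url) => {
    // createBrowserContext() is the method name in recent Puppeteer releases;
    // older releases called it createIncognitoBrowserContext().
    const context = await browser.createBrowserContext();
    try {
      const page = await context.newPage();
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      return await page.title();
    } finally {
      await context.close(); // also closes the pages opened in this context
    }
  }));
}
```

With a real browser, the second helper would be called as `await titlesInIsolatedContexts(browser, ['https://example.com'])`. In Playwright the explicit wait in the first helper is usually unnecessary, because its click call waits for the element on its own.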

For a brand-new scraping project in Node, default to Playwright unless your team already has Puppeteer code. Puppeteer is not deprecated, but the active feature investment has shifted to Playwright.

puppeteer-extra and the stealth plugin

The puppeteer-extra plugin system, paired with puppeteer-extra-plugin-stealth, is the most-cited anti-detection plugin ecosystem for Puppeteer. The stealth plugin runs ~17 separate patches: it hides navigator.webdriver (the flag that openly says "a script is driving this browser"), fixes the plugin array, patches WebGL parameters, normalises the User-Agent, masks the chrome.runtime object, and so on.
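To see what those patches are covering up, it can help to read the same properties a detection script would. The sketch below is illustrative only: `readSignals` and `flagAutomationSignals` are hypothetical helpers, not part of Puppeteer or the stealth plugin, and the three checks are a small sample of well-known heuristics rather than any vendor's real logic.

```javascript
// Runs a function inside the page and returns a few properties that
// detection scripts commonly read. `page` is assumed to be a Puppeteer Page.
async function readSignals(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver === true,
    pluginCount: navigator.plugins.length,
    userAgent: navigator.userAgent,
  }));
}

// Pure helper (hypothetical): turns the collected signals into warnings.
function flagAutomationSignals(signals) {
  const warnings = [];
  if (signals.webdriver) {
    warnings.push('navigator.webdriver is true');
  }
  if (signals.pluginCount === 0) {
    warnings.push('navigator.plugins is empty');
  }
  if (/HeadlessChrome/.test(signals.userAgent)) {
    warnings.push('User-Agent contains HeadlessChrome');
  }
  return warnings;
}
```

Running `flagAutomationSignals(await readSignals(page))` once with plain Puppeteer and once with the stealth plugin enabled is a quick way to compare the two setups. An empty list only means these particular checks passed.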

It addresses every detection a 2019 anti-bot system used. It does not hold up against 2024+ vendors that use Function.toString() inspection (see that entry), a trick that reads back a function's source code, or that look for CDP runtime artifacts (traces left by the DevTools connection). Each of the 17 patches is a JS function whose source is visible to toString(); Kasada, recent Akamai, and PerimeterX flag this stack on the first request.
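The toString() check itself is easy to reproduce in plain JavaScript, with no browser required. The sketch below is a deliberately naive version: `looksNative` is a hypothetical helper, `target` stands in for a browser API object, and real anti-bot scripts are assumed to run considerably more elaborate variants.

```javascript
// Naive check, for illustration only: does this function stringify like a built-in?
function looksNative(fn) {
  // Use the original Function.prototype.toString so that an own `toString`
  // property placed on fn cannot answer on its behalf.
  const source = Function.prototype.toString.call(fn);
  return /\{\s*\[native code\]\s*\}\s*$/.test(source);
}

const target = { now: Date.now };            // stands in for a browser API object
console.log(looksNative(target.now));        // true: built-in implementation

target.now = function now() { return 0; };   // a JavaScript "patch"
console.log(looksNative(target.now));        // false: the patch exposes its own source
```

A patch can try to hide by also overriding toString, but that override is itself a JavaScript function that can be inspected the same way.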

For Puppeteer in production against hard targets, the modern approach is puppeteer-real-browser (driving a real Chrome rather than headless Chromium) or switching to a C++-patched variant like CloakBrowser.

When to actually use Puppeteer

Three situations where Puppeteer is the right pick over Playwright:

The codebase is already on Puppeteer and switching would be busywork.
The project depends on a Puppeteer-only library (puppeteer-cluster, puppeteer-screen-recorder) that has no Playwright equivalent.
The team specifically wants the older, smaller API - Puppeteer does less than Playwright, which some teams find easier to reason about.

For everything else, the answer is Playwright. CDP, the Chromium binary, and the detection surface are identical - the real difference is how the API feels to use and which languages it reaches.

Code example

javascript

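// Puppeteer with the stealth plugin and an authenticated residential proxy.
// The proxy address, credentials, and target URL below are placeholders.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

(async () => {
  const browser = await puppeteer.launch({
    headless: false,                  // don't advertise headless
    args: ['--proxy-server=http://residential:port'],
  });
  try {
    const page = await browser.newPage();
    // Proxy credentials are supplied per page, not in the --proxy-server flag.
    await page.authenticate({ username: 'user', password: 'pass' });
    // Keep this string in step with the Chrome version actually launched;
    // a mismatched User-Agent is itself a detection signal.
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
    );
    await page.goto('https://target.com', { waitUntil: 'domcontentloaded' });
    const html = await page.content();
    console.log(html.length);
  } finally {
    await browser.close();            // release the browser even if a step throws
  }
})().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
// The stealth plugin handles simple checks; it stays exposed to Function.toString() inspection by vendors such as Kasada.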

Related terms

What Is Playwright?

Playwright is a cross-browser automation framework from Microsoft that drives Chromium, Firefox, and WebKit through a single API. An automat…

What Is the Chrome DevTools Protocol (CDP)?

The Chrome DevTools Protocol (CDP) is the low-level interface for instrumenting and controlling Chromium-based browsers. Low-level means it …

What Is Headless Browser Detection?

Headless browser detection is the set of probes anti-bot systems use to distinguish a headless or instrumented Chrome session from a real us…

What Is Function.toString() Inspection?

Function.prototype.toString() inspection is a technique anti-bot scripts use to identify JavaScript functions that have been modified at run…

What Is CloakBrowser?

CloakBrowser is a Chromium build with 49 C++ binary patches that give it a consistent browser configuration. The goal is for it to present l…

Web Scraping Tools 2026 — A Comparison

"Web scraping tools" is the whole family of software you use to pull data off websites — and in 2026 that family is big but neatly sorted in…

What Is Botasaurus?

Botasaurus is a free, open-source (MIT-licensed) Python framework for building web scrapers. You wrap your scraping functions with one of th…

What Is JavaScript Rendering?

JavaScript rendering is the process of executing a page's JavaScript in a real browser engine so that content built on the client side appea…

What Is puppeteer-extra-plugin-stealth?

puppeteer-extra-plugin-stealth is an open-source plugin for the puppeteer-extra wrapper that bundles a collection of independent "evasion mo…

What Is rebrowser-patches?

rebrowser-patches is an open-source set of drop-in patches for Puppeteer and Playwright that changes how those libraries set up their CDP ex…

Playwright vs Puppeteer

Playwright and Puppeteer are both Node-based browser automation libraries that drive a real browser over the Chrome DevTools Protocol (CDP),…

Playwright vs Selenium Compared

Playwright and Selenium are both browser-automation libraries that drive real browsers for testing and scraping, but they differ in architec…

Frequently asked questions

Is Puppeteer dead?

No - it's still actively maintained by Google, and its releases track each Chrome major version. The momentum has shifted to Playwright (more languages, multi-browser), but Puppeteer remains a reasonable Node-only choice, and its stealth-plugin ecosystem is larger.

Can I use Puppeteer with Python?

There's pyppeteer (a community Python port), but it has been unmaintained for years. For Python, use Playwright instead.

Why does the stealth plugin not work against Kasada?

Kasada calls Function.prototype.toString() on the methods the stealth plugin patches. A real, built-in browser method returns a string whose body is "[native code]" (for example, "function now() { [native code] }"); the plugin's JavaScript replacements return their own patch source code instead - a dead giveaway. The plugin patches ~17 methods, and every one fails this check. Patchright (which patches Playwright's source rather than the runtime) is the equivalent fix on the Playwright side.

Last updated: 2026-05-31