What Is the Chrome DevTools Protocol (CDP) in Web Scraping?

The Chrome DevTools Protocol (CDP) is the low-level interface for instrumenting and controlling Chromium-based browsers. Puppeteer, Playwright, and many stealth tools sit on top of CDP. For scraping it gives you fine-grained control: intercept network requests, override headers, evaluate JavaScript in the page context, capture screenshots, dump the DOM. Direct CDP use is verbose but gives access to capabilities the higher-level libraries do not expose.
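
To make "low-level" concrete: each CDP command is a JSON object with an id, a method name, and params, sent over a WebSocket. Replies echo the id, while events arrive with a method but no id. The sketch below only builds and classifies messages and does not talk to a browser. The helper names cdp_command and classify are illustrative, not part of any library, and the header value is an arbitrary example.

```python
import itertools
import json

# Monotonic ids so each reply can be matched to the command that caused it.
_ids = itertools.count(1)

def cdp_command(method, **params):
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})

def classify(raw):
    """Replies echo the command id; events carry a method but no id."""
    msg = json.loads(raw)
    return "reply" if "id" in msg else "event"

# Three of the capabilities mentioned above, expressed as raw commands.
commands = [
    cdp_command("Network.setExtraHTTPHeaders", headers={"Accept-Language": "en-GB"}),
    cdp_command("Runtime.evaluate", expression="document.title", returnByValue=True),
    cdp_command("Page.captureScreenshot", format="png"),
]
```

Wrappers such as Puppeteer and Playwright generate messages like these for you; the verbosity of raw CDP comes from doing the id bookkeeping and event handling yourself.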

Quick facts

What it controls: Any Chromium-based browser (Chrome, Edge, Brave, Opera)
Connection: WebSocket to the browser's remote debugging endpoint (listed at http://localhost:<port>/json)
Built on by: Puppeteer, Playwright (Chromium), undetected-chromedriver
Direct use cases: Custom interception, attaching to an existing Chrome, browser-internal probes
Detectability: CDP is exposed by launching Chrome with --remote-debugging-port; some sites try to detect CDP-driven sessions

Where CDP fits

Every Chromium control library ultimately speaks CDP. Puppeteer and Playwright wrap it in idiomatic APIs and add their own features (auto-waiting, selector engines). For the large majority of scraping work you want the wrapper, not raw CDP. The exception is when you need to attach to a real user's Chrome (a profile with cookies, history, and extensions installed) instead of launching a fresh headless instance. In that case, raw CDP over the WebSocket endpoint is the cleanest path.
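
When attaching to an already running Chrome, the first step is choosing which tab to talk to. The sketch below assumes the browser was started with --remote-debugging-port=9222 and that the target listing has already been fetched from http://localhost:9222/json. The sample listing and the helper name pick_page_target are hypothetical and stand in for a real response.

```python
def pick_page_target(targets, url_contains=""):
    """Return the WebSocket URL of the first open tab whose URL matches."""
    for target in targets:
        # The listing can also include workers and extension pages; skip those.
        if target.get("type") == "page" and url_contains in target.get("url", ""):
            return target["webSocketDebuggerUrl"]
    raise LookupError("no matching page target")

# Hypothetical listing, shaped like the JSON served at /json on the debug port.
sample_targets = [
    {"type": "service_worker", "url": "chrome-extension://abc/sw.js",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/SW1"},
    {"type": "page", "url": "https://example.com/account",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/TAB1"},
]

ws_url = pick_page_target(sample_targets, url_contains="example.com")
```

Filtering on the type field matters because the first entry in the listing is not guaranteed to be a tab.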

Detection considerations

Chrome exposes the CDP port only when launched with --remote-debugging-port. Some defensive scripts probe for this and flag the session, though it is a weak signal because the port is visible only to the host, not to the page itself. The stronger CDP-related signal comes from the Runtime domain: Puppeteer and Playwright send Runtime.enable by default, and its side effects can be observed from the page context. Stealth tools avoid enabling such domains when they are not needed.
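
One way to picture "only enable what you need" is an allow-list in front of the command builder. The class below is a hypothetical illustration, not how any particular stealth tool works: it refuses to build a `<Domain>.enable` command for a domain that is not explicitly allowed, while still letting other commands through.

```python
import json

class DomainGuard:
    """Hypothetical wrapper that blocks `<Domain>.enable` for unlisted domains."""

    def __init__(self, allowed_domains):
        self.allowed = set(allowed_domains)
        self.next_id = 0

    def build(self, method, params=None):
        # Split "Runtime.enable" into domain "Runtime" and command "enable".
        domain, _, command = method.partition(".")
        if command == "enable" and domain not in self.allowed:
            raise PermissionError(f"{method} is not on the allow-list")
        self.next_id += 1
        return json.dumps({"id": self.next_id, "method": method, "params": params or {}})

guard = DomainGuard(allowed_domains={"Page", "Network"})
page_enable = guard.build("Page.enable")
evaluate = guard.build("Runtime.evaluate", {"expression": "1 + 1"})
```

Here Runtime.evaluate is still permitted because it can be sent without first enabling the Runtime domain, whereas guard.build("Runtime.enable") raises PermissionError.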

When to use CDP directly

Three real cases: (1) attaching to an existing Chrome process with a real profile, (2) implementing custom request interception that Playwright's API does not expose, (3) building a stealth tool that needs to control which CDP domains are enabled. For everything else, Playwright or Puppeteer is a better default.
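
For case (2), CDP's Fetch domain pauses requests and lets the client decide how to continue them. The sketch below shows only the message shapes: a Fetch.enable command, and a function that turns a Fetch.requestPaused event into a Fetch.continueRequest command with extra headers. Header rewriting is used only because it is easy to follow (Playwright's own routing already covers that simple case). The sample event, the header name, and the function names are hypothetical.

```python
def fetch_enable(msg_id, url_pattern="*"):
    """Ask the browser to pause requests whose URL matches the pattern."""
    return {"id": msg_id, "method": "Fetch.enable",
            "params": {"patterns": [{"urlPattern": url_pattern}]}}

def continue_with_headers(event, extra_headers, msg_id):
    """Turn a Fetch.requestPaused event into a Fetch.continueRequest command."""
    params = event["params"]
    # The event reports headers as an object; the command expects name/value pairs.
    merged = {**params["request"]["headers"], **extra_headers}
    return {
        "id": msg_id,
        "method": "Fetch.continueRequest",
        "params": {
            "requestId": params["requestId"],
            "headers": [{"name": k, "value": v} for k, v in merged.items()],
        },
    }

# Hypothetical event, shaped like what the browser emits for a paused request.
paused = {
    "method": "Fetch.requestPaused",
    "params": {
        "requestId": "interception-1",
        "request": {"url": "https://example.com/", "headers": {"Accept": "*/*"}},
    },
}

command = continue_with_headers(paused, {"X-Debug": "1"}, msg_id=2)
```

Every paused request must be continued, fulfilled, or failed, otherwise the page stalls, so a real client would run this handler for each Fetch.requestPaused event it receives.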

Code example

python
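import asyncio, json
import requests, websockets

async def cdp_navigate(url):
    # Chrome must already be running with --remote-debugging-port=9222.
    targets = requests.get('http://localhost:9222/json').json()
    # Pick the first tab; the listing can also contain workers and extensions.
    page = next(t for t in targets if t.get('type') == 'page')
    async with websockets.connect(page['webSocketDebuggerUrl']) as ws:
        await ws.send(json.dumps({
            'id': 1, 'method': 'Page.navigate', 'params': {'url': url}
        }))
        # Events may arrive first; read until the reply with our id shows up.
        while True:
            reply = json.loads(await ws.recv())
            if reply.get('id') == 1:
                return reply.get('result')

print(asyncio.run(cdp_navigate('https://example.com')))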

Frequently asked questions

Should I use CDP directly or Playwright?

Playwright, unless you have a specific reason not to. Direct CDP is verbose, thinly documented for many edge cases, and can break across Chrome versions. Playwright absorbs those changes for you.

Can sites detect CDP usage?

They can detect the symptoms (Runtime.enable side effects, missing chrome.runtime.runtimeId, certain navigator probes) but not the protocol itself. Hardened stealth tools mitigate most of those signals.

Does CDP work in Firefox?

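Only partially. Firefox shipped an experimental subset of CDP that lacked many domains, and it has since been deprecated in favour of WebDriver BiDi. Playwright drives its patched Firefox build through its own protocol, not CDP. For Firefox scraping (Camoufox is Firefox-based), the Playwright API is the cleaner interface.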

Last updated: 2026-05-26