What Is Selenium?

By the Scrappey Research Team

Selenium is the original cross-browser automation framework. The project dates from 2004, more than a decade before Puppeteer (2017), and its WebDriver protocol has since become a W3C standard. In plain terms, it lets your code remotely drive a real web browser. It works across Chrome, Firefox, Safari, Edge, and mobile browsers (through Appium), all through one API, and you can write that code in Python, Java, Ruby, C#, JavaScript, Kotlin, and more. In 2026 it remains the right pick for scraping when you need Safari or mobile-browser support, or when you have existing Selenium tests to repurpose. For new Python or Node scrapers, Playwright has overtaken it.

Quick facts

Standard: W3C WebDriver protocol (oldest of the browser-automation standards)
Languages: Python, Java, C#, JavaScript, Ruby, Kotlin, and others
Browsers: Chrome, Firefox, Safari, Edge, mobile via Appium
Default detection: navigator.webdriver === true on every browser (W3C-mandated)
Stealth variants: undetected-chromedriver, SeleniumBase UC mode, selenium-stealth

Why Selenium is still relevant in 2026

Three durable reasons to pick Selenium:

  • Safari support. Playwright's WebKit isn't Safari — it's the WebKit rendering engine without Safari's actual app around it. Testing or scraping real Safari requires Selenium plus safaridriver (Apple's WebDriver helper for Safari).
  • Mobile browsers. Appium (the mobile sibling of Selenium) drives mobile Chrome, mobile Safari, and native phone apps through the same WebDriver API. No other framework reaches all of that.
  • Existing test code. A large share of enterprise QA automation runs on Selenium. If your team already maintains a test suite, reusing the framework for scraping is faster than rewriting it.
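To make the "one API across browsers" point concrete, here is a minimal sketch. It assumes the official selenium Python package is installed and that the chosen browser is available locally (for Safari, that means macOS with `safaridriver --enable` run once). The names `BROWSERS`, `make_driver`, and `fetch_title` are illustrative helpers, not part of Selenium.

```python
# Map a browser name to the class on selenium.webdriver that launches it.
BROWSERS = {
    "chrome": "Chrome",
    "firefox": "Firefox",
    "edge": "Edge",
    "safari": "Safari",  # real Safari, driven through Apple's safaridriver
}


def make_driver(browser: str):
    """Start a local browser session for the given name (hypothetical helper)."""
    key = browser.lower()
    if key not in BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    # Imported lazily so the name check above works without Selenium installed.
    from selenium import webdriver

    return getattr(webdriver, BROWSERS[key])()


def fetch_title(browser: str, url: str) -> str:
    """Open url in the chosen browser and return the page title."""
    driver = make_driver(browser)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if navigation fails.
        driver.quit()
```

Calling `fetch_title("safari", "https://example.com")` and `fetch_title("firefox", "https://example.com")` runs the same code path against two different browsers; only the name changes.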

For a brand-new ("greenfield") Python or Node Chromium scraper, Playwright is the better pick: a more modern API, faster startup, and better parallelism (running many browsers at once). Each Selenium command travels as a separate HTTP request over the WebDriver wire protocol, the back-and-forth messaging used to control the browser. That adds roughly a millisecond of overhead per command, which adds up over a long script.
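The per-command overhead is easier to picture once you see what crosses the wire. The sketch below lists the HTTP requests a short "open a page, read it, quit" script would produce. The endpoint paths follow the W3C WebDriver specification; the `WireCommand` type, the `navigate_and_read` function, and the session id are made up for illustration, and nothing is actually sent.

```python
import json
from typing import NamedTuple, Optional


class WireCommand(NamedTuple):
    """One HTTP request a WebDriver client would send to the driver."""
    method: str
    path: str
    body: Optional[str]


def navigate_and_read(session_id: str, url: str) -> list[WireCommand]:
    """List the requests behind: open url, read title and source, quit."""
    base = f"/session/{session_id}"
    return [
        WireCommand("POST", f"{base}/url", json.dumps({"url": url})),  # Navigate To
        WireCommand("GET", f"{base}/title", None),                     # Get Title
        WireCommand("GET", f"{base}/source", None),                    # Get Page Source
        WireCommand("DELETE", base, None),                             # Delete Session
    ]


if __name__ == "__main__":
    for cmd in navigate_and_read("abc123", "https://example.com"):
        print(cmd.method, cmd.path, cmd.body or "")
```

Four lines of script become four separate round trips, and a script that clicks and reads hundreds of elements pays that cost hundreds of times.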

Selenium's detection surface

Of the three big browser-automation frameworks, Selenium is the easiest for anti-bot systems to spot, because it leaves several telltale signs (fingerprints):

  • The W3C WebDriver specification requires navigator.webdriver to be true while a browser is under automation. Every Selenium browser exposes this flag by default, and any website can read it in one line of JavaScript. Anti-bot scripts usually test it first.
  • Selenium's Chrome driver injects identifying properties into window, the global object every web page can inspect. Anti-bot scripts scan for keys like window.cdc_* and window.$cdc_*.
  • The WebDriver wire protocol leaves timing artifacts — the delay between a command and its response differs from real human input by a measurable amount.
  • The chromedriver binary itself (the helper program that controls Chrome) has shipped with the substring "$cdc_" in its source for years — only recently patched in the mainline version.
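The first two checks in the list above can be modelled in a few lines. Real detectors run as JavaScript inside the page; this Python sketch only mirrors their logic so you can audit your own sessions. The function name, the pattern, and the sample property names are assumptions for illustration.

```python
import re

# Matches property names of the cdc_* and $cdc_* forms described above.
CDC_PATTERN = re.compile(r"^\$?cdc_")


def automation_signals(navigator_webdriver: bool, global_keys: list[str]) -> list[str]:
    """Return a human-readable list of the automation markers found."""
    signals = []
    if navigator_webdriver:
        signals.append("navigator.webdriver is true")
    for key in global_keys:
        if CDC_PATTERN.match(key):
            signals.append(f"injected property: {key}")
    return signals


if __name__ == "__main__":
    # Sample values; the property names here are invented.
    print(automation_signals(True, ["document", "$cdc_example", "cdc_example_Array"]))
```

To feed it real data, you could collect the inputs from a live session with calls such as `driver.execute_script("return navigator.webdriver")` and `driver.execute_script("return Object.keys(window)")`. An empty result only means these two markers are absent; it says nothing about timing or binary-level checks.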

Plain, unmodified ("vanilla") Selenium gets blocked on most modern protected sites. The fixes are the stealth variants in the next section.

undetected-chromedriver and SeleniumBase UC mode

Three options for making Selenium stealthier:

  • undetected-chromedriver (UC) — patches the chromedriver binary as it downloads to strip out the $cdc_ strings and reset navigator.webdriver. This satisfies most basic (Layer-1 and Layer-2) checks. It is still visible to Function.toString() inspection — a trick where a site reads the source code of the override function and sees it was tampered with.
  • SeleniumBase UC mode — wraps undetected-chromedriver in a pytest-friendly API (pytest is the standard Python test framework), adds automatic clicking of the Cloudflare Turnstile challenge, and gives you a clean set of sb.uc_* methods. This is the default choice when you want Selenium plus stealth plus a test framework.
  • selenium-driverless — drops the WebDriver layer entirely and drives Chrome straight through raw CDP (Chrome DevTools Protocol, the browser's native control channel). This removes the WebDriver fingerprint, but you also lose the cross-browser support that made you choose Selenium in the first place.
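The binary-patching idea behind undetected-chromedriver can be sketched on an in-memory byte string. This is a simplified model of the concept, not the library's actual patching code: the function name and the sample bytes are invented, and the real tool does more than a single substitution. The replacement has the same length as the marker so that offsets elsewhere in the binary stay valid.

```python
import random
import string

# Letters used for replacements. 'c' and 'd' are left out so that, for the
# default marker, a replacement can never recreate it next to leftover bytes.
_SAFE_LETTERS = [ch for ch in string.ascii_lowercase if ch not in "cd"]


def scrub_marker(binary: bytes, marker: bytes = b"cdc_") -> bytes:
    """Replace every occurrence of marker with random letters of equal length."""
    replacement = "".join(random.choices(_SAFE_LETTERS, k=len(marker))).encode("ascii")
    return binary.replace(marker, replacement)


if __name__ == "__main__":
    # Invented stand-in for a slice of a driver binary.
    sample = b"\x00var $cdc_example = 1;\x00window.cdc_other\x00"
    patched = scrub_marker(sample)
    print(len(patched) == len(sample), b"cdc_" in patched)
```

Renaming the marker removes one string-based fingerprint only. As the list above notes, overrides of browser properties can still be exposed by Function.toString() inspection.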

Even with UC, Selenium still loses to Kasada, F5 Shape, and recent Akamai. For those, switch to Camoufox, CloakBrowser, or a managed API — at that point the WebDriver protocol itself is the bottleneck.

Code example

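```python
# undetected-chromedriver with a residential proxy
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Placeholder proxy endpoint: substitute your provider's host and port.
PROXY = "http://residential:port"

options = uc.ChromeOptions()
options.add_argument(f"--proxy-server={PROXY}")

# version_main pins the major Chrome version; match it to the installed browser.
driver = uc.Chrome(options=options, version_main=131)
try:
    driver.get("https://target.com")
    # Wait up to 5 seconds for <body>. implicitly_wait() would not help here:
    # it only sets the timeout for later element lookups.
    WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.TAG_NAME, "body")))
    print(driver.page_source[:500])
finally:
    driver.quit()
# Handles simple webdriver checks. Cloudflare Bot Management and Kasada generally still block it.
```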

Frequently asked questions

Should I learn Selenium in 2026?

For brand-new scraping projects, learn Playwright instead — a newer API, fewer ways to be detected, and multi-browser support without WebDriver. Learn Selenium if you need Safari or mobile testing, or if your team already maintains Selenium tests you can extend.

What's the difference between undetected-chromedriver and SeleniumBase UC mode?

undetected-chromedriver is the underlying library that patches the chromedriver binary and runtime to remove the default automation markers. SeleniumBase UC mode wraps that library in a friendlier API, adds pytest integration, and includes built-in helpers (Cloudflare auto-click, session reuse). Want quick stealth in a pytest project? Use SeleniumBase UC. Want minimal dependencies? Use undetected-chromedriver directly.
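For comparison with calling undetected-chromedriver directly, here is a minimal sketch of the SeleniumBase route. It assumes the seleniumbase package is installed and a local Chrome is available; `fetch_with_uc` is a hypothetical wrapper, and the reconnect delay of 4 seconds is an arbitrary starting value, not a recommendation from the library.

```python
def fetch_with_uc(url: str, reconnect_seconds: int = 4) -> str:
    """Load url in SeleniumBase UC mode and return the page HTML."""
    # Imported lazily so this module loads without SeleniumBase installed.
    from seleniumbase import SB

    # uc=True switches the SB context manager into undetected (UC) mode.
    with SB(uc=True) as sb:
        # Open the page, then keep chromedriver disconnected for a few
        # seconds so early checks run without the driver attached.
        sb.uc_open_with_reconnect(url, reconnect_seconds)
        return sb.get_page_source()
```

The context manager starts and closes the browser for you, which is most of what "friendlier API" means in practice. The UC mode documentation also lists captcha helpers such as `sb.uc_gui_click_captcha()`; check the version you have installed before relying on them.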

How does Selenium with stealth variants behave against Cloudflare?

Against Bot Fight Mode and Turnstile, undetected-chromedriver plus a residential proxy (an IP address from a real home internet connection) typically gets through on sites you are authorized to access. Against Cloudflare Bot Management Enterprise it generally does not; for those workflows, teams move to Camoufox or a managed API.

Last updated: 2026-05-31