What Is an XPath Selector?

By the Scrappey Research Team

XPath (XML Path Language) is a query language for navigating the tree structure of an HTML or XML document to select elements by their path, attributes, or text content. Where a CSS selector matches patterns, XPath describes a route through the document. For example, //div[@class="price"] selects every div anywhere in the tree whose class attribute is exactly "price", and //a[contains(text(),"Next")] selects links whose text contains "Next", which plain CSS cannot do. It is the more powerful of the two main element-selection languages used in scraping.
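
To make the two expressions concrete, here is a minimal sketch that assumes lxml is installed and uses invented markup. It also shows that @class="price" compares the whole attribute value, so a div with class "price sale" is not matched.

```python
from lxml import html

# Invented markup: two price blocks and a small pagination bar.
page = html.fromstring("""
<html><body>
  <div class="price">$10</div>
  <div class="price sale">$8</div>
  <div id="pager"><a href="/p/1">Prev</a> <a href="/p/3">Next page</a></div>
</body></html>
""")

# @class="price" is an exact string comparison, so "price sale" does not match.
prices = [div.text for div in page.xpath('//div[@class="price"]')]

# contains(text(), "Next") matches on the link's text; /@href returns the attribute value.
next_links = page.xpath('//a[contains(text(), "Next")]/@href')

print(prices)      # ['$10']
print(next_links)  # ['/p/3']
```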

Stands for	XML Path Language
Selects by	Path, attribute, text content, and position
Starts with	// (anywhere) or / (absolute from root)
Unique powers	Match by text, walk to parent/ancestor, use functions
Used in	lxml, Scrapy, Selenium, Playwright, Puppeteer

XPath syntax

An XPath expression reads as a path through the document tree. // means "at any depth below this point," while a single / steps to a direct child. //div selects every div in the document; //div[@class="card"] filters by attribute; //ul/li[1] takes the first li child of each ul (XPath indexes from 1, not 0). Predicates in square brackets are the heart of XPath: [@id="main"], [contains(@class,"btn")], and [text()="Buy now"] all filter the current matches. Axes such as parent::, following-sibling::, and ancestor:: let you move in any direction through the tree, so you can select an element and then climb to its container, which CSS cannot express. Functions like contains(), starts-with(), and normalize-space() handle messy real-world markup.
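
The sketch below exercises that syntax: a positional predicate, contains(), normalize-space(), starts-with(), and the parent and ancestor axes. It assumes lxml is installed and uses an invented HTML fragment; note how an exact text() test misses a label padded with extra whitespace while normalize-space() still matches it.

```python
from lxml import html

# Invented fragment exercising the syntax described above.
root = html.fromstring("""
<div id="main">
  <ul><li>first</li><li>second</li></ul>
  <div class="card">
    <button class="btn btn-primary">  Buy   now </button>
    <a href="https://example.com/docs">Docs</a>
  </div>
</div>
""")

# li[1] is the FIRST item: XPath positions start at 1.
first_item = root.xpath('//ul/li[1]/text()')                  # ['first']

# contains() tolerates extra classes on the same element.
buttons = root.xpath('//button[contains(@class, "btn")]')

# An exact text() test misses the padded label; normalize-space() trims and
# collapses the whitespace first, so the second expression matches.
exact = root.xpath('//button[text()="Buy now"]')               # []
cleaned = root.xpath('//button[normalize-space()="Buy now"]')

# Axes: climb from the button to its parent and to an ancestor with an id.
parent_class = root.xpath('//button/parent::*/@class')         # ['card']
ancestor_id = root.xpath('//button/ancestor::div[@id]/@id')    # ['main']

# starts-with() on an attribute value.
https_links = root.xpath('//a[starts-with(@href, "https://")]/@href')
```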

How scrapers use XPath

In Python, lxml and Scrapy expose .xpath() on a parsed document, returning matching nodes you can read text or attributes from. Selenium and Playwright accept XPath to locate elements for clicking or reading. The classic use case is data that has no clean class to grab - a price that sits in an unlabeled span next to a label, where the only reliable anchor is "the span that follows the element containing the text 'Price'." XPath expresses that directly: //*[text()="Price"]/following-sibling::span. That ability to select relative to text and to walk the tree in any direction is why XPath survives on pages where CSS selectors run out of road.
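
Here is a minimal sketch of the label-anchored pattern as a reusable lookup, assuming lxml and an invented spec table with no useful class names. The helper name value_after_label is hypothetical. It relies on two details: the [1] predicate keeps only the nearest following sibling, and lxml lets you bind an XPath variable ($label) as a keyword argument to .xpath(), so quotes inside the label cannot break the expression.

```python
from lxml import html

# Invented spec table whose cells carry no useful class names.
doc = html.fromstring("""
<table>
  <tr><td>Price</td><td>$19.99</td><td>incl. VAT</td></tr>
  <tr><td> SKU </td><td>AB-123</td></tr>
</table>
""")

def value_after_label(tree, label):
    """Hypothetical helper: text of the cell right after the cell labelled `label`.

    normalize-space() ignores padding around the label, and [1] keeps only the
    nearest following sibling instead of every cell to the right. The label is
    bound to the XPath variable $label rather than pasted into the string.
    """
    cells = tree.xpath(
        '//td[normalize-space()=$label]/following-sibling::td[1]', label=label
    )
    return cells[0].text_content().strip() if cells else None

print(value_after_label(doc, "Price"))  # $19.99
print(value_after_label(doc, "SKU"))    # AB-123
print(value_after_label(doc, "Color"))  # None
```

Without the [1], the "Price" row would return both "$19.99" and "incl. VAT", which is the usual source of off-by-one surprises with following-sibling.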

XPath vs CSS selectors

XPath and CSS selectors can target the same elements, but they make different trade-offs. CSS is shorter, more readable, and faster to write for the common cases (selecting by class, id, or attribute), which is why it is the default. XPath wins when you need to select by visible text, navigate upward to a parent or ancestor, or apply a function, capabilities CSS simply lacks. The cost is verbosity and a steeper learning curve. A pragmatic workflow uses CSS for the roughly 80% of selections that are straightforward and switches to XPath for the awkward 20% where text matching or tree walking is the only stable hook. As with CSS, brittle absolute paths (/html/body/div[3]/div[2]) break on any layout change; anchor on attributes or text instead. A managed parsing step can also return structured fields directly, sparing you hand-written selectors for common page types.
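
To see why absolute paths are brittle, the following sketch runs an absolute and an attribute-anchored expression against an invented page before and after a redesign that inserts one banner div at the top. It assumes lxml; the page markup is made up for illustration.

```python
from lxml import html

# Invented page, before and after a redesign that adds a banner <div> at the top.
BEFORE = ("<html><body><div>Header</div>"
          "<div><span class='price'>$5</span></div></body></html>")
AFTER = ("<html><body><div>Sale today!</div><div>Header</div>"
         "<div><span class='price'>$5</span></div></body></html>")

ABSOLUTE = "/html/body/div[2]/span/text()"   # tied to the exact layout
ANCHORED = '//span[@class="price"]/text()'   # tied to a stable attribute

results = {}
for name, source in [("before", BEFORE), ("after", AFTER)]:
    tree = html.fromstring(source)
    results[name] = (tree.xpath(ABSOLUTE), tree.xpath(ANCHORED))
    print(name, results[name])
# before (['$5'], ['$5'])
# after ([], ['$5'])
```

The absolute path silently returns an empty list after the change, which is why a scraper should treat "no match" as an error rather than as "no data".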

Code example

```python
from lxml import html

doc = html.fromstring(
    '<div class="product"><span>Price</span>'
    '<span class="value">$19.99</span></div>'
)

# Select by attribute
price = doc.xpath('//span[@class="value"]/text()')[0]   # '$19.99'

# Select relative to text - XPath can do this, CSS cannot
next_to_label = doc.xpath('//span[text()="Price"]/following-sibling::span/text()')[0]

print(price, next_to_label)
```

Related terms

What Is a CSS Selector?

A CSS selector is a pattern that picks out specific elements in an HTML document by matching their tag, class, id, attributes, or position. …

What Is Data Parsing?

Data parsing is the process of taking raw, messy data and turning it into a clean, structured format your program can use. In web scraping, …

How to Get All Links From a Webpage

Getting all links from a webpage means downloading the page, reading every <a href> attribute (the URL inside each link tag), turning relati…

What Is Web Scraping?

Web scraping is the automated extraction of structured data from websites. Instead of a person copying and pasting, a program (a "scraper") …

What Is Scrapy?

Scrapy is the industry-default crawler framework for Python. It does everything around the actual HTTP request so you don't have to: it keep…

What Is a Web Scraping API?

A web scraping API is a hosted HTTP service that visits a web page for you and hands back the result — rendered HTML, JSON, or already-parse…

Frequently asked questions

What is XPath used for in web scraping?

XPath locates elements in an HTML document so a scraper can extract their text or attributes. It is especially useful when there is no clean class or id to target - for example selecting an element by its visible text, or selecting a value relative to a nearby label, which CSS selectors cannot express.

What is the difference between // and / in XPath?

A double slash (//) matches at any depth, so //div finds every div in the document, however deeply nested. A single slash (/) steps exactly one level down to a direct child, so /html/body/div matches only a div that is an immediate child of body. Use // for flexible matching and / for precise paths. To search only below the current element rather than the whole document, prefix the path with a dot (.//div).
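
A small sketch comparing the forms, assuming lxml and an invented page in which div "a" contains div "b" and div "c" sits beside "a". It also shows a common lxml surprise: calling .xpath() on an element with a leading // still searches the whole document, while a leading .// stays inside that element.

```python
from lxml import html

# Invented page: div "a" contains div "b"; div "c" is a sibling of "a".
tree = html.fromstring(
    "<html><body>"
    "<div id='a'><div id='b'></div></div>"
    "<div id='c'></div>"
    "</body></html>"
)

anywhere = tree.xpath('//div/@id')          # every div, any depth
direct = tree.xpath('/html/body/div/@id')   # only direct children of body

# On an element, a leading // still starts from the document root;
# a leading .// restricts the search to that element's descendants.
a = tree.xpath('//div[@id="a"]')[0]
from_a_global = a.xpath('//div/@id')
from_a_local = a.xpath('.//div/@id')

print(anywhere, direct)             # ['a', 'b', 'c'] ['a', 'c']
print(from_a_global, from_a_local)  # ['a', 'b', 'c'] ['b']
```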

Is XPath better than CSS selectors?

Neither is universally better. XPath is more powerful - it can match by text, walk to a parent, and use functions - while CSS selectors are shorter and easier to read for common selections. Most scrapers use CSS by default and reach for XPath only when they need its extra capabilities.

Why does my XPath break when the site changes?

Usually because it relies on an absolute positional path like /html/body/div[3]/div[2], which depends on the exact layout. Any markup change shifts the positions and the path no longer matches. Anchor instead on stable attributes or text content, e.g. //div[@class='price'], so the expression survives layout edits.

Last updated: 2026-06-08