XPath Tester — Test XPath Expressions Online

What is XPath and why use it for web scraping

XPath is a query language for navigating XML and HTML documents by structure, not just by tag name. Where a CSS selector says "give me every .price", an XPath can say "give me every span whose class contains price, that lives inside a div with data-product-id, but only the second one." That precision matters when you're scraping pages where class names are obfuscated or repeat across unrelated sections.

Scrapers reach for XPath when they need axis navigation (parent::, following-sibling::), text matching (contains(text(), 'Sold out')), or attribute predicates that CSS can't express. Browsers, lxml, and parsel all implement XPath 1.0, so an expression that works in your browser dev tools generally works unchanged in your Python scraper, as long as both see the same HTML.

XPath cheat sheet for scrapers

Pattern	What it matches	Example
//tag	All descendants with this tag	//a → every link
/html/body/div	Absolute path from the root	/html/body//h1
*	Any element	//div/*
@attr	Attribute value	//img/@src
text()	Text node child	//span/text()
[@attr="val"]	Attribute equals predicate	//a[@rel="nofollow"]
contains()	Substring match	//div[contains(@class,'price')]
starts-with()	Prefix match	//a[starts-with(@href,'/p/')]
[n]	nth child of its kind	//ul/li[1]
(...)[n]	nth across the whole set	(//img)[1]
last()	Last in a node set	//li[last()]
..	Parent	//span[@class='price']/..
following-sibling::	Sibling after this node	//h2/following-sibling::p[1]
ancestor::	Any ancestor	//span[@class='price']/ancestor::article
not()	Negation	//a[not(@rel)]
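A few of the patterns above can be tried in lxml as a minimal sketch. The markup, ids, and variable names below are made up for illustration and do not come from any real site.

```python
from lxml import html

# Hypothetical product listing, used only for illustration.
DOC = """
<html><body>
  <article id="p1">
    <h2>Widget</h2>
    <p>Small and sturdy.</p>
    <p>Ships in two days.</p>
    <span class="price">9.99</span>
    <a href="/p/widget" rel="nofollow">Details</a>
  </article>
  <article id="p2">
    <h2>Gadget</h2>
    <p>Large and fragile.</p>
    <span class="price">19.99</span>
    <a href="/about">About</a>
  </article>
</body></html>
"""

tree = html.fromstring(DOC)

# @attr: attribute values come back as strings
hrefs = tree.xpath("//a/@href")

# starts-with(): links whose href begins with /p/
product_links = tree.xpath("//a[starts-with(@href, '/p/')]/@href")

# following-sibling:: the first paragraph after each h2
first_paras = tree.xpath("//h2/following-sibling::p[1]/text()")

# ancestor:: walk up from each price to its enclosing article
article_ids = tree.xpath("//span[@class='price']/ancestor::article/@id")

# not(): links that have no rel attribute
no_rel = tree.xpath("//a[not(@rel)]/@href")

print(hrefs, product_links, first_paras, article_ids, no_rel)
```

Each call returns a plain Python list, so an expression that matches nothing gives an empty list rather than raising.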

XPath vs CSS selectors — which to use

CSS selectors are shorter, quicker to read, and supported natively by browsers (querySelectorAll) and by most scraping libraries. Use them as your default.

Reach for XPath when you need: text content matching (CSS can't query text), axis traversal (parents, ancestors, following-sibling), positional predicates across the whole result set ((//div)[3] vs CSS's :nth-of-type which is per-parent), or attribute substring matching with conditions that compose more cleanly than CSS attribute selectors. Try the same query both ways in the CSS Selector Tester — for a given page you'll quickly feel which one fits.
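The difference between a per-parent position and a position across the whole result set is easy to miss, so here is a small sketch using lxml. The two-list markup is invented for the example.

```python
from lxml import html

# Two sibling lists, invented for illustration.
DOC = "<div><ul><li>a1</li><li>a2</li></ul><ul><li>b1</li><li>b2</li></ul></div>"
tree = html.fromstring(DOC)

# [1] is evaluated per parent: the first li inside EACH ul
per_parent = tree.xpath("//ul/li[1]/text()")

# (...)[1] is evaluated once over the whole node set: the first li overall
whole_set = tree.xpath("(//ul/li)[1]/text()")

# The third li in document order, regardless of which ul holds it
third_overall = tree.xpath("(//li)[3]/text()")

print(per_parent, whole_set, third_overall)
```

Note that XPath positions start at 1, not 0.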

Using XPath in Python (lxml + BeautifulSoup)

lxml has native XPath support. BeautifulSoup doesn't, but it can hand off to lxml or parsel.

# pip install lxml requests beautifulsoup4
import requests
from bs4 import BeautifulSoup
from lxml import html

resp = requests.get("https://example.com", timeout=10)
resp.raise_for_status()  # fail early on 4xx/5xx responses
tree = html.fromstring(resp.content)

# Get the text of every product title link
titles = tree.xpath("//a[@class='product-title']/text()")

# BeautifulSoup itself has no XPath; its select() uses Soup Sieve (CSS only).
# To run XPath on a soup, serialize it and re-parse the bytes with lxml.
soup = BeautifulSoup(resp.content, "lxml")
soup_tree = html.fromstring(soup.encode())
soup_titles = soup_tree.xpath("//a[@class='product-title']/text()")

Common XPath errors and how to fix them

Missing default namespace

XHTML, SVG, and many XML documents declare a default namespace, and XPath 1.0 has no concept of a default namespace, so unprefixed names won't match namespaced elements. Either parse as HTML (which ignores the namespace) or register a prefix and write //x:div instead of //div.
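A minimal sketch of the prefix approach with lxml's XML parser. The prefix x is arbitrary; only the namespace URI has to match the one declared in the document.

```python
from lxml import etree

# A tiny XHTML document with a default namespace.
XHTML = b"""<html xmlns="http://www.w3.org/1999/xhtml">
  <body><div>hello</div></body>
</html>"""

root = etree.fromstring(XHTML)

# Unprefixed name: looks for div in NO namespace, so nothing matches
unprefixed = root.xpath("//div")

# Register a prefix (any name works) bound to the document's namespace URI
ns = {"x": "http://www.w3.org/1999/xhtml"}
prefixed = root.xpath("//x:div/text()", namespaces=ns)

print(unprefixed, prefixed)
```

If registering a prefix is inconvenient, //*[local-name()='div'] matches on the tag name alone, at the cost of ignoring the namespace entirely.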

Attribute matching is case-sensitive

XPath 1.0 compares strings case-sensitively. //input[@type="Text"] won't match <input type="text">. Use translate(@type, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')="text" for case-insensitive matching, or normalize the input upstream.
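As a sketch, the translate() trick looks like this in lxml. The form markup and the UPPER/LOWER constant names are assumptions made for the example.

```python
from lxml import html

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical form with inconsistent casing in the type attribute.
DOC = (
    '<form>'
    '<input type="TEXT" name="q">'
    '<input type="text" name="r">'
    '<input type="submit" name="s">'
    '</form>'
)
tree = html.fromstring(DOC)

# Case-sensitive comparison only finds the lowercase one
exact = tree.xpath('//input[@type="text"]/@name')

# translate() lowercases the attribute value before comparing
expr = f'//input[translate(@type, "{UPPER}", "{LOWER}")="text"]/@name'
case_insensitive = tree.xpath(expr)

print(exact, case_insensitive)
```

This only folds ASCII letters; attribute values with accented characters would need those characters added to both strings.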

text() vs string() vs .

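text() returns text NODES, not a string. A predicate like [text()='Sold out'] matches only when one of the element's own text-node children equals that string exactly, so surrounding whitespace or text wrapped in child elements breaks it. String functions are stricter still: contains(text(), 'Sold out') looks at the FIRST text node only. Use [normalize-space(.)='Sold out'] to compare the full concatenated text of the element.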
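The sketch below runs the competing predicates against four invented spans to show where they diverge. The ids exist only so the results are easy to read.

```python
from lxml import html

# Four ways the same visible text can be laid out (invented markup).
DOC = (
    '<div>'
    '<span id="a">Sold out</span>'
    '<span id="b">  Sold out  </span>'
    '<span id="c"><b>Sold</b> out</span>'
    '<span id="d">Status: <b>now</b> Sold out</span>'
    '</div>'
)
tree = html.fromstring(DOC)

# Compares individual text nodes: whitespace and nested tags defeat it
by_text_node = tree.xpath("//span[text()='Sold out']/@id")

# Compares the element's full text with whitespace collapsed
by_full_text = tree.xpath("//span[normalize-space(.)='Sold out']/@id")

# contains(text(), ...) converts only the first text node to a string
first_node_only = tree.xpath("//span[contains(text(), 'Sold out')]/@id")

# contains(., ...) searches the element's full concatenated text
anywhere = tree.xpath("//span[contains(., 'Sold out')]/@id")

print(by_text_node, by_full_text, first_node_only, anywhere)
```

Span d slips past contains(text(), ...) because its first text node is "Status: ", even though the words appear later in the element.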

contains() vs equality

[@class='price'] only matches if the WHOLE class attribute is exactly 'price'. Real HTML usually has multiple classes ('price price--current'), so use [contains(concat(' ',normalize-space(@class),' '),' price ')] for an exact word match, or simpler contains(@class,'price') if substring overlap is fine.
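Here is a small comparison of the three approaches in lxml. The has_class helper is a hypothetical convenience written for this example, not part of lxml.

```python
from lxml import html

def has_class(name: str) -> str:
    """Build an XPath 1.0 predicate matching `name` as a whole class token.

    Hypothetical helper for illustration; not an lxml API.
    """
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"

# Invented markup with overlapping class names.
DOC = (
    '<div>'
    '<span id="a" class="price">1</span>'
    '<span id="b" class="price price--current">2</span>'
    '<span id="c" class="old-price">3</span>'
    '</div>'
)
tree = html.fromstring(DOC)

# Exact attribute equality: misses elements with extra classes
exact = tree.xpath("//span[@class='price']/@id")

# Substring match: also catches old-price
substring = tree.xpath("//span[contains(@class, 'price')]/@id")

# Whole-token match: the same set a CSS .price selector would give
token = tree.xpath(f"//span[{has_class('price')}]/@id")

print(exact, substring, token)
```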

FAQ

What's the difference between XPath 1.0 and 2.0?

Browsers and most scraping libraries (lxml, parsel) implement XPath 1.0. XPath 2.0+ adds proper data types, regex (matches()), date math, and for expressions, but is mainly available in XML-focused tooling such as Saxon. If you're scraping web pages, assume 1.0, as the rest of this guide does.

Can I use XPath in JavaScript?

Yes. The browser exposes document.evaluate(expr, contextNode, resolver, resultType, result). It returns an XPathResult you iterate. This tool uses exactly that, which is why expressions behave identically here and in your browser DevTools console ($x("//h1") in Chrome).

Why does my XPath work in browser console but not in Python?

Three usual culprits: (1) the browser repairs broken HTML, while lxml's XML parser (etree.fromstring) rejects it, so parse with html.fromstring instead; (2) the page is rendered by JavaScript and your Python code sees only the empty server HTML, in which case fetch the rendered HTML via a tool like Scrappey; (3) you tested with a console helper that takes CSS rather than XPath: $x evaluates XPath 1.0, while $ and $$ accept CSS selectors only.
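The first culprit can be reproduced with a deliberately sloppy fragment. This is a sketch: the markup is invented, and the exact recovery behaviour depends on the libxml2 version that lxml is built against.

```python
from lxml import etree, html

# Unclosed <p> and <br>: fine for browsers, invalid as XML.
BROKEN = "<div><p>First<p>Second<br></div>"

# The XML parser refuses malformed markup outright
try:
    etree.fromstring(BROKEN)
    xml_parsed = True
except etree.XMLSyntaxError:
    xml_parsed = False

# The HTML parser recovers, closing the first <p> when the second opens
tree = html.fromstring(BROKEN)
paragraphs = tree.xpath("//p/text()")

print(xml_parsed, paragraphs)
```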

How do I select an element by its text content?

Use //tag[normalize-space(.)='Sold out'] for an exact match, or //tag[contains(., 'Sold out')] for a substring match. normalize-space() collapses whitespace, which is usually what you want with HTML.

Is XPath faster than CSS selectors?

In browsers, CSS via querySelectorAll is generally faster than document.evaluate, especially for simple selectors. In lxml the gap is much smaller, because its CSS support (cssselect) translates each selector into XPath before running it. The bottleneck is almost never the selector engine; it's the network and parsing.
