XPath syntax
An XPath expression reads as a path through the document tree. A leading // searches the whole document for matching descendants (.// does the same search below the current node), while a single / steps to a direct child. //div selects every div; //div[@class="card"] filters by attribute; //ul/li[1] takes the first li child of each ul (XPath indexes from 1, not 0). Predicates in square brackets are the heart of XPath: [@id="main"], [contains(@class,"btn")], and [text()="Buy now"] all filter the current matches. Note that contains() is a substring test, so [contains(@class,"btn")] also matches a class such as "btn-group". Axes let you move in any direction (parent::, following-sibling::, ancestor::), so you can select an element and then climb to its container, which CSS could not express at all before the relational :has() pseudo-class. Functions like contains(), starts-with(), and normalize-space() handle messy real-world markup.
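The basic path syntax can be tried without installing anything, as a minimal sketch. It assumes Python's standard-library xml.etree.ElementTree, which implements only a limited subset of XPath (no contains(), no axes such as ancestor::) and requires well-formed XML. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; ElementTree needs it to be well-formed XML.
DOC = """<html><body>
<div class="card"><ul><li>First</li><li>Second</li></ul></div>
<div class="note"><ul><li>Only</li></ul></div>
</body></html>"""

root = ET.fromstring(DOC)

# ElementTree only accepts relative paths on an element,
# so ".//" stands in for a leading "//".
all_divs = root.findall(".//div")                    # every div
cards = root.findall('.//div[@class="card"]')        # filter by attribute
first_items = [li.text for li in root.findall(".//ul/li[1]")]  # 1-based index

print(len(all_divs), len(cards), first_items)
```

For the predicates, axes, and functions that ElementTree does not cover, a full XPath 1.0 engine such as lxml is the usual choice.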
How scrapers use XPath
In Python, lxml and Scrapy expose .xpath() on a parsed document, returning matching nodes from which you can read text or attributes. Selenium and Playwright accept XPath to locate elements for clicking or reading. The classic use case is data that has no clean class to grab: a price that sits in an unlabeled span next to a label, where the only reliable anchor is "the span that follows the element containing the text 'Price'." XPath expresses that directly: //*[text()="Price"]/following-sibling::span (append [1] to keep only the nearest such span). That ability to select relative to text and to walk the tree in any direction is why XPath survives on pages where CSS selectors run out of road.
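The label-anchored lookup might look like the following sketch, assuming lxml is installed. The sample markup and the helper name value_after_label are hypothetical. The expression uses normalize-space() rather than an exact text() match so that stray whitespace around the label does not break it, and it passes the label as an XPath variable instead of formatting it into the string.

```python
from lxml import html

# Hypothetical markup: values sit in unlabeled spans next to their labels.
PAGE = (
    "<html><body>"
    "<div class='row'><span>Price</span><span>$19.99</span></div>"
    "<div class='row'><span>  Weight </span><span>2 kg</span></div>"
    "</body></html>"
)

# Find the element whose text is the label, then take the nearest following span.
LABEL_XPATH = "//*[normalize-space(text())=$label]/following-sibling::span[1]"


def value_after_label(doc, label):
    """Return the text of the span that follows the given label, or None."""
    matches = doc.xpath(LABEL_XPATH, label=label)  # $label is bound by keyword
    return matches[0].text_content().strip() if matches else None


doc = html.fromstring(PAGE)
print(value_after_label(doc, "Price"))
print(value_after_label(doc, "Weight"))
print(value_after_label(doc, "Color"))  # no such label on this page
```

Returning None for a missing label keeps the caller in charge of deciding whether an absent field is an error.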
XPath vs CSS selectors
XPath and CSS selectors target the same elements but trade off differently. CSS is shorter, more readable, and faster to write for the common cases (by class, id, attribute), which is why it is the default. XPath wins when you need to select by visible text, navigate upward to a parent or ancestor, or apply a function. Standard CSS has no text matching or functions, and it can reach upward only through the newer :has() pseudo-class. The cost is verbosity and a steeper learning curve. A pragmatic workflow uses CSS for the 80% of selections that are straightforward and switches to XPath for the awkward 20% where text-matching or tree-walking is the only stable hook. As with CSS, brittle absolute paths (/html/body/div[3]/div[2]) break on any layout change, so anchor on attributes or text instead. A managed parsing step can also return structured fields directly, sparing you hand-written selectors for common page types.
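The brittleness of absolute paths can be illustrated with a small sketch, again assuming lxml. Both page versions, the id="price" attribute, and the extract helper are hypothetical. The second version of the page only adds a banner above the header.

```python
from lxml import html

# Hypothetical page before and after a banner div is added at the top.
BEFORE = (
    "<html><body><div>Header</div>"
    "<div><span id='price'>$19.99</span></div></body></html>"
)
AFTER = (
    "<html><body><div>Banner</div><div>Header</div>"
    "<div><span id='price'>$19.99</span></div></body></html>"
)

ABSOLUTE = "/html/body/div[2]/span/text()"  # depends on exact position
ANCHORED = '//span[@id="price"]/text()'     # depends only on the attribute


def extract(markup, expression):
    """Parse the markup and return the list of matches for the expression."""
    return html.fromstring(markup).xpath(expression)


for name, markup in (("before", BEFORE), ("after", AFTER)):
    print(name, extract(markup, ABSOLUTE), extract(markup, ANCHORED))
```

After the layout change, the absolute path points at the header div and returns nothing, while the attribute-anchored expression still finds the price.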