SERP scraping in 2026
A Google search engine results page (SERP) is no longer just ten blue links. In 2026 it mixes organic results, AI overviews, knowledge panels, product carousels, and ad blocks, all rendered by JavaScript and personalized to the user. A good SEO scraping API untangles this into clean, structured output: ranked organic positions, featured snippets, AI overview text, paid placements, and competitor citations. Country and device targeting are not optional: the same query returns a very different SERP on desktop in the US than on mobile in Germany, so you must tell the API where and on which device to search.
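As a rough illustration of country and device targeting, the sketch below builds a request URL and reads ranked organic positions out of a response. The endpoint, the parameter names (q, country, device), and the response shape (an "organic" list of items with "position" and "url") are all assumptions made for this example; every provider names these differently, so treat it as a template rather than a real API.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; substitute your provider's.
API_ENDPOINT = "https://api.example.com/v1/serp"


@dataclass(frozen=True)
class SerpTarget:
    query: str
    country: str  # two-letter country code, e.g. "us" or "de"
    device: str   # "desktop" or "mobile"


def build_serp_url(target: SerpTarget) -> str:
    """Build the request URL for one query/country/device combination."""
    if target.device not in {"desktop", "mobile"}:
        raise ValueError(f"unsupported device: {target.device!r}")
    params = {
        "q": target.query,
        "country": target.country.lower(),
        "device": target.device,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def organic_positions(response: dict) -> list[tuple[int, str]]:
    """Return (position, url) pairs sorted by rank.

    Assumes the response carries an "organic" list whose items have
    "position" and "url" keys; this shape is illustrative only.
    """
    return sorted((r["position"], r["url"]) for r in response.get("organic", []))
```

Requesting the same query once per (country, device) pair and comparing the outputs of organic_positions is a simple way to see how far the SERPs diverge.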
On-page extraction
Once you know which URLs rank, the on-page pass visits each one and pulls out the SEO signals: title, meta description, canonical (which page is the "official" version), robots directive (whether search engines may index the page), hreflang alternates (language/region versions), all h1-h6 headings in the order they appear, structured data (JSON-LD and microdata, the machine-readable tags that describe the page's content), Open Graph and Twitter cards, counts of images with and without alt text, internal vs external link counts, and word count. For technical SEO, also record render-blocking JavaScript, the CSS file count, and the diff between the rendered DOM (the page after JavaScript runs) and the raw source HTML.
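A minimal sketch of the on-page pass, using only Python's standard-library html.parser. It covers a subset of the signals listed above (title, meta description, robots, canonical, hreflang, headings in document order, JSON-LD, and image alt counts) and leaves out link counts, word count, microdata, and social cards. The class and function names are illustrative, and a production crawler would more likely run this over the rendered DOM than over raw HTML.

```python
import json
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SeoSignalParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.signals = {
            "title": "", "meta_description": None, "canonical": None,
            "robots": None, "hreflang": {}, "headings": [], "json_ld": [],
            "images_total": 0, "images_without_alt_text": 0,
        }
        self._capture = None  # element whose text is currently being buffered
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            name = (a.get("name") or "").lower()
            if name == "description":
                self.signals["meta_description"] = a.get("content")
            elif name == "robots":
                self.signals["robots"] = a.get("content")
        elif tag == "link":
            rel = (a.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.signals["canonical"] = a.get("href")
            elif "alternate" in rel and a.get("hreflang"):
                self.signals["hreflang"][a["hreflang"]] = a.get("href")
        elif tag == "img":
            self.signals["images_total"] += 1
            # Counts both a missing alt attribute and an empty one.
            if not (a.get("alt") or "").strip():
                self.signals["images_without_alt_text"] += 1
        elif tag == "title" or tag in HEADING_TAGS:
            self._capture, self._buffer = tag, []
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._capture, self._buffer = "json_ld", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capture is None:
            return
        text = "".join(self._buffer).strip()
        if tag == "title" and self._capture == "title":
            self.signals["title"] = text
        elif tag in HEADING_TAGS and self._capture == tag:
            self.signals["headings"].append((tag, text))
        elif tag == "script" and self._capture == "json_ld":
            try:
                self.signals["json_ld"].append(json.loads(text))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped rather than aborting the page
        else:
            return  # closing tag of a nested element; keep capturing
        self._capture = None


def extract_signals(html: str) -> dict:
    parser = SeoSignalParser()
    parser.feed(html)
    parser.close()
    return parser.signals
```

Calling extract_signals(html) returns one dictionary per page, which makes it easy to store results per URL and diff them between crawls.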
Core Web Vitals require real browsers
Core Web Vitals are Google's scores for how fast and stable a page feels: LCP (how long the main content takes to appear), INP (how quickly the page responds to taps and clicks), and CLS (how much the layout jumps around as it loads). You cannot measure these from a plain HTTP fetch, because they only emerge when a real browser actually renders and runs the page. INP additionally depends on interaction, so the browser has to script taps or clicks to produce a value. So you need a real browser on a throttled network profile that approximates what real users experience, usually a simulated slow 4G connection. Most scraping APIs offer this as a premium feature, so budget for it on the pages that matter (homepage, top landing pages) rather than crawling the whole site this way.
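Once a browser run has produced the three numbers, they still need to be interpreted. The sketch below rates each metric against Google's published boundaries (LCP 2.5 s and 4 s, INP 200 ms and 500 ms, CLS 0.1 and 0.25). The metric key names (lcp_ms, inp_ms, cls) and the input dictionary shape are assumptions for this example, not any particular API's output format.

```python
# (good, poor) boundaries: at or below the first value is "good",
# above the second is "poor", anything in between "needs improvement".
THRESHOLDS = {
    "lcp_ms": (2500, 4000),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
}


def rate(metric: str, value: float) -> str:
    """Classify a single measurement against its thresholds."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


def rate_page(measurements: dict[str, float]) -> dict[str, str]:
    """Rate every known metric in a page's measurements; unknown keys are ignored."""
    return {m: rate(m, v) for m, v in measurements.items() if m in THRESHOLDS}
```

For example, a page measured at 2100 ms LCP, 340 ms INP, and 0.31 CLS would rate as good, needs improvement, and poor respectively, which points the follow-up work at layout stability first.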
