Web Scraping With C#

By the Scrappey Research Team

Web scraping with C# means using .NET's HttpClient to fetch a page and a parser like HtmlAgilityPack or AngleSharp to extract data from the HTML. HtmlAgilityPack is the long-standing standard (XPath-based); AngleSharp is the modern, standards-compliant alternative with CSS selectors and a built-in loader. For JavaScript-rendered pages, Selenium for .NET, PuppeteerSharp, or Playwright for .NET drive a real browser.

Classic parser	HtmlAgilityPack 1.12.x — XPath selectors, very widely used
Modern parser	AngleSharp 1.4.x — HTML5-compliant, CSS selectors, can fetch pages
HTTP client	System.Net.Http.HttpClient (async/await)
JavaScript pages	Selenium .NET, PuppeteerSharp, or Playwright for .NET
Package source	NuGet — HtmlAgilityPack, AngleSharp, Selenium.WebDriver

Your first C# scraper with HtmlAgilityPack

HtmlAgilityPack parses HTML into a tree you query with XPath. Pair it with HttpClient to fetch the page. Install via NuGet: dotnet add package HtmlAgilityPack.

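using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static async Task Main()
    {
        using var http = new HttpClient();
        // Some sites reject requests that lack a browser-like User-Agent.
        http.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");

        string html = await http.GetStringAsync("https://books.toscrape.com/");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var books = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
        if (books == null)
        {
            Console.WriteLine("No books found; the page layout may have changed.");
            return;
        }

        foreach (var book in books)
        {
            // SelectSingleNode returns null when the element is missing, hence the ?. guards.
            string title = book.SelectSingleNode(".//h3/a")?.GetAttributeValue("title", "") ?? "";
            string price = book.SelectSingleNode(".//p[@class='price_color']")?.InnerText.Trim() ?? "";
            Console.WriteLine($"{title} | {price}");
        }
    }
}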

HtmlAgilityPack queries with XPath: SelectNodes() returns every match (or null when nothing matches, so check before iterating), and SelectSingleNode() returns the first match or null. GetAttributeValue() reads an attribute with a fallback default; InnerText reads the node's text. If you prefer CSS selectors, add the HtmlAgilityPack.CssSelectors.NetCore package.
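
To try these members without touching the network, the sketch below parses an inline HTML snippet. The XPathDemo class, the ParseBooks helper, and the sample markup are illustrative assumptions, not part of HtmlAgilityPack. It also runs the price through HtmlEntity.DeEntitize, on the assumption that InnerText may leave entities such as &pound; undecoded.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class XPathDemo
{
    // Hypothetical helper: returns (title, price) pairs, or an empty list when nothing matches.
    public static List<(string Title, string Price)> ParseBooks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Title, string Price)>();

        // SelectNodes returns null, not an empty collection, when the XPath matches nothing.
        var books = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
        if (books == null) return results;

        foreach (var book in books)
        {
            // SelectSingleNode returns the first match or null, hence the ?. guards.
            string title = book.SelectSingleNode(".//h3/a")?.GetAttributeValue("title", "") ?? "";
            string rawPrice = book.SelectSingleNode(".//p[@class='price_color']")?.InnerText ?? "";

            // Decode any HTML entities left in the text, then trim whitespace.
            results.Add((title, HtmlEntity.DeEntitize(rawPrice).Trim()));
        }
        return results;
    }

    static void Main()
    {
        const string sample = @"
            <article class='product_pod'>
              <h3><a href='/a' title='A Light in the Attic'>A Light in the ...</a></h3>
              <p class='price_color'>&pound;51.77</p>
            </article>";

        foreach (var (title, price) in ParseBooks(sample))
            Console.WriteLine($"{title} | {price}");

        // A page with no matching nodes yields an empty list instead of an exception.
        Console.WriteLine(ParseBooks("<p>no books here</p>").Count);
    }
}
```

Returning an empty list from the helper keeps the null check in one place, so callers can iterate the result directly.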

AngleSharp — the modern alternative

AngleSharp is a newer, HTML5-compliant parser with CSS selectors and a built-in document loader, so it can fetch and parse in one step. It is often left out of C# scraping guides, yet its browser-style DOM API tends to read more cleanly in modern code. Install: dotnet add package AngleSharp.

using System;
using System.Threading.Tasks;
using AngleSharp;

class Program
{
    static async Task Main()
    {
        // WithDefaultLoader() lets the browsing context fetch pages over HTTP.
        var config = Configuration.Default.WithDefaultLoader();
        var context = BrowsingContext.New(config);
        var doc = await context.OpenAsync("https://books.toscrape.com/");

        // QuerySelectorAll returns an empty collection when nothing matches.
        foreach (var book in doc.QuerySelectorAll("article.product_pod"))
        {
            // QuerySelector returns null when the element is missing, hence the ?. guards.
            string title = book.QuerySelector("h3 a")?.GetAttribute("title") ?? "";
            string price = book.QuerySelector(".price_color")?.TextContent.Trim() ?? "";
            Console.WriteLine($"{title} | {price}");
        }
    }
}

QuerySelectorAll() and QuerySelector() are the same CSS-selector APIs you know from the browser DOM, which makes AngleSharp very natural to use. WithDefaultLoader() enables the HTTP fetch so context.OpenAsync(url) downloads the page for you.
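
OpenAsync(url) is convenient, but sometimes the HTML is already in hand, for example when it was fetched with your own HttpClient and custom headers. A minimal sketch of parsing a string with AngleSharp's HtmlParser follows. The CssDemo class, the ParseBooks helper, and the sample markup are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using AngleSharp.Html.Parser;

static class CssDemo
{
    // Hypothetical helper: parses HTML held in memory and returns (title, price) pairs.
    public static List<(string Title, string Price)> ParseBooks(string html)
    {
        var parser = new HtmlParser();
        var doc = parser.ParseDocument(html);

        var results = new List<(string Title, string Price)>();

        // QuerySelectorAll yields an empty collection when nothing matches, so no null check is needed here.
        foreach (var book in doc.QuerySelectorAll("article.product_pod"))
        {
            // QuerySelector returns null for a missing element, hence the ?. guards.
            string title = book.QuerySelector("h3 a")?.GetAttribute("title") ?? "";
            string price = book.QuerySelector(".price_color")?.TextContent.Trim() ?? "";
            results.Add((title, price));
        }
        return results;
    }

    static void Main()
    {
        const string sample = @"
            <article class='product_pod'>
              <h3><a href='/a' title='A Light in the Attic'>A Light in the ...</a></h3>
              <p class='price_color'>&pound;51.77</p>
            </article>";

        foreach (var (title, price) in ParseBooks(sample))
            Console.WriteLine($"{title} | {price}");
    }
}
```

Keeping the parsing in a function that takes a string also makes it easy to test against saved pages.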

Scraping JavaScript-rendered pages

Neither parser runs JavaScript. For client-side-rendered pages, drive a browser with Selenium for .NET (dotnet add package Selenium.WebDriver). Selenium Manager, bundled with Selenium 4.6 and later, locates or downloads a matching browser driver automatically:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

var options = new ChromeOptions();
options.AddArgument("--headless=new");
using var driver = new ChromeDriver(options);

driver.Navigate().GoToUrl("https://quotes.toscrape.com/js/");

foreach (var quote in driver.FindElements(By.CssSelector(".quote")))
{
    string text = quote.FindElement(By.CssSelector(".text")).Text;
    string author = quote.FindElement(By.CssSelector(".author")).Text;
    Console.WriteLine($"{author}: {text}");
}
driver.Quit();

PuppeteerSharp and Playwright for .NET are modern alternatives. Playwright in particular offers built-in auto-waiting and multi-browser support (Chromium, Firefox, and WebKit), which makes it a strong default for new dynamic-scraping projects in .NET.
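
For comparison, here is a rough Playwright equivalent of the Selenium example, assuming the Microsoft.Playwright NuGet package is installed and its browsers have been downloaded once with the install script that the package adds to the build output. The PlaywrightDemo class and ScrapeQuotesAsync helper are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Playwright;

static class PlaywrightDemo
{
    // Hypothetical helper: loads a JavaScript-rendered page and returns (author, text) pairs.
    public static async Task<List<(string Author, string Text)>> ScrapeQuotesAsync(string url)
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(
            new BrowserTypeLaunchOptions { Headless = true });

        var page = await browser.NewPageAsync();
        await page.GotoAsync(url);

        var quotes = page.Locator(".quote");

        // Wait until the first quote has been rendered by the page's script.
        await quotes.First.WaitForAsync();

        var results = new List<(string Author, string Text)>();
        int count = await quotes.CountAsync();
        for (int i = 0; i < count; i++)
        {
            var quote = quotes.Nth(i);
            string text = await quote.Locator(".text").InnerTextAsync();
            string author = await quote.Locator(".author").InnerTextAsync();
            results.Add((author, text));
        }
        return results;
    }

    static async Task Main()
    {
        foreach (var (author, text) in await ScrapeQuotesAsync("https://quotes.toscrape.com/js/"))
            Console.WriteLine($"{author}: {text}");
    }
}
```

The explicit wait on the first locator is what replaces manual sleeps: the loop only starts once at least one quote exists in the DOM.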

Which C# scraping library should you use?

Library	Type	Selectors	Best for
HtmlAgilityPack	Parser	XPath (CSS via add-on)	Static pages, the established choice
AngleSharp	Parser + loader	CSS	Modern HTML5 parsing, clean API
Selenium .NET	Browser automation	CSS/XPath	JavaScript pages, widest docs
PuppeteerSharp	Browser automation	CSS/XPath	Chrome-only headless control
Playwright .NET	Browser automation	CSS/XPath	Modern JS pages, multi-browser

Start with HtmlAgilityPack or AngleSharp for static pages; reach for a browser tool only when the content is JavaScript-rendered.

The hard part: handling anti-bot blocking

The .NET code is the easy part. Sites behind major anti-bot systems block HttpClient on its TLS fingerprint and headers, and they flag headless Selenium on its automation signals. A parser cannot extract data from a challenge page.

Solving this means residential proxy rotation, a real browser fingerprint, and CAPTCHA handling. A scraping API does all of it server-side — your C# code posts the URL and parses the returned HTML with HtmlAgilityPack or AngleSharp:

using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        using var http = new HttpClient();

        // Placeholder endpoint and payload; substitute your provider's documented values.
        var resp = await http.PostAsJsonAsync(
            "https://api.your-scraping-provider.com/v1?key=YOUR_API_KEY",
            new { cmd = "request.get", url = "https://example.com/protected" });
        resp.EnsureSuccessStatusCode();

        var json = await resp.Content.ReadFromJsonAsync<JsonElement>();

        // Fully rendered, unblocked HTML -- hand it to HtmlAgilityPack/AngleSharp.
        string html = json.GetProperty("solution").GetProperty("response").GetString() ?? "";

        // Preview at most 200 characters; Substring(0, 200) alone would throw on shorter pages.
        Console.WriteLine(html.Length > 200 ? html.Substring(0, 200) : html);
    }
}

Frequently asked questions

What is the best library for web scraping with C#?

HtmlAgilityPack is the established standard (XPath-based) and AngleSharp is the modern, HTML5-compliant alternative with CSS selectors and a built-in page loader. Both are excellent for static pages — choose AngleSharp for cleaner CSS-selector code, HtmlAgilityPack if you prefer XPath or need its huge body of examples. Add Selenium or Playwright for .NET for JavaScript-rendered pages.

Can C# scrape dynamic, JavaScript-heavy sites?

Yes. HtmlAgilityPack and AngleSharp only see server HTML, so for JavaScript-rendered content drive a real browser with Selenium for .NET, PuppeteerSharp, or Playwright for .NET. Playwright is the recommended modern choice. You can also call the page’s underlying JSON API directly with HttpClient, which avoids running a browser entirely.
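
As a sketch of that last option: once the browser's network tab reveals the JSON endpoint a page loads its data from, the response can be parsed with System.Text.Json and no HTML parser at all. The payload shape below, and the JsonApiDemo, ParseQuotes, and StringOrEmpty names, are hypothetical; a real endpoint will have its own structure.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class JsonApiDemo
{
    // Reads a string property, returning "" when it is missing or not a string.
    static string StringOrEmpty(JsonElement obj, string name) =>
        obj.ValueKind == JsonValueKind.Object
        && obj.TryGetProperty(name, out var prop)
        && prop.ValueKind == JsonValueKind.String
            ? prop.GetString() ?? ""
            : "";

    // Assumed payload shape: { "quotes": [ { "text": "...", "author": "..." } ] }
    public static List<(string Author, string Text)> ParseQuotes(string json)
    {
        var results = new List<(string Author, string Text)>();
        using var doc = JsonDocument.Parse(json); // throws JsonException on malformed input

        var root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("quotes", out var quotes)
            || quotes.ValueKind != JsonValueKind.Array)
        {
            return results; // unexpected shape: return nothing rather than throw
        }

        foreach (var quote in quotes.EnumerateArray())
            results.Add((StringOrEmpty(quote, "author"), StringOrEmpty(quote, "text")));

        return results;
    }

    static void Main()
    {
        // In a real scraper this string would come from HttpClient.GetStringAsync(endpoint).
        const string sample = "{\"quotes\":[{\"text\":\"Hello\",\"author\":\"Ada\"}]}";

        foreach (var (author, text) in ParseQuotes(sample))
            Console.WriteLine($"{author}: {text}");
    }
}
```

Checking ValueKind before each access keeps the parser from throwing when the site changes its payload.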

HtmlAgilityPack or AngleSharp — which should I choose?

Use AngleSharp if you want CSS selectors, strict HTML5 parsing, and a built-in loader that fetches the page for you. Use HtmlAgilityPack if you prefer XPath, need maximum compatibility, or are maintaining existing code. Both are actively used in 2026; AngleSharp tends to feel more modern for new projects.

Why does my C# scraper get blocked, and how do I fix it?

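Scrapers are usually blocked because their traffic does not look like a browser's: default headers, rapid-fire requests, and datacenter IP addresses all stand out. Start by setting a realistic User-Agent, throttling requests, and rotating residential proxies. For sites behind major anti-bot systems you also need a browser-grade TLS fingerprint, which is hard to achieve from raw HttpClient. Routing those requests through a scraping API that handles proxies, fingerprinting, and challenges server-side is the most reliable approach.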
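
The first two measures can be sketched in a small wrapper around HttpClient. PoliteClient is a hypothetical class, the proxy URL is whatever your proxy provider issues, and the sketch routes through a single optional proxy rather than rotating several. It does nothing about TLS fingerprinting.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: sets a User-Agent, optionally routes through one proxy,
// and enforces a minimum interval between requests.
sealed class PoliteClient : IDisposable
{
    private readonly HttpClient _http;
    private readonly TimeSpan _minInterval;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Stopwatch _sinceLast = new Stopwatch();

    public PoliteClient(TimeSpan minInterval, string? proxyUrl = null)
    {
        var handler = new HttpClientHandler();
        if (proxyUrl != null)
        {
            handler.Proxy = new WebProxy(proxyUrl);
            handler.UseProxy = true;
        }

        _http = new HttpClient(handler);
        _http.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        _minInterval = minInterval;
    }

    // Completes once at least _minInterval has passed since the previous call.
    public async Task WaitTurnAsync()
    {
        await _gate.WaitAsync();
        try
        {
            if (_sinceLast.IsRunning)
            {
                var remaining = _minInterval - _sinceLast.Elapsed;
                if (remaining > TimeSpan.Zero) await Task.Delay(remaining);
            }
            _sinceLast.Restart();
        }
        finally
        {
            _gate.Release();
        }
    }

    public async Task<string> GetStringAsync(string url)
    {
        await WaitTurnAsync();
        return await _http.GetStringAsync(url);
    }

    public void Dispose()
    {
        _http.Dispose();
        _gate.Dispose();
    }

    static async Task Main()
    {
        using var client = new PoliteClient(TimeSpan.FromMilliseconds(500));
        var clock = Stopwatch.StartNew();

        // Only the throttle is exercised here, so the demo runs without network access.
        for (int i = 0; i < 3; i++)
        {
            await client.WaitTurnAsync();
            Console.WriteLine($"request slot {i} at {clock.ElapsedMilliseconds} ms");
        }
    }
}
```

The semaphore serializes callers, so the interval holds even when several tasks share one client.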

Last updated: 2026-06-08