Web Scraping With C#

Web scraping with C# means using .NET's HttpClient to fetch a page and a parser like HtmlAgilityPack or AngleSharp to extract data from the HTML. HtmlAgilityPack is the long-standing standard (XPath-based); AngleSharp is the modern, standards-compliant alternative with CSS selectors and a built-in loader. For JavaScript-rendered pages, Selenium for .NET, PuppeteerSharp, or Playwright for .NET drive a real browser.

Quick facts

Classic parser: HtmlAgilityPack 1.12.x (XPath selectors, very widely used)
Modern parser: AngleSharp 1.4.x (HTML5-compliant, CSS selectors, can fetch pages)
HTTP client: System.Net.Http.HttpClient (async/await)
JavaScript pages: Selenium .NET, PuppeteerSharp, or Playwright for .NET
Package source: NuGet (HtmlAgilityPack, AngleSharp, Selenium.WebDriver)

Your first C# scraper with HtmlAgilityPack

HtmlAgilityPack parses HTML into a tree you query with XPath. Pair it with HttpClient to fetch the page. Install via NuGet: dotnet add package HtmlAgilityPack.

using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static async Task Main()
    {
        using var http = new HttpClient();
        http.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");

        string html = await http.GetStringAsync("https://books.toscrape.com/");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

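        // SelectNodes returns null (not an empty collection) when nothing matches.
        var books = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
        if (books == null)
        {
            Console.WriteLine("No products found.");
            return;
        }

        foreach (var book in books)
        {
            // Null-conditional access guards against missing child nodes.
            string title = book.SelectSingleNode(".//h3/a")?.GetAttributeValue("title", "") ?? "";
            string price = book.SelectSingleNode(".//p[@class='price_color']")?.InnerText.Trim() ?? "";
            Console.WriteLine($"{title} | {price}");
        }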
    }
}

HtmlAgilityPack uses XPath: SelectNodes() returns all matches (or null when nothing matches, so check before iterating), and SelectSingleNode() returns the first match or null. GetAttributeValue() reads an attribute with a fallback value; InnerText reads the node's text. If you prefer CSS selectors, add the HtmlAgilityPack.CssSelectors.NetCore package.
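One caveat about the XPath in the listing above: [@class='product_pod'] is an exact string comparison, so it misses elements that carry additional classes. The following is a minimal sketch of the usual XPath 1.0 workaround. The ClassMatching class, its helper names, and the sample markup in the usage note are hypothetical and are not part of HtmlAgilityPack.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ClassMatching
{
    // Hypothetical helper: builds an XPath that matches a single class token
    // even when the element's class attribute lists several classes.
    public static string ClassXPath(string tag, string className) =>
        $"//{tag}[contains(concat(' ', normalize-space(@class), ' '), ' {className} ')]";

    // Returns the trimmed, entity-decoded text of every matching element.
    public static List<string> TextsByClass(string html, string tag, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so fall back to an empty list.
        var nodes = doc.DocumentNode.SelectNodes(ClassXPath(tag, className));
        if (nodes == null) return new List<string>();

        // DeEntitize converts entities such as &amp; back into characters.
        return nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()).ToList();
    }
}
```

For example, ClassMatching.TextsByClass(html, "div", "card") would match an element with class="card featured" but not one with class="cardboard", because the class name is compared as a whole token.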

AngleSharp — the modern alternative

AngleSharp is a newer parser that follows the HTML5 parsing rules and provides CSS selectors plus a built-in document loader, so it can fetch and parse in one step. Many C# scraping guides skip it, although its DOM-style API tends to produce cleaner code in modern projects. Install: dotnet add package AngleSharp.

using System;
using System.Threading.Tasks;
using AngleSharp;

class Program
{
    static async Task Main()
    {
        var config = Configuration.Default.WithDefaultLoader();
        var context = BrowsingContext.New(config);
        var doc = await context.OpenAsync("https://books.toscrape.com/");

        foreach (var book in doc.QuerySelectorAll("article.product_pod"))
        {
            // QuerySelector returns null when nothing matches, so guard each lookup.
            string title = book.QuerySelector("h3 a")?.GetAttribute("title") ?? "";
            string price = book.QuerySelector(".price_color")?.TextContent.Trim() ?? "";
            Console.WriteLine($"{title} | {price}");
        }
    }
}

QuerySelectorAll() and QuerySelector() are the same CSS-selector APIs you know from the browser DOM, which makes AngleSharp very natural to use. WithDefaultLoader() enables the HTTP fetch so context.OpenAsync(url) downloads the page for you.
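When the HTML is already in hand (for instance, fetched with your own HttpClient), AngleSharp can parse it without the loader. The sketch below assumes that case; the AngleSharpFromString class and ExtractBooksAsync method are hypothetical names, and the selectors mirror the books.toscrape.com markup used above.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using AngleSharp;

public static class AngleSharpFromString
{
    // Parses an HTML string and returns (title, price) pairs for each product card.
    public static async Task<List<(string Title, string Price)>> ExtractBooksAsync(string html)
    {
        // No loader is configured because the markup is supplied directly.
        var context = BrowsingContext.New(Configuration.Default);
        var doc = await context.OpenAsync(req => req.Content(html));

        var results = new List<(string Title, string Price)>();
        foreach (var book in doc.QuerySelectorAll("article.product_pod"))
        {
            // Missing nodes yield empty strings instead of a NullReferenceException.
            string title = book.QuerySelector("h3 a")?.GetAttribute("title") ?? "";
            string price = book.QuerySelector(".price_color")?.TextContent.Trim() ?? "";
            results.Add((title, price));
        }
        return results;
    }
}
```

Keeping the fetch and the parse separate like this also makes the extraction logic easy to unit-test against saved HTML.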

Scraping JavaScript-rendered pages

Neither parser runs JavaScript. For client-side-rendered pages, drive a browser with Selenium for .NET (dotnet add package Selenium.WebDriver). Selenium Manager handles the driver automatically:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

var options = new ChromeOptions();
options.AddArgument("--headless=new");
using var driver = new ChromeDriver(options);

driver.Navigate().GoToUrl("https://quotes.toscrape.com/js/");

foreach (var quote in driver.FindElements(By.CssSelector(".quote")))
{
    string text = quote.FindElement(By.CssSelector(".text")).Text;
    string author = quote.FindElement(By.CssSelector(".author")).Text;
    Console.WriteLine($"{author}: {text}");
}
driver.Quit();

PuppeteerSharp and Playwright for .NET are modern alternatives. Playwright in particular offers built-in auto-waiting and multi-browser support (Chromium, Firefox, WebKit), which makes it a strong default for new dynamic-scraping projects in .NET.
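For comparison, here is a rough Playwright for .NET version of the same quotes scrape. It assumes the Microsoft.Playwright NuGet package is installed and that browser binaries have been downloaded once with the playwright.ps1 install script the package places in the build output. The PlaywrightQuotes class name is hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Playwright;

public static class PlaywrightQuotes
{
    public static async Task RunAsync()
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(
            new BrowserTypeLaunchOptions { Headless = true });

        var page = await browser.NewPageAsync();
        await page.GotoAsync("https://quotes.toscrape.com/js/");

        var quotes = page.Locator(".quote");
        // Wait until the first quote has been rendered by the page's JavaScript.
        await quotes.First.WaitForAsync();

        int count = await quotes.CountAsync();
        for (int i = 0; i < count; i++)
        {
            var quote = quotes.Nth(i);
            string text = await quote.Locator(".text").InnerTextAsync();
            string author = await quote.Locator(".author").InnerTextAsync();
            Console.WriteLine($"{author}: {text}");
        }
    }
}
```

Call it with await PlaywrightQuotes.RunAsync(); from an async Main. The explicit wait on the first locator is what replaces manual sleeps or polling loops.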

Which C# scraping library should you use?

Library | Type | Selectors | Best for
HtmlAgilityPack | Parser | XPath (CSS via add-on) | Static pages, the established choice
AngleSharp | Parser + loader | CSS | Modern HTML5 parsing, clean API
Selenium .NET | Browser automation | CSS/XPath | JavaScript pages, widest docs
PuppeteerSharp | Browser automation | CSS/XPath | Primarily Chrome/Chromium headless control
Playwright .NET | Browser automation | CSS/XPath | Modern JS pages, multi-browser

Start with HtmlAgilityPack or AngleSharp for static pages; reach for a browser tool only when the content is JavaScript-rendered.

The hard part: handling anti-bot blocking

The .NET code is the easy part. Sites behind major anti-bot systems block HttpClient on its TLS fingerprint and headers, and they flag headless Selenium on its automation signals. A parser cannot extract data from a challenge page.

Solving this means residential proxy rotation, a real browser fingerprint, and CAPTCHA handling. A scraping API does all of it server-side — your C# code posts the URL and parses the returned HTML with HtmlAgilityPack or AngleSharp:

using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        using var http = new HttpClient();

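        // Placeholder endpoint and payload; substitute your provider's actual API.
        var resp = await http.PostAsJsonAsync(
            "https://api.your-scraping-provider.com/v1?key=YOUR_API_KEY",
            new { cmd = "request.get", url = "https://example.com/protected" });
        resp.EnsureSuccessStatusCode();

        var json = await resp.Content.ReadFromJsonAsync<JsonElement>();

        // Fully rendered, unblocked HTML -- hand it to HtmlAgilityPack/AngleSharp.
        // The solution.response path depends on the provider's response format.
        string html = json.GetProperty("solution").GetProperty("response").GetString() ?? "";

        // Print a preview without throwing when the page is shorter than 200 characters.
        Console.WriteLine(html.Substring(0, Math.Min(200, html.Length)));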
    }
}

Frequently asked questions

What is the best library for web scraping with C#?

HtmlAgilityPack is the established standard (XPath-based) and AngleSharp is the modern, HTML5-compliant alternative with CSS selectors and a built-in page loader. Both are excellent for static pages — choose AngleSharp for cleaner CSS-selector code, HtmlAgilityPack if you prefer XPath or need its huge body of examples. Add Selenium or Playwright for .NET for JavaScript-rendered pages.

Can C# scrape dynamic, JavaScript-heavy sites?

Yes. HtmlAgilityPack and AngleSharp only see the HTML the server sends, so for JavaScript-rendered content drive a real browser with Selenium for .NET, PuppeteerSharp, or Playwright for .NET. Playwright is a strong modern default. You can also call the page’s underlying JSON API directly with HttpClient, which avoids running a browser entirely.
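As a sketch of the JSON API route, the snippet below deserializes a product list with System.Text.Json. The Product record, its fields, and any URL you pass in are hypothetical; a real site's endpoint and response shape have to be discovered in the browser's network tab.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical response shape; a real endpoint defines its own fields.
public record Product(string Name, decimal Price);

public static class JsonApiScraper
{
    // Web defaults give camelCase, case-insensitive property matching.
    private static readonly JsonSerializerOptions WebOptions =
        new JsonSerializerOptions(JsonSerializerDefaults.Web);

    // Parses a JSON array that is already in memory (useful for saved responses).
    public static List<Product> ParseProducts(string json) =>
        JsonSerializer.Deserialize<List<Product>>(json, WebOptions) ?? new List<Product>();

    // Fetches and deserializes in one call; GetFromJsonAsync uses web defaults too.
    public static async Task<List<Product>> FetchProductsAsync(HttpClient http, string url) =>
        await http.GetFromJsonAsync<List<Product>>(url) ?? new List<Product>();
}
```

When such an endpoint exists, this is usually faster and less brittle than parsing rendered HTML, since the data arrives already structured.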

HtmlAgilityPack or AngleSharp — which should I choose?

Use AngleSharp if you want CSS selectors, strict HTML5 parsing, and a built-in loader that fetches the page for you. Use HtmlAgilityPack if you prefer XPath, need maximum compatibility, or are maintaining existing code. Both are actively used in 2026; AngleSharp tends to feel more modern for new projects.

Why does my C# scraper get blocked, and how do I fix it?

Scrapers are usually blocked because of their request rate, their headers, or the TLS fingerprint that HttpClient presents. Start by setting a realistic User-Agent, throttling requests, and rotating residential proxies. For sites behind major anti-bot systems you also need a browser-grade TLS fingerprint, which is hard to produce from raw HttpClient. Routing those requests through a scraping API that handles proxies, fingerprinting, and challenges server-side is often the most reliable approach.
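Throttling can be sketched with nothing beyond HttpClient. The example below retries on HTTP 429 and 5xx responses with exponential backoff. The PoliteFetcher class, its method names, and the delay values (1 second base, 30 second cap, 4 attempts) are illustrative assumptions, not recommended settings for any particular site.

```csharp
#nullable enable
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class PoliteFetcher
{
    // Exponential backoff: baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds));
    }

    // Returns the body on success, or null after a non-retryable error
    // or once maxAttempts is exhausted.
    public static async Task<string?> GetWithRetryAsync(
        HttpClient http, string url, int maxAttempts = 4)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            using var resp = await http.GetAsync(url);
            if (resp.IsSuccessStatusCode)
                return await resp.Content.ReadAsStringAsync();

            // Retry only on throttling (429) or server-side (5xx) errors.
            bool retryable = resp.StatusCode == HttpStatusCode.TooManyRequests
                             || (int)resp.StatusCode >= 500;
            if (!retryable) return null;

            await Task.Delay(BackoffDelay(
                attempt, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30)));
        }
        return null;
    }
}
```

This only addresses rate-based blocking. It does nothing about TLS fingerprinting or CAPTCHA challenges, which is where proxies or a scraping API come in.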

Last updated: 2026-06-08