Your first C# scraper with HtmlAgilityPack
HtmlAgilityPack parses HTML into a tree you query with XPath. Pair it with HttpClient to fetch the page. Install via NuGet: dotnet add package HtmlAgilityPack.
```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static async Task Main()
    {
        using var http = new HttpClient();
        // Send a browser-like User-Agent; some servers reject requests without one.
        http.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        string html = await http.GetStringAsync("https://books.toscrape.com/");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var books = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
        if (books is null)
        {
            Console.WriteLine("No books found.");
            return;
        }

        foreach (var book in books)
        {
            // The leading dot keeps each XPath query inside the current article.
            string title = book.SelectSingleNode(".//h3/a")?.GetAttributeValue("title", "") ?? "";
            string price = book.SelectSingleNode(".//p[@class='price_color']")?.InnerText.Trim() ?? "";
            Console.WriteLine($"{title} | {price}");
        }
    }
}
```

HtmlAgilityPack uses XPath: SelectNodes() returns every match (or null when there are none), and SelectSingleNode() returns the first match or null. GetAttributeValue() reads an attribute with a fallback value; InnerText reads the node's text. If you prefer CSS selectors, add the HtmlAgilityPack.CssSelectors.NetCore package.
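To see how these calls behave without touching the network, here is a minimal sketch that parses an inline HTML string. The BookParser class, the sample markup, and the "(untitled)" fallback are made up for illustration; only the HtmlAgilityPack calls come from the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class BookParser
{
    // Hypothetical helper: extracts (title, price) pairs from an HTML string.
    public static List<(string Title, string Price)> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var results = new List<(string Title, string Price)>();

        // Null means "no matches", so return the empty list instead of iterating.
        var books = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
        if (books is null) return results;

        foreach (var book in books)
        {
            // GetAttributeValue falls back to "(untitled)" when the attribute is missing.
            string title = book.SelectSingleNode(".//h3/a")
                               ?.GetAttributeValue("title", "(untitled)") ?? "(untitled)";
            string price = book.SelectSingleNode(".//p[@class='price_color']")
                               ?.InnerText.Trim() ?? "";
            results.Add((title, price));
        }
        return results;
    }

    static void Main()
    {
        // Assumed sample markup; the second link has no title attribute.
        const string sample = @"
            <article class='product_pod'>
              <h3><a title='Book One'>Book...</a></h3>
              <p class='price_color'>£10.00</p>
            </article>
            <article class='product_pod'>
              <h3><a>Book...</a></h3>
              <p class='price_color'>£12.50</p>
            </article>";

        foreach (var (title, price) in Parse(sample))
            Console.WriteLine($"{title} | {price}");

        // A page with no product markup yields an empty list rather than an exception.
        Console.WriteLine(Parse("<p>no products</p>").Count);
    }
}
```

Returning an empty list from the helper keeps the null check in one place, so callers can iterate the result directly.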
AngleSharp — the modern alternative
AngleSharp is a newer, HTML5-compliant parser with CSS selectors and a built-in document loader, so it can fetch and parse in one step. Many guides skip it, but its API is cleaner for modern code. Install: dotnet add package AngleSharp.
```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;

class Program
{
    static async Task Main()
    {
        // WithDefaultLoader() lets the browsing context download documents over HTTP.
        var config = Configuration.Default.WithDefaultLoader();
        var context = BrowsingContext.New(config);
        var doc = await context.OpenAsync("https://books.toscrape.com/");

        foreach (var book in doc.QuerySelectorAll("article.product_pod"))
        {
            // QuerySelector and GetAttribute return null when nothing matches.
            string title = book.QuerySelector("h3 a")?.GetAttribute("title") ?? "";
            string price = book.QuerySelector(".price_color")?.TextContent.Trim() ?? "";
            Console.WriteLine($"{title} | {price}");
        }
    }
}
```

QuerySelectorAll() and QuerySelector() are the same CSS-selector APIs you know from the browser DOM, which makes AngleSharp feel natural to use. WithDefaultLoader() enables HTTP fetching, so context.OpenAsync(url) downloads the page for you.
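If you would rather fetch the page yourself (for example with an HttpClient that sets custom headers), AngleSharp can also parse a string you already hold. The sketch below assumes a hypothetical TitleParser helper and inline sample markup; it passes the HTML to OpenAsync through a virtual response, so no loader is configured.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using AngleSharp;

static class TitleParser
{
    // Hypothetical helper: parses an HTML string and returns the book titles.
    public static async Task<List<string>> ParseTitlesAsync(string html)
    {
        // No loader is needed because nothing is downloaded.
        var context = BrowsingContext.New(Configuration.Default);
        var doc = await context.OpenAsync(req => req.Content(html));

        var titles = new List<string>();
        foreach (var link in doc.QuerySelectorAll("article.product_pod h3 a"))
        {
            // GetAttribute returns null for a missing attribute; skip those links.
            string title = link.GetAttribute("title");
            if (!string.IsNullOrEmpty(title)) titles.Add(title);
        }
        return titles;
    }

    static async Task Main()
    {
        // Assumed sample markup standing in for HTML fetched elsewhere.
        const string sample = @"
            <article class='product_pod'><h3><a title='Book One'>Book...</a></h3></article>
            <article class='product_pod'><h3><a title='Book Two'>Book...</a></h3></article>";

        foreach (string title in await ParseTitlesAsync(sample))
            Console.WriteLine(title);
    }
}
```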
Scraping JavaScript-rendered pages
Neither parser runs JavaScript. For client-side-rendered pages, drive a browser with Selenium for .NET (dotnet add package Selenium.WebDriver). Since Selenium 4.6, Selenium Manager downloads the matching driver automatically:
```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Run Chrome without a visible window.
var options = new ChromeOptions();
options.AddArgument("--headless=new");

using var driver = new ChromeDriver(options);
driver.Navigate().GoToUrl("https://quotes.toscrape.com/js/");

// Each .quote element is built by the page's JavaScript, not sent in the raw HTML.
foreach (var quote in driver.FindElements(By.CssSelector(".quote")))
{
    string text = quote.FindElement(By.CssSelector(".text")).Text;
    string author = quote.FindElement(By.CssSelector(".author")).Text;
    Console.WriteLine($"{author}: {text}");
}

// Close the browser and end the driver session.
driver.Quit();
```

PuppeteerSharp and Playwright for .NET are modern alternatives. Playwright in particular has cleaner auto-waiting and multi-browser support, which makes it the better choice for new dynamic-scraping projects in .NET.
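For comparison, here is a rough Playwright equivalent of the Selenium example, assuming the Microsoft.Playwright NuGet package is installed and its browsers have been downloaded with the playwright.ps1 script that the build places in the output folder. Treat it as a sketch and not a tuned scraper.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Playwright;

class Program
{
    static async Task Main()
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(
            new BrowserTypeLaunchOptions { Headless = true });

        var page = await browser.NewPageAsync();
        await page.GotoAsync("https://quotes.toscrape.com/js/");

        var quotes = page.Locator(".quote");
        // Wait until the first quote exists before counting the rest.
        await quotes.First.WaitForAsync();

        int count = await quotes.CountAsync();
        for (int i = 0; i < count; i++)
        {
            var quote = quotes.Nth(i);
            string text = await quote.Locator(".text").InnerTextAsync();
            string author = await quote.Locator(".author").InnerTextAsync();
            Console.WriteLine($"{author}: {text}");
        }
    }
}
```

Swapping playwright.Chromium for playwright.Firefox or playwright.Webkit runs the same code against another browser engine.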
Which C# scraping library should you use?
| Library | Type | Selectors | Best for |
|---|---|---|---|
| HtmlAgilityPack | Parser | XPath (CSS via add-on) | Static pages, the established choice |
| AngleSharp | Parser + loader | CSS | Modern HTML5 parsing, clean API |
| Selenium .NET | Browser automation | CSS/XPath | JavaScript pages, widest docs |
| PuppeteerSharp | Browser automation | CSS/XPath | Chrome-only headless control |
| Playwright .NET | Browser automation | CSS/XPath | Modern JS pages, multi-browser |
Start with HtmlAgilityPack or AngleSharp for static pages; reach for a browser tool only when the content is JavaScript-rendered.
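One quick way to decide is to download the raw HTML and check whether the markup you want is already in it. The sketch below is a crude heuristic under stated assumptions: RenderCheck and NeedsBrowser are hypothetical names, and the marker strings are guesses at what each page's static markup contains.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class RenderCheck
{
    // Hypothetical heuristic: if the raw HTML lacks the marker, the content
    // is probably added by JavaScript and needs a browser tool.
    public static bool NeedsBrowser(string rawHtml, string marker) =>
        !rawHtml.Contains(marker, StringComparison.OrdinalIgnoreCase);

    static async Task Main()
    {
        // Assumed markers: a substring expected in the server-rendered markup.
        var targets = new (string Url, string Marker)[]
        {
            ("https://books.toscrape.com/", "product_pod"),
            ("https://quotes.toscrape.com/js/", "class=\"quote\""),
        };

        using var http = new HttpClient();
        foreach (var (url, marker) in targets)
        {
            string html = await http.GetStringAsync(url);
            Console.WriteLine($"{url} -> needs browser: {NeedsBrowser(html, marker)}");
        }
    }
}
```

A substring test can be fooled by markers that appear inside scripts or comments, so confirm the result by viewing the page source before committing to a tool.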
The hard part: handling anti-bot blocking
The .NET code is the easy part. Sites behind major anti-bot systems block HttpClient on its TLS fingerprint and headers, and they flag headless Selenium on its automation signals. A parser cannot extract data from a challenge page.
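Before handing a response to a parser, it helps to check whether you received a block or challenge page in place of the real content. The following is a minimal sketch: BlockCheck, LooksBlocked, and the marker phrases are assumptions for illustration, since real challenge pages differ by vendor and change over time.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class BlockCheck
{
    // Assumed phrases; tune these to the sites you actually scrape.
    static readonly string[] Markers = { "captcha", "access denied", "verify you are human" };

    // Hypothetical check: status codes and phrases that often accompany blocking.
    public static bool LooksBlocked(HttpStatusCode status, string body)
    {
        if (status is HttpStatusCode.Forbidden
                   or HttpStatusCode.TooManyRequests
                   or HttpStatusCode.ServiceUnavailable)
            return true;

        return Markers.Any(m => body.Contains(m, StringComparison.OrdinalIgnoreCase));
    }

    static async Task Main()
    {
        using var http = new HttpClient();
        // GetAsync does not throw on 4xx/5xx, so the status code can be inspected.
        using var response = await http.GetAsync("https://books.toscrape.com/");
        string body = await response.Content.ReadAsStringAsync();

        Console.WriteLine(LooksBlocked(response.StatusCode, body)
            ? "Blocked or challenged; do not parse this response."
            : "Response looks like real content.");
    }
}
```

The example uses GetAsync because GetStringAsync throws on an error status, which hides the status code the check needs.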
Solving this means residential proxy rotation, a real browser fingerprint, and CAPTCHA handling. A scraping API does all of it server-side, so your C# code only posts the URL and parses the returned HTML with HtmlAgilityPack or AngleSharp: