What Is jsoup? (Java HTML Parser)

jsoup is a free Java library that reads HTML and lets you pull data out of it. You give it a web page, and it turns the raw HTML into a DOM (the tree of elements that makes up a page). From there you can find elements with CSS selectors - the same div.price style patterns you use in stylesheets - and grab their text or attributes. It is the go-to HTML parser for Java web scraping, just as Beautiful Soup is for Python.

Quick facts

Language: Java
Purpose: HTML parsing + extraction via CSS selectors
Analogy: Java's Beautiful Soup
Renders JavaScript?: No - static HTML only
Best for: Server-rendered pages on JVM stacks

What jsoup does

jsoup takes HTML from a string, file, or URL and builds a clean DOM from it. You then query that DOM with jQuery-style selectors: doc.select("div.price") returns every matching element, and you read each one's text or attributes. It also repairs broken markup (unclosed or badly nested tags) and can sanitize untrusted HTML by stripping out any tags and attributes that are not on an allow-list. For pulling data from static pages, the code is short to write and needs no browser to run.
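As a minimal sketch of the three operations described above (select by CSS selector, read text or attributes, sanitize), the class below parses an HTML string. The class name, method names, and sample URLs are illustrative only, and it assumes a recent jsoup release that provides Safelist (older releases named this class Whitelist).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

import java.util.ArrayList;
import java.util.List;

// Illustrative helper class; the names here are not part of jsoup.
public class PriceExtractor {

    // Returns the text of every element matching the selector div.price.
    public static List<String> extractPrices(String html) {
        Document doc = Jsoup.parse(html);
        List<String> prices = new ArrayList<>();
        for (Element el : doc.select("div.price")) {
            prices.add(el.text());
        }
        return prices;
    }

    // Reads the href attribute of every link, resolved against baseUri
    // so that relative links come back as absolute URLs.
    public static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.absUrl("href"));
        }
        return links;
    }

    // Keeps only the tags and attributes allowed by Safelist.basic();
    // everything else (script tags, event handlers, ...) is dropped.
    public static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String html = "<div class=\"price\">$10</div>"
                + "<div class=\"price\">$25</div>"
                + "<a href=\"/p/1\">Details</a>";
        System.out.println(extractPrices(html));
        System.out.println(extractLinks(html, "https://example.com/"));
        System.out.println(sanitize("<p>Hi<script>alert(1)</script></p>"));
    }
}
```

Passing a base URI to Jsoup.parse is optional, but without it absUrl cannot resolve relative links and returns an empty string.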

When to use jsoup

Use jsoup when you are working in Java (or another JVM language) and scraping pages whose content is already in the HTML the server sends - product listings, articles, tables. It is a great fit for small-to-medium jobs where the data is right there in the initial HTML. It is not the right tool for JavaScript-heavy single-page apps, because jsoup does not run JavaScript - it only reads the HTML as delivered.
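For the table case mentioned above, one possible approach is to walk the rows and cells with nested selectors. This is a sketch under simple assumptions: the class and method names are made up for illustration, and the table has no nested tables or merged cells.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Illustrative helper class; the names here are not part of jsoup.
public class TableReader {

    // Returns one list of cell texts per table row, header row included.
    // tableSelector is any CSS selector that matches the table element.
    public static List<List<String>> readRows(String html, String tableSelector) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        for (Element row : doc.select(tableSelector).select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.select("th, td")) {
                cells.add(cell.text());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table id=\"stock\">"
                + "<tr><th>Item</th><th>Price</th></tr>"
                + "<tr><td>Lamp</td><td>$10</td></tr>"
                + "</table>";
        System.out.println(readRows(html, "table#stock"));
    }
}
```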

jsoup's limits for modern scraping

jsoup parses HTML and can fetch a page with a plain HTTP request, but that is where it stops. It does not run JavaScript, rotate proxies (swap the IP address each request comes from), or handle anti-bot defenses. On a protected site you will typically receive a 403 or a Cloudflare challenge page instead of the real content, and on a single-page app you will get an empty shell with no data. The usual fix is to first fetch fully rendered HTML through a web scraping API, then hand that HTML to jsoup to parse.
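The fetch-then-parse split described above might look like the sketch below. The rendering endpoint (scraping-api.example.com) and its url query parameter are hypothetical placeholders, not a real service; substitute whatever your scraping API documents. The HTTP call uses the JDK's own java.net.http client (Java 11+), and jsoup is only involved in the final parsing step.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative helper class; the names here are not part of jsoup.
public class RenderedPageParser {

    // Hypothetical endpoint that returns fully rendered HTML for a target URL.
    static final String RENDER_ENDPOINT = "https://scraping-api.example.com/render";

    // Builds the request URI, URL-encoding the target page address.
    public static URI buildRenderUri(String targetUrl) {
        String encoded = URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
        return URI.create(RENDER_ENDPOINT + "?url=" + encoded);
    }

    // Step 1: ask the (assumed) scraping API for the rendered HTML.
    public static String fetchRenderedHtml(String targetUrl)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildRenderUri(targetUrl)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Render request failed with HTTP " + response.statusCode());
        }
        return response.body();
    }

    // Step 2: hand the HTML to jsoup. The target URL is passed as the base URI
    // so relative links in the page resolve against the original site.
    public static Document parseRendered(String renderedHtml, String targetUrl) {
        return Jsoup.parse(renderedHtml, targetUrl);
    }

    public static void main(String[] args) throws Exception {
        String target = "https://example.com/products";
        Document doc = parseRendered(fetchRenderedHtml(target), target);
        doc.select("div.price").forEach(el -> System.out.println(el.text()));
    }
}
```

Keeping the fetch and parse steps in separate methods means the parsing code can be tested against saved HTML without any network access.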

Frequently asked questions

Is jsoup like Beautiful Soup?

Yes - jsoup fills the same role in Java that Beautiful Soup fills in Python: an HTML parser that lets you extract data using CSS selectors.

Can jsoup scrape JavaScript-rendered pages?

No. jsoup reads static HTML and does not run JavaScript, so content that a page builds with JavaScript (like single-page apps) never shows up. Pair it with a headless browser or a scraping API that returns the fully rendered HTML.

Is jsoup free?

Yes, it is open source under the MIT license.

Can jsoup handle Cloudflare or anti-bot systems?

No - it has no proxy rotation or anti-bot handling of its own. Send your requests through rotating proxies or a scraping API first, then parse the returned HTML with jsoup.

Last updated: 2026-05-31