Web Scraping APIs

Is Web Scraping Legal?


Scraping publicly available data is generally legal, but legality depends on what you collect, how you collect it, and what you do with it — not on web scraping as an activity in itself. Courts in several jurisdictions have repeatedly found that accessing information a website makes public does not, on its own, break the law. The risk lives in the details: collecting personal data, ignoring a site's Terms of Service, copying copyrighted content, or hammering a server hard enough to disrupt it can each create liability even when the underlying scraping is fine.

Not legal advice
This is a plain-English overview for developers, not legal advice. Laws differ by country and change over time — consult a qualified lawyer for your specific use case.

Quick facts

  • Short answer: scraping public data is generally legal; it depends on what you collect and how.
  • Biggest risk areas: personal data, copyright, Terms of Service, server load.
  • Key US case: hiQ Labs v. LinkedIn, where the Ninth Circuit held that scraping public data is not a CFAA violation.
  • Key EU law: the GDPR, under which personal data carries obligations even when it is public.
  • Safe default: public, non-personal data; honor robots.txt and rate limits.

The short answer

There is no law called "the web scraping law." Web scraping is automated reading of web pages, and reading public information is not illegal. What can be illegal is a specific combination of facts around a scrape. The four questions that actually decide legality are:

  1. Is the data public, or behind a login / paywall you had to break through?
  2. Does it contain personal data about identifiable people?
  3. Is the content copyrighted, and are you republishing it?
  4. Did you agree to Terms that prohibit scraping, and did your scraper harm the site?

Get those right and most scraping of public, non-personal data sits comfortably on the legal side. Get them wrong and even "just reading a page" can turn into a contract, privacy, or copyright problem.

United States: the CFAA and "public" data

The headline US statute people worry about is the Computer Fraud and Abuse Act (CFAA), which criminalizes accessing a computer "without authorization." The key question has been whether scraping a public website counts.

In the landmark hiQ Labs v. LinkedIn case, the Ninth Circuit held that scraping data a site makes publicly available (no login required) does not violate the CFAA: when the data is open to everyone, there is no authorization requirement to violate. The Supreme Court's Van Buren v. United States decision (2021) narrowed the CFAA in a compatible direction. It asks whether someone entered parts of a computer system that were off-limits to them, not whether they misused information they were otherwise allowed to access.

The practical takeaway: public means public. Data behind a password, paywall, or technical access control is a different story — breaking through an access gate is where CFAA exposure starts. Note that hiQ ultimately lost on contract grounds (breach of the site's Terms), which is exactly why the "how" matters as much as the "what."
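The "public means public" rule can be built into a scraper as a defensive check: stop when a response signals an access gate instead of trying to get past it. A minimal Python sketch, assuming you already have the status code and the final URL after redirects. The function name and the login-path segments are hypothetical heuristics, not a standard.

```python
from urllib.parse import urlparse

# 401 = authentication required, 403 = access refused.
GATED_STATUS_CODES = {401, 403}

# Hypothetical path segments that suggest a redirect to a sign-in page;
# tune this set for the sites you actually scrape.
LOGIN_PATH_SEGMENTS = {"login", "signin", "sign-in", "auth"}


def looks_access_gated(status_code: int, requested_url: str, final_url: str) -> bool:
    """Return True if a response suggests the content sits behind an access control."""
    if status_code in GATED_STATUS_CODES:
        return True
    if final_url != requested_url:
        # Compare whole path segments so "/authors/jane" is not mistaken for "/auth".
        segments = {s for s in urlparse(final_url).path.lower().split("/") if s}
        if segments & LOGIN_PATH_SEGMENTS:
            return True
    return False
```

A scraper that gets True back should log the URL and skip it rather than retry with credentials or workarounds. A 403 can also come from bot protection on a public page; either way it is a signal to stop, not to push harder.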

Personal data: GDPR and CCPA

The fact that personal data is visible on a public page does not make it free to collect and store. Under the EU/UK GDPR, processing personal data (names, emails, profiles) requires a lawful basis, and data subjects have rights regardless of where the data was found. In the US, California's CCPA/CPRA imposes related obligations on covered businesses that collect Californians' personal data.

  • Aggregating public personal data at scale is one of the most litigated and regulated areas of scraping.
  • Non-personal data — prices, product specs, sports scores, public filings — carries far less privacy risk.

If your scrape can avoid personal data entirely, it sidesteps the single largest category of legal risk. When you do need it, document a lawful basis and minimize what you keep.
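Minimizing personal data can be enforced in code at the point where records are stored. A minimal Python sketch, assuming scraped records arrive as dictionaries: an allowlist keeps only the fields the project has a documented need for, and a simple regex redacts email addresses left in free text. The field names are hypothetical, and a regex only catches obvious patterns, so it supplements rather than replaces a review of what you keep.

```python
import re

# Hypothetical allowlist: the non-personal fields this project actually needs.
ALLOWED_FIELDS = {"product_name", "price", "currency", "in_stock"}

# Rough pattern for email addresses that may appear inside free text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def minimize_record(record: dict) -> dict:
    """Keep only allowlisted fields and redact email addresses in string values."""
    kept = {}
    for key, value in record.items():
        if key not in ALLOWED_FIELDS:
            continue  # drop anything there is no documented need for
        if isinstance(value, str):
            value = EMAIL_RE.sub("[redacted]", value)
        kept[key] = value
    return kept
```

For example, a record with seller_name and seller_email fields comes back with both dropped, and an address embedded in product_name is replaced with "[redacted]".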

Terms of Service, robots.txt and server load

Even when statutes don't bite, a site's Terms of Service can. If you clicked "I agree" or accessed an area gated by terms that ban automated collection, scraping may be a breach of contract — the ground hiQ actually lost on. Anonymous access to a fully public page is a weaker basis for a ToS claim, but the safest path is simply to respect the rules you're on notice of.

Two technical courtesies also reduce both legal and practical risk:

  • Honor robots.txt and published crawl guidance.
  • Rate-limit yourself. A scraper that degrades a site's service can move you from "reading public data" toward trespass-to-chattels or computer-misuse territory. Polite, paced requests matter legally, and they also keep you from being rate-limited (HTTP 429) or blocked.
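Both courtesies can be built into the crawler itself. A minimal Python sketch using the standard library's urllib.robotparser: it parses robots.txt content you have already downloaded, keeps only the URLs your user agent may fetch, and spaces requests by the site's Crawl-delay. The bot name, the 2-second fallback delay, and the injected fetch function are assumptions for illustration. Python's parser does simple prefix matching, so it may not honor every robots.txt extension a site uses.

```python
import time
from typing import Callable
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-price-bot"  # hypothetical crawler name
DEFAULT_DELAY_SECONDS = 2.0       # assumed fallback when robots.txt sets no Crawl-delay


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_plan(parser: RobotFileParser, urls: list[str]) -> tuple[list[str], float]:
    """Return the URLs robots.txt allows for USER_AGENT and the delay between requests."""
    allowed = [url for url in urls if parser.can_fetch(USER_AGENT, url)]
    # crawl_delay() returns None when no Crawl-delay applies; a 0 also falls back here.
    delay = parser.crawl_delay(USER_AGENT) or DEFAULT_DELAY_SECONDS
    return allowed, float(delay)


def polite_crawl(
    parser: RobotFileParser,
    urls: list[str],
    fetch: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Fetch allowed URLs one at a time, pausing between consecutive requests."""
    allowed, delay = polite_plan(parser, urls)
    results = []
    for i, url in enumerate(allowed):
        if i > 0:
            sleep(delay)  # sequential, paced requests instead of a parallel burst
        results.append(fetch(url))
    return results
```

In a live crawler you would typically load the file with RobotFileParser.set_url("https://example.com/robots.txt") followed by read() instead of passing text in by hand.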

A practical checklist for staying on the right side

None of this is legal advice, but these habits keep most scraping projects defensible:

  • Scrape public data — don't break through logins, paywalls, or access controls.
  • Avoid or minimize personal data; if you must collect it, have a lawful basis and a retention limit.
  • Use facts, not verbatim creative content; transform rather than republish.
  • Respect robots.txt and Terms you're genuinely on notice of.
  • Rate-limit and identify yourself where appropriate; never disrupt the target's service.
  • Check the laws of your jurisdiction and the target's — and ask a lawyer for anything high-stakes.
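Two of these habits, identifying yourself and backing off when a site says it is overloaded, are easy to build in. A minimal Python sketch using only the standard library: the bot name and contact address in the User-Agent are placeholders, and the 60-second fallback is an assumed default. Retry-After may carry either a number of seconds or an HTTP date, so the helper handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.request import Request

# Hypothetical identifying User-Agent: a bot name plus a contact address.
USER_AGENT = "example-price-bot/1.0 (+mailto:ops@example.com)"
DEFAULT_BACKOFF_SECONDS = 60.0  # assumed wait when Retry-After is missing or unreadable


def build_request(url: str) -> Request:
    """Create a request that tells the site who is crawling and how to reach you."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def retry_after_seconds(header_value: Optional[str], now: Optional[datetime] = None) -> float:
    """Seconds to wait after a 429/503, from Retry-After (delay-seconds or an HTTP date)."""
    if header_value is None:
        return DEFAULT_BACKOFF_SECONDS
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_BACKOFF_SECONDS
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat dates without a zone as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

When a response comes back with status 429 or 503, wait retry_after_seconds(...) on its Retry-After header before trying again, rather than retrying immediately.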

Tools like a managed web scraping API help on the how by pacing requests and managing infrastructure for publicly accessible data — but the legal responsibility for what you collect and how you use it always stays with you.

Related terms

What Is Web Scraping?
Web scraping is the automated extraction of structured data from websites. Instead of a person copying and pasting, a program (a "scraper") …
What Is a Web Scraping API?
A web scraping API is a hosted HTTP service that visits a web page for you and hands back the result — rendered HTML, JSON, or already-parse…
What Is Polite Crawling?
Polite crawling means running your crawler at a speed and rhythm that won't strain the websites it visits. In practice that means obeying ro…
What's the Difference Between Web Crawling and Scraping? (2026 Guide)
Crawling and scraping are two different jobs that often work together. Crawling is how you find pages: a program follows links from page to …
What are the best practices for web scraping? (2026 Guide)
Best practices for web scraping are the habits that keep your scraper reliable, polite to the sites you collect from, and unlikely to get yo…
What Is the robots.txt Protocol?
robots.txt is a plain-text file at the root of a website (/robots.txt) that tells crawlers which paths they should and should not fetch. Thi…
How to Scrape Prices: Build a Price Monitor That Survives Anti-Bot
To scrape prices reliably you fetch each product page through a residential proxy in the right country, parse the current price out of the p…
List Crawling in Web Scraping
List crawling is the technique of crawling paginated list, category, or index pages to enumerate the URLs of individual items, then fetching…
Web Scraping for LLMs and RAG
Web scraping for LLMs is the process of fetching web pages and converting them into clean, chunkable text (usually Markdown) that can be emb…
What Is Rate Limiting?
Rate limiting is a control that caps how many requests a single client can make to a server within a fixed time window. A site might allow 6…
What Is a Web Unblocker?
A web unblocker is a managed service that sits between your scraper and a target site, automatically handling the proxies, browser rendering…


Frequently asked questions

Is web scraping legal?

Scraping publicly available, non-personal data is generally legal in most jurisdictions, and courts (e.g. the hiQ Labs case in the US) have held that accessing public data does not by itself violate computer-access laws like the CFAA. Legality depends on what you collect (avoid personal and copyrighted data), how you collect it (do not break through logins or overload servers), and whether you respect the site’s Terms of Service. This is not legal advice.

Is it legal to scrape data behind a login?

It is much riskier. Breaking through a password, paywall, or other access control is where laws like the US CFAA come into play, because you are accessing data that is not public. Public pages that require no login are far safer to scrape than anything gated behind authentication.

Can I get sued for web scraping?

Yes, even when no criminal law is broken. The common civil claims are breach of contract (violating a site’s Terms of Service), copyright infringement (republishing creative content), privacy violations (mishandling personal data under GDPR/CCPA), and trespass-to-chattels (overloading a server). Scraping public, non-personal facts politely avoids most of these.

Does robots.txt make scraping illegal if I ignore it?

robots.txt is a voluntary standard, not a law, so ignoring it is not automatically illegal. But it signals the site owner’s wishes, can support a Terms-of-Service or trespass claim, and ignoring it often leads to your traffic being blocked. Honoring robots.txt and rate-limiting your requests is the safer, more sustainable approach.

Is scraping personal data legal?

Personal data being publicly visible does not make it free to collect. Under GDPR (EU/UK) and CCPA/CPRA (California), processing personal data carries legal obligations regardless of where it was found. Aggregating public personal data at scale is one of the most regulated and litigated areas of scraping — minimize personal data or get legal advice before collecting it.

Last updated: 2026-06-08