What Is Firecrawl?

Firecrawl is a web-scraping API built for AI: you hand it a URL and it hands back clean Markdown or JSON — no CSS selectors, no XPath, no HTML parsing on your end. It also ships an MCP (Model Context Protocol — a standard way for AI tools to call external services) server, so assistants like Claude, Cursor, and Codex can scrape the web through plain-language requests with zero code. Its FIRE-1 agent navigates JavaScript-heavy sites on its own, and an /interact endpoint clicks buttons and fills forms. The project has 111K+ GitHub stars and is used in production by SAP, Zapier, and Deloitte.

Quick facts

Output formats: Markdown (default), HTML, JSON, screenshots
Native integrations: LangChain, LlamaIndex, CrewAI, MCP servers
FIRE-1 agent: Autonomously navigates multi-page workflows (login, search, paginate)
Free tier: 500 scrapes/month
Self-hostable: Yes, open-source version (AGPL-3.0 core, MIT-licensed SDKs)

Three core endpoints

/scrape: send a URL, get structured content back. By default it returns Markdown, which uses about 67% fewer tokens than raw HTML, a saving that adds up across a RAG pipeline (RAG, retrieval-augmented generation, means retrieving source text and feeding it to an LLM so it can answer from real sources). You can also get JSON by supplying a schema describing the fields you want. It handles JavaScript-heavy sites because it renders each page in a real browser.
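As a rough sketch of what a /scrape call might look like over plain HTTP, the snippet below builds (but does not send) a request for either Markdown or schema-driven JSON. The base URL, the `/v1` prefix, and the `formats` and `jsonOptions` field names are assumptions that differ between API versions, so check them against the current API reference; `build_scrape_request` and `product_schema` are hypothetical names used only for illustration.

```python
import json
import urllib.request

# Assumed base URL and version prefix; verify against the API reference.
API_BASE = "https://api.firecrawl.dev/v1"


def build_scrape_request(url, api_key, schema=None):
    """Build (without sending) a POST request for the /scrape endpoint."""
    payload = {"url": url, "formats": ["markdown"]}
    if schema is not None:
        # Assumed field names for schema-driven JSON output.
        payload["formats"] = ["json"]
        payload["jsonOptions"] = {"schema": schema}
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical JSON Schema describing the fields to extract.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price_usd": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price_usd", "in_stock"],
}

req = build_scrape_request(
    "https://store.example.com/product/123", "fc-...", product_schema
)
print(req.get_method(), req.full_url)
```

To actually send the request you would pass `req` to `urllib.request.urlopen` and decode the JSON response; that step is left out here so the sketch runs without a network connection or an API key.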

/crawl: give it a starting URL and it follows links to return the pages it finds across the site. You can cap how deep it goes and narrow the set with glob patterns (wildcard URL filters like /docs/*). Handy for ingesting documentation or building a knowledge base.
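To make the depth limit and glob filter concrete, here is a minimal local sketch of how such a filter could select URLs. It is not Firecrawl's own matcher: `filter_urls` is a hypothetical helper, it uses Python's `fnmatch` for wildcard matching, and it approximates depth by counting path segments, which may not match how the service measures depth.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def filter_urls(urls, include, max_depth):
    """Keep URLs whose path matches an include glob and is not too deep."""
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        # Depth is approximated as the number of non-empty path segments.
        depth = len([segment for segment in path.split("/") if segment])
        # Note: fnmatch's "*" also matches "/" characters.
        if depth <= max_depth and any(fnmatch(path, pat) for pat in include):
            kept.append(url)
    return kept


candidates = [
    "https://example.com/docs/intro",
    "https://example.com/docs/api/scrape",
    "https://example.com/blog/launch",
    "https://example.com/docs/api/v1/deep/page",
]

selected = filter_urls(candidates, include=["/docs/*"], max_depth=3)
for url in selected:
    print(url)
```

With these inputs the blog page is dropped by the glob and the five-segment page is dropped by the depth cap, leaving the two shallower /docs pages.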

/search: runs a web search and fetches the page content of the results in a single call. Built for AI agents that need to ground answers in up-to-date information without stitching together several APIs.

The MCP server

This is Firecrawl's most-used feature in 2026. You point Claude Code, Cursor, or any other MCP client at the MCP server, and the LLM gets ready-made tools (firecrawl.scrape(url), firecrawl.search(query), firecrawl.crawl(url)) that it can invoke in response to plain-language requests. The key shift: your code does not call Firecrawl; the LLM does, on its own, when the user says "scrape this page" or "find me current pricing for X".
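Pointing a client at the server usually comes down to a small JSON config entry. The sketch below generates one in the `mcpServers` layout that several MCP clients read. Treat the npm package name `firecrawl-mcp`, the `FIRECRAWL_API_KEY` variable, and the `npx` launch command as assumptions to confirm against the server's README; `mcp_client_config` is a hypothetical helper.

```python
import json


def mcp_client_config(api_key):
    """Return an MCP client config entry that launches the Firecrawl server."""
    return {
        "mcpServers": {
            "firecrawl": {
                # Assumed launch command and package name.
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                # Assumed environment variable read by the server.
                "env": {"FIRECRAWL_API_KEY": api_key},
            }
        }
    }


config = mcp_client_config("fc-...")
print(json.dumps(config, indent=2))
```

The printed JSON is what you would paste into the client's MCP settings file; where that file lives depends on the client.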

For agent-style workflows, this effectively gives any LLM the ability to use the web as a built-in skill. Combine it with Pydantic + Instructor (Python tools that force the model's output to match a defined schema) and you get a production-grade extraction pipeline in just a few lines.

Trade-offs vs alternatives

vs Crawl4AI (open source): Firecrawl is a managed service; Crawl4AI is something you run and maintain yourself. Firecrawl deals with anti-bot defenses and proxies for you; with Crawl4AI you set all of that up. Pick Firecrawl for speed, Crawl4AI when you need full control over your own data.

vs Scrappey or Bright Data: Firecrawl has an opinion about output — it returns Markdown tuned for LLMs. Scrappey returns raw HTML and lets you parse it however you like. For RAG pipelines, Firecrawl saves you the HTML-to-Markdown step. For traditional scraping (pulling specific fields with selectors), Scrappey is more flexible.

vs ScrapeGraphAI: Firecrawl gives you the building blocks; ScrapeGraphAI builds an extraction pipeline for you from a plain-language prompt. They sit at different levels of abstraction.

Code example

python
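from firecrawl import FirecrawlApp
from pydantic import BaseModel
import anthropic
import instructor


class Product(BaseModel):
    """Fields to extract from the product page."""
    name: str
    price_usd: float
    in_stock: bool


# Step 1: scrape the page into clean Markdown.
# Note: this call shape matches older firecrawl-py releases; newer ones may
# take formats=[...] as a keyword and return an object with a .markdown attribute.
app = FirecrawlApp(api_key="fc-...")
md = app.scrape_url(
    "https://store.example.com/product/123",
    params={"formats": ["markdown"]},
)["markdown"]

# Step 2: extract typed data with Instructor + Claude.
# anthropic.Anthropic() reads ANTHROPIC_API_KEY from the environment.
client = instructor.from_anthropic(anthropic.Anthropic())
product = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Anthropic Messages API
    response_model=Product,
    messages=[{"role": "user", "content": md}],
    max_retries=3,  # re-ask the model if validation fails
)
print(product)  # a validated Product instance, ready to store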

Frequently asked questions

How does Firecrawl work with anti-bot systems?

Firecrawl runs its own fleet of managed browsers. When you call /scrape on a site you are permitted to access, it is designed to return clean Markdown even when the site sits behind Cloudflare or Akamai (services that filter automated traffic). You do not have to manage proxies or browser configuration yourself.

What is FIRE-1?

FIRE-1 is the Firecrawl agent that works through multi-step tasks on its own: logging in, searching, paging through results, clicking "load more". When the data you want only shows up after some interaction, FIRE-1 does that interaction for you. You describe the goal; the agent figures out the clicks.

Can I self-host Firecrawl?

Yes. The project is open source (the core repository is AGPL-3.0-licensed; the SDKs are MIT-licensed), so you can run it on your own servers. The self-hosted version leaves out a few of the managed-cloud features but is fully capable for the core scraping work. Use the hosted version if you want the FIRE-1 agent and the managed anti-bot fleet; self-host when you need to keep data fully in-house.

What is the cost?

The free tier covers 500 scrapes per month. Paid plans start at $19/month for higher volumes. If you are running a RAG pipeline or AI agents at large scale, compare the per-page cost against managed alternatives — Firecrawl is competitive, but not always the cheapest at very high volumes.
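One simple way to run that comparison is to convert each plan into an effective per-page price. The sketch below does this for a flat monthly fee. The $19 figure is the entry price quoted above, while the page volumes are purely hypothetical and assume the plan's quota actually covers them; `cost_per_page` is an illustrative helper, not part of any SDK.

```python
def cost_per_page(monthly_price_usd: float, pages_per_month: int) -> float:
    """Return the effective price per scraped page for a flat monthly plan."""
    if pages_per_month <= 0:
        raise ValueError("pages_per_month must be positive")
    return monthly_price_usd / pages_per_month


# Hypothetical monthly volumes, used only to show how the unit cost shifts.
for pages in (1_000, 5_000, 20_000):
    print(f"{pages:>6} pages -> ${cost_per_page(19.0, pages):.4f} per page")
```

Running the same function over a competitor's price and your real monthly volume gives a like-for-like number to compare.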

Last updated: 2026-05-31