What Is Firecrawl?

By the Scrappey Research Team

Firecrawl is a web-scraping API built for AI: you hand it a URL and it hands back clean Markdown or JSON — no CSS selectors, no XPath, no HTML parsing on your end. It also ships an MCP (Model Context Protocol — a standard way for AI tools to call external services) server, so assistants like Claude, Cursor, and Codex can scrape the web through plain-language requests with zero code. Its FIRE-1 agent navigates JavaScript-heavy sites on its own, and an /interact endpoint clicks buttons and fills forms. The project has 130K+ GitHub stars (verified June 2026) and is used in production by teams including Zapier.

Output formats	Markdown (default), HTML, JSON, screenshots
Native integrations	LangChain, LlamaIndex, CrewAI, MCP servers
FIRE-1 agent	Autonomously navigates multi-page workflows (login, search, paginate)
Free tier	1,000 credits/month (≈1,000 page scrapes)
Self-hostable	Yes — server is AGPL-3.0; SDKs & UI are MIT

Three core endpoints

/scrape — send a URL, get structured content back. By default it returns Markdown, which uses about 67% fewer tokens than raw HTML — a big saving that adds up across a RAG pipeline (RAG = feeding scraped text to an LLM so it can answer from real sources). You can also get JSON by supplying a schema describing the fields you want. It handles JavaScript-heavy sites because a real browser runs behind it.
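To get a feel for why stripped-down text is so much smaller than raw HTML, here is a rough, standard-library-only sketch. It is not Firecrawl's converter and it does not use a real tokenizer: the sample HTML, the `VisibleText` class, and the "about 4 characters per token" rule are all assumptions made for illustration, so the percentage it prints will not match the figure quoted above.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def approx_tokens(text):
    # Crude proxy: roughly 4 characters per token. Not a real tokenizer.
    return max(1, len(text) // 4)


def strip_markup(html):
    """Return only the visible text of an HTML document, one chunk per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical product page, invented for this example.
SAMPLE_HTML = """
<html><head><style>.price{color:red}</style>
<script>window.analytics = {id: 42};</script></head>
<body><div class="card" data-sku="123">
<h1 class="title">Blue Widget</h1>
<span class="price">$19.99</span>
<p>In stock and ready to ship.</p>
</div></body></html>
"""

text = strip_markup(SAMPLE_HTML)
saving = 1 - approx_tokens(text) / approx_tokens(SAMPLE_HTML)
print(text)
print(f"approximate token reduction: {saving:.0%}")
```

On real pages the gap depends heavily on how much markup, inline script, and styling the site ships, so measure with your own tokenizer before budgeting.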

/crawl — give it a starting URL and it follows links to return the pages it can reach on that site, up to the page and depth limits you set, with path patterns (URL filters such as /docs/.*) to narrow the set. Handy for ingesting documentation or building a knowledge base.
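The interaction between a depth limit, a page limit, and a path filter can be sketched with a small breadth-first crawl over an in-memory link graph. This is only a model of the idea, not Firecrawl's crawler: the `SITE` dictionary and the `crawl` function are hypothetical, and the filter is written as a regular expression in the same /docs/.* style as the crawl example on this page, which is an assumption about pattern syntax.

```python
import re
from collections import deque

# Hypothetical site: each path maps to the paths it links to.
SITE = {
    "/": ["/docs/intro", "/blog/launch"],
    "/docs/intro": ["/docs/api", "/docs/guides/setup"],
    "/docs/api": ["/docs/api/scrape"],
    "/docs/guides/setup": [],
    "/docs/api/scrape": [],
    "/blog/launch": ["/docs/intro"],
}


def crawl(start, include, max_depth, limit):
    """Breadth-first crawl that keeps only paths matching `include`.

    max_depth caps how many links away from `start` we follow;
    limit caps how many matching pages are returned.
    """
    pattern = re.compile(include)
    seen = {start}
    queue = deque([(start, 0)])
    kept = []
    while queue and len(kept) < limit:
        path, depth = queue.popleft()
        if pattern.fullmatch(path):
            kept.append(path)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in SITE.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return kept


print(crawl("/", r"/docs/.*", max_depth=2, limit=10))
print(crawl("/", r"/docs/.*", max_depth=2, limit=2))
```

Note one design choice in this sketch: pages that fail the filter are still followed for links, just not returned. A real crawler may behave differently, so check how your tool treats excluded pages.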

/search — runs a web search and fetches the page content in a single call. Built for AI agents that need to ground answers in up-to-date information without stitching together several APIs.

The MCP server

This is Firecrawl's most-used feature in 2026. You point Claude Code, Cursor, or any other MCP client at the MCP server, and the LLM gets ready-made tools for scraping, searching, and crawling (firecrawl_scrape, firecrawl_search, firecrawl_crawl) that it can invoke whenever a request calls for them. The key shift: your code does not call Firecrawl; the LLM does, on its own, when the user says "scrape this page" or "find me current pricing for X".

For agent-style workflows, this effectively gives any LLM the ability to use the web as a built-in skill. Combine it with Pydantic + Instructor (Python tools that force the model's output to match a defined schema) and you get a production-grade extraction pipeline in just a few lines.
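To make the tool-calling idea concrete, here is a minimal sketch of the message an MCP client sends when the model decides to use a tool. MCP runs over JSON-RPC 2.0 and invokes tools with the `tools/call` method. The tool name `firecrawl_scrape` and the argument names `url` and `formats` are assumptions for illustration; an MCP client discovers the real names and schemas from the server at runtime.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Assumed tool name and arguments, shown only to illustrate the shape.
request = tool_call(
    "firecrawl_scrape",
    {"url": "https://example.com", "formats": ["markdown"]},
)
print(json.dumps(request, indent=2))
```

In practice you never write this by hand: the MCP client builds the request after the model picks a tool, which is why the user only ever types a plain-language instruction.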

Trade-offs vs alternatives

vs Crawl4AI (open source): Firecrawl is a managed service; Crawl4AI is something you run and maintain yourself. Firecrawl deals with anti-bot defenses and proxies for you; with Crawl4AI you set all of that up. Pick Firecrawl for speed, Crawl4AI when you need full control over your own data.

vs Scrappey or Bright Data: Firecrawl has an opinion about output — it returns Markdown tuned for LLMs. Scrappey returns raw HTML and lets you parse it however you like. For RAG pipelines, Firecrawl saves you the HTML-to-Markdown step. For traditional scraping (pulling specific fields with selectors), Scrappey is more flexible.

vs ScrapeGraphAI: Firecrawl gives you the building blocks; ScrapeGraphAI builds an extraction pipeline for you from a plain-language prompt. They sit at different levels of abstraction.

Code example

python

from firecrawl import FirecrawlApp
from pydantic import BaseModel
import instructor, anthropic

class Product(BaseModel):
    name: str
    price_usd: float
    in_stock: bool

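# Step 1: scrape the page into clean Markdown.
# Assumes an SDK release where scrape_url takes params= and returns a dict;
# the call signature has differed across firecrawl-py releases, so check yours.
app = FirecrawlApp(api_key="fc-...")
md = app.scrape_url(
    "https://store.example.com/product/123",
    params={"formats": ["markdown"]},
)["markdown"]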

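# Step 2: extract typed data with Instructor + Claude
# anthropic.Anthropic() reads the ANTHROPIC_API_KEY environment variable.
client = instructor.from_anthropic(anthropic.Anthropic())
product = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,  # required by the Anthropic Messages API
    response_model=Product,
    messages=[{"role": "user", "content": md}],
    max_retries=3,  # re-ask the model if the output fails validation
)
print(product)  # typed object, validated, ready for database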

MCP config - let Claude or Cursor scrape natively

json

{
  "mcpServers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": { "FIRECRAWL_API_KEY": "fc-..." }
    }
  }
}

Crawl a whole docs site to Markdown

python

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")

# Crawl up to 200 pages under /docs and get Markdown for each.
job = app.crawl_url(
    "https://docs.example.com",
    params={
        "includePaths": ["/docs/.*"],
        "limit": 200,
        "scrapeOptions": {"formats": ["markdown"]},
    },
)
for page in job["data"]:
    print(page["metadata"]["sourceURL"])
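Once a crawl has finished, a common next step is writing each page to its own Markdown file. The sketch below assumes each entry in `job["data"]` is a dict with a `markdown` key alongside the `metadata.sourceURL` field used above; the `markdown` key, the `page_filename` and `save_pages` helpers, and the sample pages are assumptions for illustration, and no Firecrawl call is made.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def page_filename(source_url):
    """Turn a page URL into a flat, filesystem-friendly .md filename."""
    path = urlparse(source_url).path.strip("/")
    slug = path.replace("/", "_") or "index"
    return f"{slug}.md"


def save_pages(pages, out_dir):
    """Write each page's Markdown to out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        markdown = page.get("markdown")
        if not markdown:
            continue  # skip pages that came back without content
        target = out / page_filename(page["metadata"]["sourceURL"])
        target.write_text(markdown, encoding="utf-8")
        written.append(target)
    return written


# Stand-in for job["data"]; invented for this example.
SAMPLE_PAGES = [
    {"markdown": "# Intro", "metadata": {"sourceURL": "https://docs.example.com/docs/intro"}},
    {"markdown": "", "metadata": {"sourceURL": "https://docs.example.com/docs/empty"}},
    {"markdown": "# Scrape", "metadata": {"sourceURL": "https://docs.example.com/docs/api/scrape"}},
]

with tempfile.TemporaryDirectory() as tmp:
    files = save_pages(SAMPLE_PAGES, tmp)
    print([f.name for f in files])
```

Flattening the path into one filename keeps the example short; two URLs that differ only by query string would collide, so a production version would need a more careful naming scheme.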

Frequently asked questions

How does Firecrawl work with anti-bot systems?

Firecrawl runs its own fleet of managed browsers. A call to /scrape on a URL fronted by Cloudflare or Akamai (services that filter automated traffic) returns clean Markdown for sites you are permitted to access. You do not have to manage proxies or browser configuration yourself.

What is FIRE-1?

FIRE-1 is the Firecrawl agent that works through multi-step tasks on its own: logging in, searching, paging through results, clicking "load more". When the data you want only shows up after some interaction, FIRE-1 does that interaction for you. You describe the goal; the agent figures out the clicks.

Can I self-host Firecrawl?

Yes. Firecrawl's server is open source under the AGPL-3.0 license (its SDKs and some UI components are MIT), so you can run it on your own servers. Note that under the AGPL, if you offer a modified version as a network service, you must publish your source changes. The self-hosted version leaves out a few of the managed-cloud features but is fully capable for the core scraping work. Use the hosted version if you want the FIRE-1 agent and the managed anti-bot fleet; self-host when you need to keep data fully in-house.

What is the cost?

The free tier covers 1,000 credits per month (one credit ≈ one page scrape). Paid plans start around $16/month (the Hobby tier, billed yearly) and scale up by volume. If you are running a RAG pipeline or AI agents at large scale, compare the per-page cost against managed alternatives — Firecrawl is competitive, but not always the cheapest at very high volumes.
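The per-page comparison suggested above is simple arithmetic: monthly price divided by the pages the quota covers. The sketch below shows one way to set it up. Every plan name, price, and credit count in `PLANS` is a placeholder invented for illustration, not a real quote from Firecrawl or anyone else, and it assumes one credit per page unless you pass a different `credits_per_page`.

```python
def cost_per_page(monthly_price_usd, credits_included, credits_per_page=1.0):
    """Effective USD cost of one page when the whole monthly quota is used."""
    if credits_included <= 0 or credits_per_page <= 0:
        raise ValueError("credits must be positive")
    pages = credits_included / credits_per_page
    return monthly_price_usd / pages


def cheapest_plan(plans, pages_needed):
    """Name of the lowest-priced plan whose quota covers pages_needed, or None."""
    fitting = [
        (price, name)
        for name, (price, credits) in plans.items()
        if credits >= pages_needed
    ]
    return min(fitting)[1] if fitting else None


# Placeholder numbers for illustration only; not real quotas or prices.
PLANS = {
    "provider_a_small": (20.0, 4_000),
    "provider_a_large": (100.0, 100_000),
    "provider_b_flat": (60.0, 50_000),
}

for name, (price, credits) in PLANS.items():
    print(f"{name}: ${cost_per_page(price, credits):.4f} per page")
print(cheapest_plan(PLANS, pages_needed=30_000))
```

The lowest per-page rate and the cheapest plan for your volume are not always the same plan, which is why the sketch computes both.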

Last updated: 2026-06-16 · Facts last verified: 2026-06-16