Three core endpoints
/scrape — URL in, structured content out. Returns Markdown by default (about 67% fewer tokens than raw HTML, which compounds significantly for RAG pipelines). Optional JSON extraction via a schema you provide. Handles JavaScript-heavy sites because there is a managed browser behind it.
/crawl — give it a starting URL, get all pages on the site (depth-limited, glob-filtered). Useful for documentation ingestion and knowledge-base building.
/search — web search with content retrieval in one call. Designed for AI agents that need to ground answers in current information without orchestrating multiple APIs.
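As a rough sketch of how the first two endpoints might look from Python using only the standard library: the base URL, the `formats` field, and the bearer-token header below are assumptions based on Firecrawl's v1 REST API and should be checked against the current API reference. `in_scope` is a hypothetical helper that only illustrates what "depth-limited, glob-filtered" means; it is not Firecrawl's implementation, and it counts depth in path segments rather than link hops.

```python
import json
import urllib.request
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Assumed base URL; verify against the current Firecrawl API reference.
API_BASE = "https://api.firecrawl.dev/v1"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /scrape asking for Markdown output."""
    body = json.dumps({"url": url, "formats": ["markdown"]}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def in_scope(url: str, start_url: str, max_depth: int, pattern: str) -> bool:
    """Hypothetical rule: same host, under the start path, within max_depth
    path segments of it, and with a path matching the glob pattern."""
    start, target = urlparse(start_url), urlparse(url)
    if start.netloc != target.netloc:
        return False
    base = start.path.rstrip("/")
    if target.path != base and not target.path.startswith(base + "/"):
        return False
    # Segments below the start path, ignoring empty pieces from slashes.
    extra = [seg for seg in target.path[len(base):].split("/") if seg]
    return len(extra) <= max_depth and fnmatchcase(target.path, pattern)
```

Passing the built request to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the payload easy to inspect before any credits are spent.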
The MCP server
This is the most-used feature in 2026. Firecrawl ships an MCP server that you point Claude Code, Cursor, or any other MCP client at. The LLM gets tools like firecrawl.scrape(url), firecrawl.search(query), and firecrawl.crawl(url), and calls them in response to natural-language requests. Your code does not invoke Firecrawl; the LLM does, when the user asks "scrape this page" or "find me current pricing for X".
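Pointing a client at the server usually comes down to one configuration entry. The sketch below builds that entry in Python and prints it as JSON. The `mcpServers` layout, the `firecrawl-mcp` package name, and the `FIRECRAWL_API_KEY` variable are assumptions based on common MCP client configs and Firecrawl's published setup instructions, so confirm them for your client before relying on this.

```python
import json


def mcp_server_entry(api_key: str) -> dict:
    """Return a config entry that launches the Firecrawl MCP server via npx.

    Key names and the package name are assumptions; check your MCP client's docs.
    """
    return {
        "mcpServers": {
            "firecrawl": {
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                "env": {"FIRECRAWL_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into the client's MCP configuration file.
    print(json.dumps(mcp_server_entry("YOUR_API_KEY"), indent=2))
```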
For agentic workflows this turns the web into a first-class capability for any LLM. Pair it with Pydantic + Instructor for schema-validated extraction and you have a production-grade pipeline in a handful of lines.
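The schema-validation half of that pairing can be sketched with Pydantic alone. `PricingPlan`, its fields, and `parse_plan` are hypothetical names chosen for illustration. In a real pipeline Instructor would ask the model to produce the JSON and can retry when validation fails; that step is left out here because it needs an LLM client and an API key.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class PricingPlan(BaseModel):
    """Hypothetical target schema for a scraped pricing page."""

    name: str
    monthly_price_usd: float
    seats: Optional[int] = None


def parse_plan(raw: dict) -> Optional[PricingPlan]:
    """Return a validated plan, or None if the extracted data does not fit."""
    try:
        return PricingPlan(**raw)
    except ValidationError:
        return None
```

Returning `None` on failure keeps the sketch short; a production pipeline would more likely log the validation error or feed it back to the model for another attempt.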
Trade-offs vs alternatives
vs Crawl4AI (open source): Firecrawl is a managed service; Crawl4AI is something you run yourself. Firecrawl handles anti-bot measures and proxies for you; with Crawl4AI you wire those up yourself. Use Firecrawl to get running quickly and Crawl4AI for full data sovereignty.
vs Scrappey or Bright Data: Firecrawl is opinionated about output format (Markdown-first, LLM-optimised). Scrappey returns HTML and lets you parse however you want. For RAG pipelines, Firecrawl saves you the HTML-to-Markdown step. For traditional scraping (structured extraction with selectors), Scrappey is more flexible.
vs ScrapeGraphAI: Firecrawl gives you primitives; ScrapeGraphAI builds an extraction graph from a natural-language prompt. Different abstraction levels.
