Markdown for RAG

Clean Markdown and HTML output for RAG and AI agents

Add "markdown": true to a single Scrappey call and get LLM-ready Markdown from JavaScript-heavy and modern websites. No HTML noise, no scripts, no styling junk, just stable structure that drops straight into your chunking, embedding, and retrieval pipeline.


Quick start

No SDK required. Scrappey is a single HTTP endpoint, so call it from any language with an HTTP client.

Python: install an HTTP client

```bash
pip install requests
```

1. Get your API key
Register for free at app.scrappey.com to get an API key and a free trial. No subscription, no card required.
2. Fetch a page as Markdown
Send the canonical request.get call with markdown set to true. Scrappey handles web access, JavaScript rendering, and managed sessions, then returns clean Markdown for the requested URL. Only request URLs you have the right to access.
```bash
curl -X POST 'https://publisher.scrappey.com/api/v1?key=YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "cmd": "request.get",
    "url": "https://example.com",
    "markdown": true
  }'
```
3. Chunk and embed
Pass the returned Markdown into your splitter and embedding model, then upsert into your vector store. Stable headings, lists, and tables make chunk boundaries predictable across crawls.
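One way to benefit from stable chunk boundaries is to key each chunk by a hash of its content, so a repeated crawl only adds chunks that actually changed. The sketch below is illustrative: `chunk_id`, `upsert_chunks`, and the plain `dict` standing in for a vector store are hypothetical names, not part of Scrappey.

```python
import hashlib

def chunk_id(url: str, chunk_text: str) -> str:
    """Derive a deterministic ID from the source URL and the chunk content."""
    digest = hashlib.sha256(f"{url}\n{chunk_text}".encode("utf-8")).hexdigest()
    return digest[:16]

def upsert_chunks(store: dict, url: str, chunks: list[str]) -> int:
    """Add unseen chunks to `store` and return how many were new.

    `store` is a plain dict standing in for a vector store (assumption).
    """
    added = 0
    for text in chunks:
        key = chunk_id(url, text)
        if key not in store:
            # A real pipeline would call its embedding model here.
            store[key] = {"url": url, "text": text}
            added += 1
    return added

store = {}
first = upsert_chunks(store, "https://example.com", ["# Title\nIntro", "## Usage\nSteps"])
second = upsert_chunks(store, "https://example.com", ["# Title\nIntro", "## Usage\nSteps (updated)"])
print(first, second, len(store))  # 2 1 3
```

On the second crawl only the edited section is treated as new. This sketch does not delete the stale chunk; a production pipeline would also need to remove IDs that no longer appear for a URL.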

Code examples

Python: fetch Markdown and chunk it for a vector store

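```python
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT = f"https://publisher.scrappey.com/api/v1?key={API_KEY}"

def fetch_markdown(url: str) -> str:
    """Fetch a page through Scrappey and return it as Markdown."""
    payload = {
        "cmd": "request.get",
        "url": url,
        "markdown": True,
    }
    res = requests.post(ENDPOINT, json=payload, timeout=180)
    res.raise_for_status()
    data = res.json()
    # Clean, LLM-ready Markdown for the requested page
    return data["solution"]["markdown"]

def chunk(text: str, size: int = 1200, overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        # Otherwise `start` would never advance and the loop would not end
        raise ValueError("overlap must be non-negative and smaller than size")
    chunks, start = [], 0
    while start < len(text):
        end = start + size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # the last window already reaches the end of the text
        start = end - overlap
    return chunks

markdown = fetch_markdown("https://example.com")
for i, c in enumerate(chunk(markdown)):
    print(f"--- chunk {i} ({len(c)} chars) ---")
    print(c)
    # embed(c) -> vector_store.upsert(...)
```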

JavaScript: raw fetch returning Markdown for an agent tool

```javascript
const API_KEY = process.env.SCRAPPEY_API_KEY;
const ENDPOINT = `https://publisher.scrappey.com/api/v1?key=${API_KEY}`;

async function fetchMarkdown(url) {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      cmd: "request.get",
      url,
      markdown: true,
    }),
  });

  // Surface HTTP errors instead of parsing an error body as a result
  if (!res.ok) {
    throw new Error(`Scrappey request failed with HTTP ${res.status}`);
  }

  const data = await res.json();
  return data.solution.markdown; // LLM-ready Markdown
}

// Use as a retrieval/agent tool (top-level await requires an ES module)
const markdown = await fetchMarkdown("https://example.com");
console.log(markdown);
```

Why Markdown output for RAG

One parameter, LLM-ready output

Add "markdown": true to the standard request.get call. Scrappey strips scripts, styles, and layout noise and returns clean Markdown.

Stable structure for better chunking

Preserved headings, lists, and tables mean predictable chunk boundaries and more consistent embeddings across repeated crawls.
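Because headings survive in the output, a splitter can cut at heading lines instead of at fixed character offsets. The following is a minimal sketch under that assumption; `split_by_heading` is a hypothetical helper, it recognizes only `#`-style headings, and it skips lines inside triple-backtick code fences.

```python
import re

# ATX headings: one to six '#' characters followed by whitespace
HEADING = re.compile(r"^#{1,6}\s")

def split_by_heading(markdown: str) -> list[str]:
    """Split Markdown into sections, starting a new one at each heading."""
    sections, current = [], []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        if not in_fence and HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]

doc = "# A\nintro\n\n## B\n```\n# not a heading\n```\ntext\n## C\nend"
for section in split_by_heading(doc):
    print(repr(section))
```

Sections that are still too long for the embedding model can be passed through a fixed-size splitter afterwards.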

Works on modern, JavaScript-heavy sites

Full-browser rendering and automatic web access handling return Markdown from pages where simple HTML-to-Markdown converters come back empty.

HTML when you need it

Omit markdown to get clean HTML instead, so the same endpoint feeds both raw parsers and LLM pipelines.
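Switching formats comes down to whether the request body contains the markdown key. A small sketch of a payload builder follows; `build_payload` is a hypothetical helper name, while the `cmd`, `url`, and `markdown` fields are the ones shown in the quick start.

```python
import json

def build_payload(url: str, markdown: bool = True) -> dict:
    """Build the request body; leave `markdown` out entirely to get HTML."""
    payload = {"cmd": "request.get", "url": url}
    if markdown:
        payload["markdown"] = True
    return payload

print(json.dumps(build_payload("https://example.com")))
print(json.dumps(build_payload("https://example.com", markdown=False)))
```

The second payload has no markdown key at all, matching the "omit it" behavior described above rather than sending an explicit false value.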

Pay only for successful requests

A free trial to start, then EUR 0.20 per 1,000 direct requests or EUR 1.00 per 1,000 full-browser requests. Failed requests are free.
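As a rough worked example of the rates above, the sketch below estimates spend from counts of successful requests. It assumes strictly linear pricing with no minimums or volume tiers, which this page does not confirm; `estimate_cost_eur` is a hypothetical helper.

```python
# Rates quoted on this page, in EUR per 1,000 successful requests
DIRECT_EUR_PER_1000 = 0.20
BROWSER_EUR_PER_1000 = 1.00

def estimate_cost_eur(successful_direct: int, successful_browser: int) -> float:
    """Estimate cost in EUR; failed requests are free, so do not count them."""
    cost = (successful_direct / 1000) * DIRECT_EUR_PER_1000
    cost += (successful_browser / 1000) * BROWSER_EUR_PER_1000
    return round(cost, 2)

# 50,000 direct requests plus 10,000 full-browser requests
print(estimate_cost_eur(50_000, 10_000))  # 20.0
```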

Residential proxies included

Managed sessions and residential proxies are bundled on both tiers, so you do not need separate proxy billing or setup to get high success rates.

Popular use cases

RAG knowledge bases

Pull clean Markdown from public docs and articles into your retriever for higher-quality chunks and embeddings.

Custom GPTs and Claude Projects

Generate Markdown files to upload directly into Custom GPTs, Claude Projects, or any knowledge base without manual cleanup.
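For the upload workflow, each page can be saved as its own .md file with a name derived from the URL. This is a minimal sketch; `slug_for`, `save_markdown`, the `knowledge` output folder, and the source comment at the top of each file are illustrative choices, not a Scrappey feature.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

def slug_for(url: str) -> str:
    """Turn a URL into a filesystem-friendly file name stem."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return slug or "page"

def save_markdown(url: str, markdown: str, out_dir: str = "knowledge") -> Path:
    """Write the Markdown for `url` to `out_dir` and return the file path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{slug_for(url)}.md"
    # Record the source URL at the top so the origin survives the upload
    path.write_text(f"<!-- source: {url} -->\n\n{markdown}\n", encoding="utf-8")
    return path

print(slug_for("https://example.com/docs/Getting-Started/"))  # example-com-docs-getting-started
```

Calling `save_markdown(url, fetched_markdown)` for each crawled page yields a folder of files that can be uploaded as-is. Query strings are ignored by `slug_for`, so URLs that differ only in their query would overwrite each other.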

Agent tools

Give an autonomous agent a fetch_markdown tool so it can read any public URL as clean text at runtime.
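A tool like this usually has two parts: a description the model can read and a dispatcher that runs the call. The sketch below uses a generic JSON-Schema-style description, which is an assumption; the exact shape depends on your agent framework. `run_tool` and the `max_chars` cap are hypothetical, and the fetcher is injected so that any function taking a URL and returning Markdown can be plugged in.

```python
from typing import Callable

# Generic tool description (assumed shape; adapt to your framework)
FETCH_MARKDOWN_TOOL = {
    "name": "fetch_markdown",
    "description": "Fetch a public URL and return its content as Markdown.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute http(s) URL"},
        },
        "required": ["url"],
    },
}

def run_tool(name: str, arguments: dict, fetcher: Callable[[str], str],
             max_chars: int = 20_000) -> str:
    """Validate a tool call, run the fetcher, and cap the returned text."""
    if name != FETCH_MARKDOWN_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    url = arguments.get("url", "")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    # Truncate so one large page cannot fill the agent's context window
    return fetcher(url)[:max_chars]

# Stub fetcher for illustration; swap in a real Scrappey call
result = run_tool("fetch_markdown", {"url": "https://example.com"},
                  fetcher=lambda u: f"# Fetched {u}")
print(result)  # # Fetched https://example.com
```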

Framework pipelines

Feed the Markdown into LangChain, LlamaIndex, or an MCP server, then chunk, embed, and store it with your existing loaders.

Markdown for RAG FAQ

How do I get Markdown instead of HTML?

Add "markdown": true to the body of the canonical request.get call. Leave it out and the same endpoint returns clean HTML, so you can switch formats per request.

Does it work with LangChain, LlamaIndex, MCP, and Claude or Codex?

Yes. The Markdown is plain text, so it drops into any framework. Scrappey has dedicated LangChain, LlamaIndex, MCP, and Claude/Codex integration pages that show the same markdown pattern wired into each.

What does the Markdown output cost?

It is billed like any other request: a free trial to start, then EUR 0.20 per 1,000 direct HTTP requests or EUR 1.00 per 1,000 full-browser requests. Markdown is the same rate as HTML, and you only pay for successful requests.

Why use this instead of a local HTML-to-Markdown converter?

Local converters only see the HTML they are given, so on JavaScript-heavy and modern websites they often return empty or broken output. Scrappey renders the page with managed sessions and residential proxies first, then returns stable, LLM-ready Markdown.

Where does the Markdown appear in the response?

In the JSON response under solution.markdown. Parse that field and pass it straight to your splitter and embedding model.
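Reading the field defensively avoids passing an empty string or a missing key into the splitter. The sketch below checks only for the solution.markdown field named above; `extract_markdown` is a hypothetical helper, and the shape of error responses is not described on this page, so nothing else about the payload is assumed.

```python
def extract_markdown(data: dict) -> str:
    """Return the Markdown from a parsed response, or raise if it is absent."""
    solution = data.get("solution")
    if not isinstance(solution, dict):
        raise ValueError("response has no 'solution' object")
    markdown = solution.get("markdown")
    if not isinstance(markdown, str) or not markdown.strip():
        raise ValueError("response has no Markdown under solution.markdown")
    return markdown

print(extract_markdown({"solution": {"markdown": "# Example Domain"}}))  # # Example Domain
```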

Can I use this for any website?

Use it to collect publicly available data that you have the right to access. Scrappey handles the rendering and web access; you are responsible for the URLs and content you request.

More ways to plug Scrappey into your stack

MCP server, Claude & Codex, LangChain, LlamaIndex, n8n, Make, All integrations
