

@scrappey/langchain is the official Scrappey document loader for LangChain.js. It fetches JavaScript-heavy and modern websites as LLM-ready Markdown, so the output drops straight into a text splitter, vector store, or agent — no local HTML parsing required.
Requires Node 18+. @langchain/core is the only peer dependency.

npm install @scrappey/langchain @langchain/core

Register at scrappey.com to get your API key (150 free requests included), then expose it to your environment. The loader reads SCRAPPEY_API_KEY by default, or you can pass apiKey directly.

export SCRAPPEY_API_KEY="your_api_key"

Create a ScrappeyLoader with one or more URLs and call load(). Each URL becomes a LangChain Document whose pageContent is server-side Markdown.

import { ScrappeyLoader } from "@scrappey/langchain";

const loader = new ScrappeyLoader({
  urls: ["https://example.com", "https://news.ycombinator.com"],
});
const docs = await loader.load();
console.log(docs[0].pageContent.slice(0, 120));

Use lazyLoad() to process documents one at a time as each page lands, so you can embed and persist without buffering the whole batch.

// bigUrlList is a string[] of URLs defined elsewhere
const loader = new ScrappeyLoader({ urls: bigUrlList, concurrency: 2 });
for await (const doc of loader.lazyLoad()) {
  // embed / persist each Document as soon as it arrives
}

import { ScrappeyLoader } from "@scrappey/langchain";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

// Scrappey returns LLM-ready Markdown, so it drops straight into a splitter.
const loader = new ScrappeyLoader({
  urls: ["https://en.wikipedia.org/wiki/Web_scraping"],
  concurrency: 2,
  skipOnError: true,
});
const docs = await loader.load();

const splits = await new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
}).splitDocuments(docs);

const store = await MemoryVectorStore.fromDocuments(
  splits,
  new OpenAIEmbeddings()
);

const hits = await store.similaritySearch("What is web scraping?", 3);
console.log(hits.map((h) => h.metadata.source));

# The loader wraps this canonical Scrappey request under the hood.
# Add "markdownResponse": true for LLM-ready Markdown instead of HTML.
curl -X POST "https://publisher.scrappey.com/api/v1?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "cmd": "request.get",
    "url": "https://example.com",
    "markdownResponse": true
  }'

ScrappeyLoader implements the standard LangChain loader interface with load() and lazyLoad(), so it plugs into existing chains, splitters, and vector stores.
Pages come back as clean, LLM-ready Markdown converted on Scrappey's side — no local HTML-to-Markdown step before chunking and embedding.
Pages are rendered in a full browser with automatic web access handling, so content from modern, dynamic websites loads reliably.
Each Document carries metadata like source URL, status code, final URL after redirects, and the session Scrappey used — useful for citations and debugging.
Tune the concurrency option for parallel fetches and set skipOnError to omit failed URLs instead of throwing.
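To make the two options concrete, here is a minimal sketch of what a concurrency cap and a skip-on-error policy can mean in practice. It is not the loader's actual implementation: loadAll, LoadOptions, and the injected fetchOne callback are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: illustrates the semantics only, not ScrappeyLoader internals.
interface LoadOptions {
  concurrency: number; // maximum number of fetches in flight at once
  skipOnError: boolean; // true: drop failed URLs; false: reject on first failure
}

async function loadAll<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>,
  { concurrency, skipOnError }: LoadOptions,
): Promise<T[]> {
  const results: (T | undefined)[] = new Array(urls.length);
  const ok: boolean[] = new Array(urls.length).fill(false);
  let next = 0; // index of the next URL to claim
  let failed = false; // set once a failure should abort the run

  // Each worker claims the next unprocessed URL until none remain.
  async function worker(): Promise<void> {
    while (!failed && next < urls.length) {
      const i = next++;
      try {
        results[i] = await fetchOne(urls[i]);
        ok[i] = true;
      } catch (err) {
        if (!skipOnError) {
          failed = true; // stop other workers from claiming more URLs
          throw err;
        }
        // skipOnError: leave ok[i] false so the URL is omitted from the output
      }
    }
  }

  const poolSize = Math.max(1, Math.min(concurrency, urls.length));
  await Promise.all(Array.from({ length: poolSize }, worker));

  // Keep successful results in the original URL order.
  return urls.flatMap((_, i) => (ok[i] ? [results[i] as T] : []));
}
```

With skipOnError enabled, the returned array can be shorter than the input list, so anything that relies on positional alignment with the URLs should key on each result's source instead.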
Built on native fetch with @langchain/core as the only peer dependency. Dual ESM + CJS build with first-class TypeScript types.
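For readers who want to see the underlying call without curl, the request shown earlier can be sketched with native fetch. The endpoint and JSON fields come from that curl example; the helper names buildScrappeyRequest and scrape are hypothetical, and since the response schema is not described here, the parsed body is returned untyped.

```typescript
// Endpoint and body fields mirror the curl example; helper names are hypothetical.
const SCRAPPEY_ENDPOINT = "https://publisher.scrappey.com/api/v1";

interface ScrappeyRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch options without sending anything.
function buildScrappeyRequest(apiKey: string, url: string): ScrappeyRequest {
  return {
    endpoint: `${SCRAPPEY_ENDPOINT}?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        cmd: "request.get",
        url,
        markdownResponse: true, // ask for Markdown instead of HTML
      }),
    },
  };
}

// Sends the request with native fetch (Node 18+).
// Assumption: the response is JSON; its shape is not specified in this page.
async function scrape(apiKey: string, url: string): Promise<unknown> {
  const { endpoint, init } = buildScrappeyRequest(apiKey, url);
  const res = await fetch(endpoint, init);
  if (!res.ok) {
    throw new Error(`Scrappey request failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

Splitting request construction from sending keeps the first part easy to unit test without network access or an API key.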
Load documentation, articles, or knowledge bases as Markdown, chunk them, and embed into a vector store for retrieval-augmented generation.
Give a LangChain agent the ability to pull fresh content from JavaScript-heavy sites the user has the right to access.
Stream a large URL list with lazyLoad() and persist embeddings incrementally for nightly index refreshes.
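For the incremental-persistence case, one possible pattern is to group the streamed documents into small batches before writing them to a store. The sketch below is an assumption about how you might structure this, not part of the package: inBatches is a hypothetical helper that works on any async iterable, and the batch size is arbitrary.

```typescript
// Hypothetical helper: groups items from any async iterable into fixed-size batches.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  size: number,
): AsyncGenerator<T[]> {
  if (size < 1) throw new Error("size must be >= 1");
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch; // hand off a full batch, then start a fresh one
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Possible usage with the loader and a LangChain vector store (not run here):
//
//   for await (const batch of inBatches(loader.lazyLoad(), 16)) {
//     await store.addDocuments(batch);
//   }
```

Batching keeps memory bounded by the batch size rather than the full URL list, while still reducing the number of write calls compared with persisting one document at a time.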