How to Reverse-Engineer API Requests for Scraping

By the Scrappey Research Team

Reverse-engineering API requests for scraping means watching the network traffic a website makes, spotting the JSON endpoints that feed its visible UI, and calling those endpoints directly instead of scraping the rendered HTML. An API (Application Programming Interface) is the set of data requests a site supports; the JSON it returns is clean, structured data. Where a site exposes such endpoints, the API path is usually much faster, cheaper, and more reliable than running a browser: you skip the JavaScript, get structured data, and avoid most fingerprint-based blocking (where a site identifies and blocks automated clients by their technical traits).

Workflow	Open DevTools → Network → reproduce the action → filter for XHR/fetch
Look for	JSON responses, GraphQL queries, structured pagination cursors
Always copy	Full URL, all headers, body — replicate exactly first, simplify after
Watch for	CSRF tokens, signed query params, dynamic auth headers
When it fails	Encrypted bodies, attestation tokens, mobile-only endpoints

The basic workflow

Open DevTools (your browser's built-in developer panel, usually F12) and switch to the Network tab, then filter to Fetch/XHR — these are the background data requests the page makes. Now do the action you want to scrape: load a page, scroll, run a search. Scan the requests for ones that return structured JSON containing the data you want. Right-click that request and choose "Copy as cURL" (cURL is a command-line tool for making HTTP requests) — you now have a known-good copy. Paste it into a script, confirm it works, then remove headers one by one to find the minimum set the server actually needs.
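
The header-trimming step at the end of that workflow can be automated. The sketch below is one possible approach, not a standard tool: `minimal_headers` and the `still_works` callback are hypothetical names, and `still_works` stands in for whatever check you trust (for example, "the response is 200 and the JSON contains the field I need").

```python
from typing import Callable, Dict


def minimal_headers(
    headers: Dict[str, str],
    still_works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Drop headers one at a time, keeping only those the server needs.

    `still_works` is a caller-supplied check (an assumption of this sketch):
    it sends the request with the given headers and returns True if the
    response still contains the expected data.
    """
    if not still_works(headers):
        raise ValueError("baseline request fails; fix it before trimming")

    needed = dict(headers)
    for name in list(headers):
        # Try the request without this one header.
        trial = {k: v for k, v in needed.items() if k != name}
        if still_works(trial):
            needed = trial  # the server did not miss it, so leave it out
    return needed
```

In practice `still_works` would wrap a real HTTP call, so expect one request per header; add a short delay between attempts if the site rate-limits.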

Handling auth and CSRF

Most internal APIs want proof of who you are: usually a session cookie (a token tying requests to your login), a CSRF token from the initial page (a secret value the site embeds in its own pages so it can tell the request came from the real site, not a forgery), or an auth header (typically a bearer token). Session cookies: load the public page first, grab the cookie, reuse it. CSRF tokens: pull the token out of the initial HTML (usually a meta tag or a hidden form input) and include it in later API calls. Bearer tokens: log in once through the normal flow, capture the token, and refresh it when it expires.
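
As a rough illustration of the CSRF step, the sketch below pulls a token from either a meta tag or a hidden form input using only the standard library. The class and function names are hypothetical, and the field name `csrf` is an assumption; real sites use names such as `csrf-token` or `_token`, so check the page source first.

```python
from html.parser import HTMLParser
from typing import Optional


class CsrfFinder(HTMLParser):
    """Collects a CSRF token from a <meta> tag or a hidden <input>."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.token is not None:
            return  # keep the first match only
        attributes = dict(attrs)
        if attributes.get("name") != self.name:
            return
        if tag == "meta":
            # <meta name="csrf" content="...">
            self.token = attributes.get("content")
        elif tag == "input" and attributes.get("type") == "hidden":
            # <input type="hidden" name="csrf" value="...">
            self.token = attributes.get("value")


def find_csrf(html: str, name: str = "csrf") -> Optional[str]:
    """Return the token value, or None if the page has no matching tag."""
    finder = CsrfFinder(name)
    finder.feed(html)
    return finder.token
```

Parsing the HTML rather than matching it with a regular expression means attribute order and quoting style do not matter, which tends to survive small template changes.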

When reverse-engineering fails

Some endpoints fight back. They might sign each request with an HMAC (a keyed, tamper-evident checksum) computed in deliberately scrambled, or obfuscated, JavaScript; attach device-attestation tokens that only exist if you actually run the page's JS; or serve only the mobile app, locked down with TLS pinning (where the app rejects any HTTPS certificate other than the ones it was built to trust). In those cases the effort of reverse-engineering often outweighs the cost of rendering the page in a real browser, so fall back to that. Mobile API endpoints are their own category and usually need MITM (man-in-the-middle) proxy work, which means sitting between the app and the server to inspect traffic, using a tool like mitmproxy or Charles on a real device.
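
To make the HMAC case concrete, here is what a signing routine might look like once you have recovered it from the site's JavaScript. Everything about the scheme is an assumption made for illustration: the canonical string layout, the SHA-256 choice, the timestamp, and the function name `sign_request`. A real site will differ in each of these details.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def sign_request(secret: bytes, method: str, path: str,
                 params: dict, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 signature for one request.

    Hypothetical scheme: sign "METHOD\\nPATH\\nSORTED_QUERY\\nTIMESTAMP".
    """
    # Sort parameters so the signature does not depend on dict order.
    canonical_query = urlencode(sorted(params.items()))
    message = "\n".join([method.upper(), path, canonical_query, str(timestamp)])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()
```

The hard part is rarely this function. It is finding the secret and the exact canonical string inside obfuscated code, which is why browser rendering is often the cheaper route.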

Code example

python

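import re
import requests

s = requests.Session()
# Load the public page first so the session picks up its cookies.
home = s.get('https://example.com/', timeout=10)
home.raise_for_status()

# Pull the CSRF token out of the initial HTML (here, a meta tag).
match = re.search(r'name="csrf" content="([^"]+)"', home.text)
if match is None:
    raise RuntimeError('CSRF token not found in page HTML')
csrf = match.group(1)

# Call the JSON endpoint directly, reusing the session cookies and the token.
api = s.get('https://example.com/api/v1/products',
            params={'page': 1, 'limit': 50},
            headers={'X-CSRF-Token': csrf}, timeout=10)
api.raise_for_status()
data = api.json()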
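
The example above fetches a single page. A minimal sketch of walking every page with the same `page` parameter follows. It assumes the endpoint returns an empty list once you run past the last page, which is common but not guaranteed; `iter_items` and `fetch_page` are hypothetical names, and `fetch_page` is whatever function wraps your actual request.

```python
from typing import Callable, Iterator, List


def iter_items(fetch_page: Callable[[int], List[dict]],
               max_pages: int = 1000) -> Iterator[dict]:
    """Yield items page by page until the API returns an empty page.

    `max_pages` is a safety cap so a misbehaving endpoint cannot
    keep the loop running forever.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            return  # assumed end-of-data signal: an empty page
        yield from items
```

Passing the fetch function in keeps the loop independent of any one site: the same generator works whether `fetch_page` calls `requests`, reads a cache, or replays saved responses in a test.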

Frequently asked questions

Is reverse-engineering APIs legal?

Technically, calling a public-facing internal API is the same request a browser would already make. The legal questions mostly concern what you do with the data, not the act of fetching it. Stay clear of authenticated endpoints you are not authorized to use.

How do I know if a site uses GraphQL?

GraphQL is a query style where every data type is served from one endpoint. Look for requests to a single URL (often /graphql) with POST bodies that contain query and variables fields — that same endpoint answers every kind of request.
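
That check can be sketched in a few lines: given a request body copied from DevTools, test whether it has the GraphQL shape, and build a body of the same shape for your own calls. Both function names are hypothetical. The heuristic looks only for a string `query` field, so it will miss sites that send persisted-query hashes instead of the query text.

```python
import json
from typing import Any, Dict, Optional


def looks_like_graphql(body: str) -> bool:
    """Heuristic: a JSON body with a string "query" field."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False
    # Some servers accept a list of operations in one request.
    batch = payload if isinstance(payload, list) else [payload]
    return bool(batch) and all(
        isinstance(op, dict) and isinstance(op.get("query"), str)
        for op in batch
    )


def build_graphql_body(query: str,
                       variables: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a query and its variables into a POST body."""
    return json.dumps({"query": query, "variables": variables or {}})
```

Send the result of `build_graphql_body` as the POST body with a `Content-Type: application/json` header, copying the query text verbatim from the captured request before you start trimming fields.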

What if the API request body is encrypted?

Some sites encrypt the request body with a key generated by their page-side JavaScript. You can either reverse-engineer how that key is built (hours to days of JS work) or fall back to browser rendering — usually the latter is cheaper.

Last updated: 2026-05-31