Web Scraping APIs

How to Reverse-Engineer API Requests for Scraping


Reverse-engineering API requests for scraping means inspecting the network traffic a website generates, identifying the JSON endpoints behind the rendered UI, and calling those endpoints directly instead of scraping the rendered HTML. For most modern sites, the API path is dramatically faster, cheaper, and more reliable than browser rendering — you skip the JavaScript, get structured data, and avoid most fingerprint-based blocking.

Quick facts

Workflow: Open DevTools → Network → reproduce the action → filter for XHR/fetch
Look for: JSON responses, GraphQL queries, structured pagination cursors
Always copy: Full URL, all headers, body; replicate exactly first, simplify after
Watch for: CSRF tokens, signed query params, dynamic auth headers
When it fails: Encrypted bodies, attestation tokens, mobile-only endpoints

The basic workflow

Open DevTools, switch to the Network tab, and filter to Fetch/XHR. Perform the user action you want to scrape (load a page, scroll, search). Look through the requests for ones that return structured JSON containing the data you want. Right-click the request and choose "Copy as cURL"; this gives you a known-good baseline. Paste it into a script, confirm it works, then remove headers one at a time, re-sending after each removal, to find the minimum required set.
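
The header-pruning step can be automated. The sketch below is one possible approach, not a standard tool: `minimal_headers` and `fake_endpoint` are hypothetical names, and `fake_endpoint` is an offline stand-in that pretends the server needs a cookie and a CSRF header, so the example runs without touching a real site.

```python
from typing import Callable, Dict


def minimal_headers(headers: Dict[str, str],
                    still_works: Callable[[Dict[str, str]], bool]) -> Dict[str, str]:
    """Drop headers one at a time, keeping only those the endpoint needs."""
    if not still_works(headers):
        raise ValueError("baseline request fails; fix it before pruning")
    kept = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in kept.items() if k != name}
        if still_works(trial):
            kept = trial  # the endpoint did not need this header
    return kept


# Offline stand-in for the real endpoint (assumption for illustration):
# it accepts a request only when both of these headers are present.
def fake_endpoint(h: Dict[str, str]) -> bool:
    return "Cookie" in h and "X-CSRF-Token" in h


# Headers as they might appear after "Copy as cURL" (values are made up).
copied = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "Cookie": "session=abc",
    "X-CSRF-Token": "t0k3n",
    "Referer": "https://example.com/",
}

required = minimal_headers(copied, fake_endpoint)
print(sorted(required))
```

Against a live site, `still_works` would send the request with the trial headers and check both the status code and the shape of the JSON, since some servers answer 200 with an error payload or a stripped-down response.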

Handling auth and CSRF

Most internal APIs require either a session cookie, a CSRF token from the initial page, or an auth header. Session cookies: hit the public page first, capture the cookie, reuse it. CSRF tokens: parse the token out of the initial HTML (usually a meta tag or a hidden form input), include it in subsequent API calls. Bearer tokens: log in once via the public flow, capture the token, refresh as needed.
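
As a rough sketch of the CSRF step, the token can be pulled out with the standard-library HTML parser instead of a regular expression, which tolerates attribute reordering. The candidate names (`csrf`, `csrf-token`, `csrf_token`) are assumptions; check the target page's source for the name it actually uses. `CsrfFinder` and `find_csrf` are hypothetical helpers.

```python
from html.parser import HTMLParser
from typing import Optional


class CsrfFinder(HTMLParser):
    """Collects the first CSRF token found in a meta tag or form input."""

    def __init__(self, names=("csrf", "csrf-token", "csrf_token")):
        super().__init__()
        self.names = names  # assumed candidate names; sites differ
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.token is not None:
            return  # keep the first match only
        attributes = dict(attrs)
        if attributes.get("name") not in self.names:
            return
        if tag == "meta":
            self.token = attributes.get("content")
        elif tag == "input":
            self.token = attributes.get("value")


def find_csrf(html: str) -> Optional[str]:
    """Return the CSRF token from the page HTML, or None if absent."""
    finder = CsrfFinder()
    finder.feed(html)
    return finder.token


page = '<html><head><meta content="abc123" name="csrf-token"></head></html>'
print(find_csrf(page))
```

Returning `None` rather than raising lets the caller decide whether a missing token means the page layout changed or the endpoint simply does not need one.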

When reverse-engineering fails

Some endpoints sign requests with an HMAC computed in obfuscated JS, attach device-attestation tokens that require running the page's JS VM, or are reachable only from the mobile app, which enforces certificate pinning. In those cases the cost of reverse-engineering usually exceeds the cost of rendering the page in a real browser, so fall back to rendering. Mobile API endpoints are a separate category and usually need MITM proxy work (mitmproxy, Charles) on a real device.
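
To make the signed-request case concrete, here is an illustrative sketch of what such a scheme often amounts to once the obfuscation is stripped away. Everything in it is an assumption for illustration (the `ts` and `sig` parameter names, the sorted canonical form, SHA-256); real sites choose their own layout, and recovering the secret and the canonical form is the expensive part.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def sign_params(params: dict, secret: bytes, timestamp: int) -> dict:
    """Return a copy of params with a timestamp and an HMAC signature added."""
    signed = dict(params, ts=timestamp)
    # Hypothetical canonical form: parameters sorted by key, URL-encoded.
    canonical = urlencode(sorted(signed.items()))
    signed["sig"] = hmac.new(secret, canonical.encode("utf-8"),
                             hashlib.sha256).hexdigest()
    return signed


def verify(signed: dict, secret: bytes) -> bool:
    """Recompute the signature the way a server might and compare."""
    params = {k: v for k, v in signed.items() if k != "sig"}
    canonical = urlencode(sorted(params.items()))
    expected = hmac.new(secret, canonical.encode("utf-8"),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("sig", ""))


secret = b"recovered-from-page-js"  # placeholder value
request_params = sign_params({"page": 1, "limit": 50}, secret, 1700000000)
tampered = dict(request_params, page=2)

print(verify(request_params, secret))  # signature matches
print(verify(tampered, secret))        # any edited parameter breaks it
```

This is why a copied request that works verbatim can fail the moment you change a page number: the signature covers the parameters, so each new request needs a freshly computed value.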

Code example

python
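import re
import requests

s = requests.Session()
# Load the public page first so the session picks up its cookies.
home = s.get('https://example.com/', timeout=10)
home.raise_for_status()

# Pull the CSRF token out of the HTML; the attribute name varies by site.
match = re.search(r'name="csrf" content="([^"]+)"', home.text)
if match is None:
    raise RuntimeError('CSRF token not found in page HTML')
csrf = match.group(1)

# Call the JSON endpoint with the same session and the token header.
api = s.get('https://example.com/api/v1/products', params={'page': 1, 'limit': 50},
            headers={'X-CSRF-Token': csrf}, timeout=10)
api.raise_for_status()
data = api.json()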
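
The example above requests a single page by number. Many JSON endpoints instead return a pagination cursor with each response. The loop below is a minimal sketch of following one, assuming hypothetical response fields `items` and `next_cursor`; the fetch function is injected so the example runs offline against canned pages.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield items from every page, following the cursor until it runs out."""
    cursor: Optional[str] = None
    seen = set()
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        # Stop on a missing cursor, or on a repeated one to avoid looping.
        if not cursor or cursor in seen:
            break
        seen.add(cursor)


# Canned responses standing in for the real API (made-up data).
PAGES = {
    None: {"items": [{"id": 1}, {"id": 2}], "next_cursor": "a"},
    "a": {"items": [{"id": 3}], "next_cursor": None},
}

ids = [item["id"] for item in iter_items(lambda c: PAGES[c])]
print(ids)  # [1, 2, 3]
```

With a real endpoint, `fetch_page` would pass the cursor as whatever query parameter or body field the captured request uses and return the decoded JSON.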


Frequently asked questions

Is reverse-engineering APIs legal?

Technically, calling a public-facing internal API sends the same request a browser would. The legal questions mostly concern what you do with the data rather than the act of fetching it. Stay clear of authenticated endpoints you are not authorized to use.

How do I know if a site uses GraphQL?

Look for requests to a single endpoint (often /graphql) with POST bodies containing query and variables. The same endpoint serves every data type.
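
When sifting through exported or logged requests, that check can be applied in code. The helper below is a rough heuristic sketch, not a definitive detector: `looks_like_graphql` is a hypothetical name, and it only inspects the HTTP method and the JSON body.

```python
import json
from typing import Optional


def looks_like_graphql(method: str, body: Optional[str]) -> bool:
    """Heuristic: a POST whose JSON body carries a string 'query' field."""
    if method.upper() != "POST" or not body:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # body is not valid JSON
        return False
    if not isinstance(payload, dict):
        return False
    if not isinstance(payload.get("query"), str):
        return False
    # 'variables' may be omitted, but when present it should be an object.
    variables = payload.get("variables")
    return variables is None or isinstance(variables, dict)


sample = '{"query": "{ products { id name } }", "variables": {}}'
print(looks_like_graphql("POST", sample))
```

A match is only a hint; an ordinary search API can also accept a `query` field, so confirm by checking that different screens of the site post to the same URL.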

What if the API request body is encrypted?

Some sites encrypt the body with a key derived from page-side JavaScript. Either reverse-engineer the key derivation (hours to days of JS work) or fall back to browser rendering — usually the latter is cheaper.

Last updated: 2026-05-26