Why the order matters
The steps get harder as you go down, so the earlier you stop, the easier your life is. Step 1 (mobile API) hands you clean JSON from a lightly protected HTTP endpoint, and the only price is an afternoon learning HTTPToolkit (a tool that lets you watch an app's network traffic). Step 2 (XHR, the background requests a page makes to fetch data) also gives you JSON, but from an endpoint that might be guarded. Step 3 (JSON-in-HTML) gives you the same data as a plain string embedded in the page, which you parse with no browser at all. Steps 4–6 each pile on more infrastructure and budget.
The cost ladder is real. Step 4 needs residential proxies (~$3–10/GB). Step 5 needs a patched-browser binary, about 200MB of RAM per instance, and proxies. Step 6 is billed per request on managed APIs ($0.20–$3 per 1,000). Starting at step 5 when step 1 would have worked wastes engineering time, yet it happens often when teams don't consciously walk the flow.
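To compare the per-GB proxy price with the per-1,000-request price of a managed API, it helps to put both on the same unit. The sketch below is a rough conversion only. The 150 KB average response size is a hypothetical figure chosen for illustration, and the function name is not from any library.

```python
def proxy_cost_per_1k(avg_response_kb: float, price_per_gb: float) -> float:
    """Estimate residential-proxy cost in dollars for 1,000 requests.

    Assumes billing by transferred bytes and decimal units (1 GB = 1,000,000 KB).
    """
    gb_per_1k_requests = avg_response_kb * 1000 / 1_000_000
    return gb_per_1k_requests * price_per_gb


if __name__ == "__main__":
    # Hypothetical average response size; measure your own target instead.
    avg_kb = 150
    for price in (3, 10):
        cost = proxy_cost_per_1k(avg_kb, price)
        print(f"${price}/GB at {avg_kb} KB/response: ${cost:.2f} per 1,000 requests")
```

Under that assumption the proxy bill lands at roughly $0.45 to $1.50 per 1,000 requests, which overlaps the managed-API range quoted above. The difference between steps 4 and 6 is therefore mostly maintenance effort, not raw transfer cost.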
Step-by-step walkthrough
Step 0 — Recon. Before anything, identify the stack. Install Wappalyzer (a Chrome extension) and visit the target; it names the anti-bot vendor in one click. Or run wafw00f https://target.com from the command line. With Burp Suite MCP attached to Claude Code, one prompt traces the cookie lifecycle and recommends which step to use.
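If you want a quick programmatic second opinion alongside Wappalyzer or wafw00f, one rough heuristic is to look at the cookie names the site sets. The sketch below is not how either tool works. The cookie-to-vendor table lists names commonly associated with these vendors, and it is an assumption that may be incomplete or out of date.

```python
# Cookie names commonly associated with anti-bot vendors.
# Assumption: this table is illustrative, not exhaustive, and vendors can change names.
VENDOR_COOKIE_HINTS = {
    "datadome": "DataDome",
    "_abck": "Akamai Bot Manager",
    "bm_sz": "Akamai Bot Manager",
    "__cf_bm": "Cloudflare",
    "cf_clearance": "Cloudflare",
    "_px3": "PerimeterX / HUMAN",
    "_pxvid": "PerimeterX / HUMAN",
}


def guess_vendors(cookie_names):
    """Return a sorted list of vendors hinted at by the given cookie names."""
    found = {VENDOR_COOKIE_HINTS[name] for name in cookie_names if name in VENDOR_COOKIE_HINTS}
    return sorted(found)


if __name__ == "__main__":
    # Cookie names copied by hand from DevTools > Application > Cookies.
    print(guess_vendors(["session_id", "_abck", "bm_sz", "datadome"]))
```

An empty result does not mean the site is unprotected. It only means none of the listed cookie names appeared.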
Step 1 — Mobile API. Run the app inside a rooted Android Studio emulator (AVD) and capture its traffic with HTTPToolkit. The mobile app often talks to a separate backend with a different configuration. For example, a retailer's mobile app may use a direct GraphQL endpoint that is served by a different backend than the web frontend's Akamai + DataDome stack.
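Once HTTPToolkit shows the request the app makes, replaying it is usually a matter of copying the endpoint, headers, and body. The sketch below builds a standard GraphQL-over-HTTP POST with the Python standard library. The endpoint URL, query, and bearer-token header are hypothetical placeholders, so copy the real values from your capture.

```python
import json
import urllib.request


def build_graphql_request(endpoint, query, variables, token):
    """Build (but do not send) a GraphQL POST request.

    The Authorization scheme is an assumption; mirror whatever the app sends.
    """
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def send(request, timeout=15.0):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_graphql_request(
        "https://mobile-api.example.com/graphql",  # hypothetical endpoint
        "query Product($id: ID!) { product(id: $id) { name price } }",
        {"id": "12345"},
        token="PASTE_TOKEN_FROM_CAPTURE",
    )
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to compare the built request with the captured one before any traffic goes out.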
Step 2 — XHR. Open Chrome DevTools → Network → Fetch/XHR. Many single-page apps load everything from one undocumented JSON endpoint you can request directly.
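After spotting the endpoint in the Network tab, a direct request often needs little more than the URL and a couple of headers the page itself sends. This is a minimal standard-library sketch. The URL and Referer are hypothetical, and a guarded endpoint may need cookies or other headers copied from DevTools.

```python
import json
import urllib.request


def build_xhr_request(url, referer):
    """Build a GET request that resembles the page's own background request."""
    headers = {
        "Accept": "application/json",
        "Referer": referer,
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout=15.0):
    """Send the request and parse the body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_xhr_request(
        "https://www.example.com/api/catalog?page=1",  # hypothetical endpoint
        referer="https://www.example.com/catalog",
    )
    print(req.get_method(), req.full_url)
```

If the direct request is rejected while the browser's succeeds, that is a sign the endpoint is guarded and the target belongs further down the ladder.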
Step 3 — JSON in HTML. Many sites ship their data right inside the page source. Next.js sites embed full state in __NEXT_DATA__; React SPAs often expose window.__INITIAL_STATE__. For example, some product pages ship 100KB+ of product data in __NEXT_DATA__, which can be read directly from the HTML without executing any JavaScript.
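Pulling the embedded state out of the page source can be sketched with a regular expression and the standard JSON parser. The sample HTML and the product fields below are made up for illustration; real pages nest their data under different keys, so inspect the parsed object first.

```python
import json
import re

# Matches the <script id="__NEXT_DATA__" ...> tag and captures its JSON body.
_NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ object, or None if the tag is absent."""
    match = _NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical page source; the product fields are invented for this example.
SAMPLE_HTML = (
    "<html><body><div id='__next'></div>"
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"product": {"sku": "A1", "price": 19.99}}}}'
    "</script></body></html>"
)

if __name__ == "__main__":
    data = extract_next_data(SAMPLE_HTML)
    print(data["props"]["pageProps"]["product"])
```

A regular expression is enough here because the tag has a fixed id and a JSON-only body. For messier markup, an HTML parser is the safer choice.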
Step 4 — HTTP + curl_cffi. Send plain HTTP requests with a TLS handshake (the encryption setup behind https) that matches a real browser via impersonate="chrome131", plus a residential proxy. This works for many targets where server-side scoring is light.
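A minimal sketch of this step with curl_cffi's requests-style interface follows. The proxy URL and target URL are hypothetical placeholders, and the helper names are for illustration only. curl_cffi is imported inside the function so the option-building helper can be used without the dependency installed.

```python
def build_request_kwargs(proxy_url, impersonate="chrome131", timeout=30.0):
    """Collect the options passed to curl_cffi for a browser-like request."""
    return {
        "impersonate": impersonate,  # TLS/HTTP2 profile to mimic
        "proxies": {"http": proxy_url, "https": proxy_url},
        "timeout": timeout,
    }


def fetch(url, proxy_url):
    """Fetch a page through the proxy using a browser-matching TLS handshake."""
    from curl_cffi import requests  # third-party: pip install curl_cffi

    response = requests.get(url, **build_request_kwargs(proxy_url))
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # Hypothetical credentials and host; use your proxy provider's values.
    html = fetch(
        "https://www.example.com/products",
        proxy_url="http://user:pass@proxy.example.net:8000",
    )
    print(len(html))
```

Keeping the impersonation target as a parameter makes it easier to rotate TLS profiles when a given browser version starts getting blocked.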
Step 5 — Patched browser. Run a real browser configured for a consistent fingerprint, such as Camoufox, CloakBrowser, or PatchRight. Each addresses a specific layer (canvas/WebGL, extension probes, or function-source inspection) that JS-level runtime patching cannot reach.
Step 6 — Managed API. Hand the problem to a paid service. This is common for sites with a custom JS VM such as F5 Shape, where a DIY approach is impractical. Once you are spending more than ~2 engineer-days/month on maintenance, the managed API is cheaper than the engineer.
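The break-even rule above can be checked with simple arithmetic. In the sketch below, the request volume, the per-1,000 price, and the engineer day rate are all hypothetical inputs, and DIY infrastructure costs (proxies, servers) are deliberately left out to keep the comparison small.

```python
def managed_api_monthly_cost(requests_per_month, price_per_1k):
    """Monthly managed-API bill in dollars."""
    return requests_per_month / 1000 * price_per_1k


def diy_maintenance_monthly_cost(engineer_days, day_rate):
    """Monthly cost of engineer time spent keeping a DIY scraper alive."""
    return engineer_days * day_rate


def managed_is_cheaper(requests_per_month, price_per_1k, engineer_days, day_rate):
    """True when the managed API costs less than the maintenance time alone."""
    managed = managed_api_monthly_cost(requests_per_month, price_per_1k)
    diy = diy_maintenance_monthly_cost(engineer_days, day_rate)
    return managed < diy


if __name__ == "__main__":
    # Hypothetical numbers: 500k requests/month at $1 per 1,000,
    # versus 2 engineer-days/month at an assumed $600/day.
    print(managed_api_monthly_cost(500_000, 1.0))   # managed bill
    print(diy_maintenance_monthly_cost(2, 600))     # maintenance time
    print(managed_is_cheaper(500_000, 1.0, 2, 600))
```

With these inputs the managed bill is $500 against $1,200 of engineer time. At much higher volumes or the top of the price range the comparison can flip, so it is worth rerunning with your own numbers.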
Cost progression — when to escalate
| Step | Cost | Maintenance burden |
|---|---|---|
| 1 — Mobile API | Free | Low (token refresh) |
| 2 — XHR / GraphQL | Free | Low–medium |
| 3 — JSON-in-HTML | Free | Low |
| 4 — HTTP + curl_cffi | Proxy only (~$3–10/GB residential) | Medium (TLS profile rotation) |
| 5 — Patched browser | Proxy + 200MB RAM/instance | Medium–high (per-target tuning) |
| 6 — Managed API | $0.20–$3 per 1,000 requests | Zero |
