The toolchain (free, ~30 minutes to set up)
- Android Studio + AVD. Create a virtual device with API 30+ (Android 11+). Avoid API 28 — rooting scripts do not support it.
- rootAVD (github.com/newbit1/rootAVD). One-command root for the emulator. Confirm Magisk appears in the app drawer afterward.
- HTTP Toolkit (free, httptoolkit.com). Open it → Intercept → "Android device via ADB". It auto-detects the running AVD and grants superuser rights to install its trusted certificate.
- Install the target app via Google Play on the AVD, or sideload the APK from apk.support.
- Use the app while HTTP Toolkit captures. Filter by the target domain. The data endpoints are usually a dozen requests among hundreds — search, listing, detail.
- Replicate in Python. Right-click any captured request → copy as cURL → import into Postman → confirm it returns data → port to curl_cffi.
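The "copy as cURL" step can also be ported mechanically. The sketch below is a minimal, assumption-laden parser that turns a copied cURL command into a method, URL, headers, and body using only the standard library. The function name parse_curl and the set of flags it understands are illustrative choices, not part of HTTP Toolkit or curl_cffi.

```python
import shlex


def parse_curl(command: str) -> dict:
    """Turn a 'copy as cURL' string into method, URL, headers and body.

    Only -X, -H and -d/--data* are modelled. Other flags are skipped, and
    unknown flags that take a value are NOT handled, so treat this as a
    starting point rather than a full cURL parser.
    """
    # Join shell line continuations, then split using shell quoting rules.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    request = {"method": None, "url": None, "headers": {}, "data": None}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-X", "--request"):
            request["method"] = tokens[i + 1].upper()
            i += 2
        elif tok in ("-H", "--header"):
            name, _, value = tokens[i + 1].partition(":")
            request["headers"][name.strip()] = value.strip()
            i += 2
        elif tok in ("-d", "--data", "--data-raw", "--data-binary"):
            request["data"] = tokens[i + 1]
            i += 2
        elif tok.startswith("-"):
            i += 1  # flag we do not model, e.g. --compressed
        else:
            if request["url"] is None:
                request["url"] = tok
            i += 1

    # cURL defaults to POST when a body is present, otherwise GET.
    if request["method"] is None:
        request["method"] = "POST" if request["data"] is not None else "GET"
    return request
```

The resulting dict maps directly onto the arguments of an HTTP client call. With curl_cffi, that would plausibly be requests.request(method, url, headers=..., data=...) plus an impersonate setting, but confirm the exact call against the version you install.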
Step-by-step: intercepting an Android app
Android is the easier of the two platforms because Google ships the system in an open-source form, including emulator images that accept user-installed CA certificates. The full intercept workflow:
- Install Android Studio and create an emulator running an image without Google Play (Play images use Play Integrity attestation and refuse to launch with a custom CA). The plain AOSP system images (API 30+) work without modification.
- Start mitmproxy on the host: mitmproxy --mode regular --listen-port 8080. Note the host IP visible to the emulator (usually 10.0.2.2).
- Point the emulator at the proxy: Settings → Network → set proxy to 10.0.2.2:8080. Open mitm.it in the emulator browser, download the Android cert, install it via Settings → Security → User certificates.
- Install the target app. For apps that fail at this point, the cause is almost always certificate pinning: the app refuses to talk to a server it doesn't recognise. See the next section. The other cause to rule out is that apps targeting Android 7 (API 24) or later ignore user-installed CAs unless their network security configuration opts in.
- Use the app normally. The mitmproxy console shows every request and response. The endpoint, headers, request signing scheme, and pagination model become visible immediately. Common discoveries: GraphQL endpoints, signed JWT auth tokens with hour-long TTLs, unprotected list endpoints with mobile-only headers.
The result of this 30-minute exercise is often a fully-documented API that the company's web stack is paying Akamai $200k/year to protect.
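Picking the handful of data endpoints out of hundreds of captured requests is easier with a quick summary. The sketch below assumes the capture has been exported as a HAR file and loaded with json.load; the function name summarize_endpoints is hypothetical. It counts requests per method and path for one target domain.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarize_endpoints(har: dict, target_host: str) -> list[tuple[str, int]]:
    """Count requests per 'METHOD path' for one host in a HAR capture."""
    counts = Counter()
    for entry in har.get("log", {}).get("entries", []):
        request = entry["request"]
        parts = urlsplit(request["url"])
        host = parts.hostname or ""
        # Match the host itself and any subdomain of it.
        if host == target_host or host.endswith("." + target_host):
            # Query strings are dropped so /search?q=a and /search?q=b group together.
            counts[f'{request["method"]} {parts.path}'] += 1
    return counts.most_common()
```

Sorting by frequency tends to surface the search, listing, and detail calls first, since the app repeats them while you browse.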
When certificate pinning blocks you — Frida in 5 lines
Roughly half of mainstream apps pin their TLS certificate, meaning the app embeds the expected server certificate hash and refuses to talk to anything else. The proxy CA you just installed is ignored, the app shows a network error, and intercept fails.
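To see why the proxy CA is ignored, it helps to look at what a pin check does. The sketch below is a simplified illustration, not any particular app's code: it hashes the whole certificate with SHA-256 to stay within the standard library, whereas OkHttp's CertificatePinner hashes the certificate's SubjectPublicKeyInfo. The placeholder byte strings stand in for real DER-encoded certificates.

```python
import base64
import hashlib


def fingerprint(der_cert: bytes) -> str:
    """Base64-encoded SHA-256 digest of a DER-encoded certificate."""
    return base64.b64encode(hashlib.sha256(der_cert).digest()).decode("ascii")


def is_pinned(der_cert: bytes, pins: frozenset[str]) -> bool:
    """Accept the connection only when the presented certificate matches a pin."""
    return fingerprint(der_cert) in pins


# Placeholder bytes, not real certificates.
server_cert = b"placeholder: the real server certificate"
proxy_cert = b"placeholder: certificate minted by the proxy CA"

# Hypothetical pin set that the app would ship inside its binary.
EMBEDDED_PINS = frozenset({fingerprint(server_cert)})

print(is_pinned(server_cert, EMBEDDED_PINS))  # True
print(is_pinned(proxy_cert, EMBEDDED_PINS))   # False
```

Installing a CA on the device changes which certificates the operating system trusts, but it does not change the hash the app compares against, which is why the check has to be patched at runtime.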
Frida is the standard tool to defeat pinning. It hooks into the running app and patches the pinning check at runtime. The community maintains a universal script that works on most apps:
# 1. Root the emulator (or use an image with frida-server pre-installed)
# 2. Start frida-server on the emulator
# 3. On the host, spawn the app with the script loaded
#    (newer frida-tools releases resume the app by default and no longer accept --no-pause; drop the flag there)
frida -U -l fridantiroot.js -f com.target.app --no-pause
The script hooks okhttp3.CertificatePinner and javax.net.ssl.TrustManagerFactory to disable their checks. For Flutter apps the pinning is in the Dart layer rather than Java and requires a different script (disable-flutter-tls.js). iOS apps require a jailbroken device or simulator and SSL Kill Switch 2; the same Frida workflow does not transfer cleanly.
If pinning is implemented in native code (rare, but present in banking apps), Frida alone may not suffice. The escalation path is objection for runtime hooking, or static reverse engineering of the pinning routine. At that point you are spending more on the mobile API than you would on a managed scraping API, and the decision flow says climb back up the ladder.
What to record before disconnecting the proxy
Once you have an intercepted session, document these before the session expires:
- The endpoint path and HTTP method.
- Authentication scheme: Bearer token, signed request, OAuth refresh flow. Note the TTL.
- Request signing: many apps sign requests with an HMAC of the body plus a shared secret. The secret is in the app binary and often survives across versions.
- Required headers: X-App-Version, X-Device-ID, X-Build-Number. These look optional but the API often returns 403 without them.
- Pagination model: offset/limit vs cursor vs token. Cursor-based pagination from a mobile API is almost always more reliable than offset-based on the web.
- Rate limit: make 20 requests quickly and watch for 429 or a rate-limit header. Mobile APIs often have looser limits than the web equivalent.
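The request-signing item is the one most worth writing down precisely, because a scraper has to reproduce it byte for byte. The sketch below shows the general shape of an HMAC-SHA256 signature. The canonical message layout and the header names X-Timestamp and X-Signature are hypothetical; the real ones have to be read from the captured traffic and the app itself.

```python
import hashlib
import hmac
import time


def sign_request(secret: bytes, method: str, path: str, body: bytes, timestamp: int) -> dict:
    """Return signature headers for one request (hypothetical scheme)."""
    # Hypothetical canonical form: method, path, timestamp and body joined by newlines.
    message = b"\n".join(
        [method.upper().encode(), path.encode(), str(timestamp).encode(), body]
    )
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"X-Timestamp": str(timestamp), "X-Signature": digest}


# Example call; the secret and path are placeholders.
headers = sign_request(
    b"secret-recovered-from-the-binary", "POST", "/v1/search",
    b'{"q": "shoes"}', int(time.time()),
)
```

Comparing the output against a signature captured in the proxy, using the same body and timestamp, is a quick way to confirm the scheme has been understood correctly.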
Then write the scraper against this documentation, not against the live app. Rotating X-Device-ID per worker, refreshing the auth token before it expires, and respecting the request-signing scheme is enough for most production cases.
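A minimal sketch of two of those pieces, per-worker device IDs and cursor pagination, might look like this. It assumes the device ID is a random UUID and that responses carry items and next_cursor fields; both are hypothetical and need to be matched to what the capture shows. The fetch_page argument is a caller-supplied function that performs the actual HTTP call.

```python
import uuid
from typing import Callable, Iterator, Optional


def new_device_headers(app_version: str) -> dict:
    """Headers for one worker, with a fresh random device ID."""
    return {"X-App-Version": app_version, "X-Device-ID": str(uuid.uuid4())}


def iter_items(
    fetch_page: Callable[[Optional[str]], dict], max_pages: int = 1000
) -> Iterator[dict]:
    """Walk a cursor-paginated listing until the API stops returning a cursor."""
    cursor = None
    for _ in range(max_pages):  # hard cap guards against a cursor that never ends
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Keeping the HTTP call outside iter_items means the pagination logic can be exercised with canned responses before it ever touches the live API.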
Why mobile APIs are softer than the web
Three structural reasons:
- Mobile apps already authenticate. The app ships an API key or signs requests with a per-user token. The backend trusts authenticated requests more than anonymous browser hits, so bot defences are lighter.
- Anti-bot vendors target browsers. Cloudflare, Akamai, and DataDome built their products against headless Chrome and Selenium. Mobile traffic from a real device looks like a real device by default — there is no equivalent product addressing native HTTP clients at scale.
- JS rendering is irrelevant. Mobile APIs return JSON. No HTML, no DOM honeypots, no client-side challenge can fire. The whole browser-fingerprinting category does not apply.
Confirmed in production: a major US retailer's mobile app hits a direct GraphQL endpoint that bypasses the entire web-side Akamai + DataDome stack. Same data, no anti-bot.
When mobile API scraping does not work
SSL pinning. Some apps pin their own TLS certificate and reject the CA that HTTP Toolkit installs. Use Frida or objection to bypass at runtime, or use Burp Suite with the Xposed + TrustMeAlready module for a more permanent fix. Banking apps and high-value retailers commonly pin.
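Root detection. Some apps crash or refuse to start on rooted devices. SafetyNet Attestation was the standard mechanism, and Google has since replaced it with the Play Integrity API; MagiskHide (older Magisk releases) or the DenyList (newer ones) can often work around it.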
ARM-only apps. The default AVD on an Intel or AMD host is x86. Apps that ship only ARM native libraries may refuse to run there. Either use an arm64 emulator (slower on an x86 host) or a physical device with frida-server installed.
Tokens expire. Most apps obtain a token at login and refresh it as it nears expiry. Build a token-refresh step into your scraper instead of relying on a single captured token.
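One way to structure that refresh step is a small cache that renews the token shortly before its TTL runs out. The sketch below is an assumption about how a scraper might do it, not a description of any app's flow. The refresh callable is hypothetical and is expected to perform the captured login or refresh request and return the token together with its TTL in seconds.

```python
import time
from typing import Callable, Tuple


class TokenManager:
    """Caches a bearer token and refreshes it shortly before it expires."""

    def __init__(
        self,
        refresh: Callable[[], Tuple[str, float]],
        margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._refresh = refresh  # returns (token, ttl_seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._clock = clock      # injectable so the logic can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, ttl = self._refresh()
            self._expires_at = now + ttl
        return self._token

    def auth_header(self) -> dict:
        return {"Authorization": f"Bearer {self.get()}"}
```

Each worker calls auth_header() before every request, so a token with an hour-long TTL is renewed in the background of normal scraping rather than failing mid-run.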
