Web Scraping APIs

What Is mitmproxy?

mitmproxy is a Python-scriptable HTTPS intercepting proxy used for mobile API discovery, request replay, and reverse engineering authentication flows. It ships as three front ends: an interactive console TUI (mitmproxy), a browser-based UI (mitmweb), and a headless command-line tool (mitmdump). All three accept inline Python scripts that can rewrite, log, or replay any request in flight. In scraping it is the default reconnaissance tool: the first step of the scraping decision flow is "intercept the mobile app first", and mitmproxy is how that is done.

Quick facts

Vendor: mitmproxy project (open-source, MIT)
Language: Python (server); scripts in Python
Modes: console TUI (mitmproxy), web UI (mitmweb), headless capture and replay (mitmdump)
Use case in scraping: mobile API discovery, request inspection, replay and rewrite
Limitation: certificate pinning; many apps refuse the mitmproxy CA without a Frida bypass

What mitmproxy is for

Two primary uses in scraping:

  1. Mobile API discovery. Install the mitmproxy CA on an Android emulator or jailbroken iPhone, point the device proxy at mitmproxy, and use the target app. Every request becomes visible — endpoints, auth tokens, request signing schemes, pagination models. This is how scrapers discover unprotected mobile backends behind sites that pay Akamai for web protection.
  2. Web request inspection and replay. When debugging a flaky scraper, route it through mitmproxy and replay individual requests with header tweaks (in the console UI, e edits the selected flow and r replays it). Combined with the inline Python scripting API, you can rewrite requests on the fly without touching the scraper code.
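
The on-the-fly rewriting described in item 2 can be sketched as a small addon. This is a minimal illustration, not a canonical recipe: the host name and the header values are hypothetical placeholders, and the class deliberately avoids importing mitmproxy, since an addon is just a Python object whose methods are named after event hooks.

```python
# rewrite.py: sketch of an addon that rewrites requests in flight.
# Run with: mitmdump -s rewrite.py

TARGET_HOST = "api.example.com"  # hypothetical host, replace with the real target
OVERRIDES = {  # hypothetical header tweaks to test against the target
    "Accept-Language": "en-US",
    "X-Debug-Run": "1",
}


class HeaderRewriter:
    def __init__(self, target_host, overrides):
        self.target_host = target_host
        self.overrides = dict(overrides)

    def request(self, flow):
        # mitmproxy calls this hook after the client request has been read
        # and before it is forwarded upstream, so changes made here are
        # what the server receives.
        if flow.request.pretty_host != self.target_host:
            return  # leave traffic to other hosts untouched
        for name, value in self.overrides.items():
            flow.request.headers[name] = value


addons = [HeaderRewriter(TARGET_HOST, OVERRIDES)]
```

Because the rewrite lives in the proxy, the scraper under test stays unmodified; changing a header experiment means editing OVERRIDES and restarting the proxy, not redeploying the scraper.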

mitmweb (the browser UI) is friendlier for one-off use; mitmproxy (the keyboard-driven TUI) is faster once learned; mitmdump is headless and useful in CI or scripted captures.
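
For scripted captures with mitmdump, one plausible pattern is an addon that builds an inventory of the endpoints an app touches and prints it on shutdown. The sketch below assumes nothing about the target; the file name inventory.py and the output format are illustrative choices.

```python
# inventory.py: sketch of an endpoint inventory for headless captures.
# Run with: mitmdump -s inventory.py, use the app, then stop mitmdump.
from collections import Counter


class EndpointInventory:
    def __init__(self):
        # (method, host, path without query string) -> number of responses seen
        self.seen = Counter()

    def response(self, flow):
        req = flow.request
        # request.path includes the query string; drop it so that paginated
        # calls to the same endpoint are counted together.
        path = req.path.split("?", 1)[0]
        self.seen[(req.method, req.pretty_host, path)] += 1

    def summary(self):
        return [
            f"{count:4d}  {method:6s} {host}{path}"
            for (method, host, path), count in sorted(self.seen.items())
        ]

    def done(self):
        # The done hook runs once when mitmproxy shuts down.
        for line in self.summary():
            print(line)


addons = [EndpointInventory()]
```

Grouping by path without the query string is a design choice: it surfaces the distinct endpoints first, and the individual query parameters can then be inspected per flow in mitmweb or the console UI.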

mitmproxy vs HTTP Toolkit vs Charles Proxy vs Burp Suite

Four tools cover the intercepting-proxy category, with overlapping use cases:

Tool          | Best for                                            | Cost
mitmproxy     | CLI/scripting, automation, repeatable captures      | Free
HTTP Toolkit  | GUI-driven mobile intercept; one-click device setup | Free + Pro ($10/mo)
Charles Proxy | Veteran GUI, polished macOS experience              | $50 one-time
Burp Suite    | Security recon, intruder/repeater, MCP server       | Free / Pro $475/yr

For scraping reconnaissance specifically, mitmproxy is the default — free, scriptable, and unambiguously focused on the intercept-and-replay loop. Burp Suite's feature set overlaps but it's aimed at pen-testing and the price tag reflects that.

The certificate-pinning wall

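Roughly half of mainstream mobile apps pin their TLS certificates: they ship with a hash of the expected server certificate (or its public key) baked in and refuse to talk to a server presenting anything else. The mitmproxy CA you installed in the device trust store makes no difference, because a pinned app does not consult that store; the TLS handshake with mitmproxy fails and the app shows a network error.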
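
A failed client handshake is the visible symptom of pinning on the proxy side, so it can be logged. The sketch below assumes mitmproxy 9 or later, where the tls_failed_client and tls_established_client hooks receive an object exposing data.context.server.address; treat those names as assumptions to verify against your installed version. A failed handshake can also mean the CA is simply not installed, so the output is a list of suspects, not proof.

```python
# pinning_check.py: sketch that lists hosts whose client handshakes never succeed.
# Run with: mitmdump -s pinning_check.py
from collections import Counter


class PinningSuspects:
    def __init__(self):
        self.failed = Counter()  # host -> number of failed client handshakes
        self.ok = set()          # hosts with at least one successful handshake

    def tls_failed_client(self, data):
        # address is a (host, port) tuple, or None if the target is unknown
        address = data.context.server.address
        if address is not None:
            self.failed[address[0]] += 1

    def tls_established_client(self, data):
        address = data.context.server.address
        if address is not None:
            self.ok.add(address[0])

    def suspects(self):
        # Hosts that failed at least once and never completed a handshake
        return sorted(host for host in self.failed if host not in self.ok)

    def done(self):
        for host in self.suspects():
            print(f"possible pinning: {host} ({self.failed[host]} failed handshakes)")


addons = [PinningSuspects()]
```

If a host appears here while other hosts from the same device intercept cleanly, the CA install is probably fine and the app is likely pinning that host.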

Three escalation steps when pinning blocks you:

  1. Try a different app version. Older versions of the same app frequently omit pinning. Sideload an APK from a few releases back via apkpure or similar.
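  2. Frida + universal pinning bypass. Run frida-server on the device and inject a universal unpinning script from the host (community scripts of this kind are published on Frida CodeShare; note that fridantiroot.js, sometimes cited for this step, targets root detection rather than pinning). These scripts hook okhttp3.CertificatePinner and the Java trust-manager path so that any certificate is accepted. Works against most apps. See the mobile API scraping playbook for the full workflow.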
  3. objection / static reverse engineering. For native-code pinning (banking apps, some games), Frida's default scripts aren't enough. objection covers more cases; novel pinning requires disassembly. At this point you're spending more on intercept than the scraping is worth.

Code example

python
# inline mitmproxy script: extract auth tokens and pagination cursors
# save as tokens.py, run with: mitmproxy -s tokens.py
from mitmproxy import http
import json

class TokenExtractor:
    def __init__(self):
        # host -> most recently captured access token
        self.tokens = {}

    def response(self, flow: http.HTTPFlow) -> None:
        # Capture bearer tokens from any login endpoint
        if "/login" in flow.request.path and flow.response.status_code == 200:
            try:
                # response.text can be None (no body); "" fails to parse and is skipped
                body = json.loads(flow.response.text or "")
            except json.JSONDecodeError:
                body = None
            # Only a JSON object can carry an access_token key
            if isinstance(body, dict) and "access_token" in body:
                self.tokens[flow.request.host] = body["access_token"]
                print(f"captured token for {flow.request.host}")

        # Log cursor-based pagination for later reuse
        cursor = flow.response.headers.get("X-Next-Cursor")
        if cursor is not None:
            print(f"{flow.request.path} cursor: {cursor}")

addons = [TokenExtractor()]

Frequently asked questions

Why use mitmproxy instead of Wireshark?

Wireshark sniffs raw network traffic — for HTTPS, you see encrypted bytes. mitmproxy terminates TLS using its own CA, so you see plaintext request and response bodies. Wireshark is for low-level network debugging; mitmproxy is for HTTPS application traffic, which is what scrapers care about.

Can mitmproxy intercept HTTP/3 / QUIC?

Not yet at production quality. There's an experimental HTTP/3 mode but it lags upstream. For services that prefer QUIC (some Google properties), the usual workaround is to make the client fall back to HTTP/2 over TCP, for example by blocking outbound UDP on port 443, and proxy that.

Is mitmproxy detectable by the server?

At the HTTP layer, mostly no: mitmproxy adds no tell-tale headers, and the traffic still originates from your machine. At the TLS layer it can be. mitmproxy terminates the client's TLS session and opens its own connection upstream, so the server sees mitmproxy's handshake rather than Chrome's or the app's, and a server that fingerprints TLS handshakes (JA3-style) can notice the mismatch. The app itself can also report interception (some apps phone home with proxy-status flags; disable telemetry in such cases).

Last updated: 2026-05-27