How the leak happens
When a scraper requests https://example.com through a proxy, two things must travel through the tunnel: the TCP/TLS connection and the DNS lookup that turns example.com into an IP. With an HTTP proxy or a socks5h:// proxy, the client hands the hostname to the proxy and the proxy resolves it - nothing leaks. With a plain socks5:// proxy, many clients resolve the hostname locally first and then ask the proxy to connect to the resulting IP. That local lookup goes to your machine's configured DNS resolver, over your real connection.
The same problem appears with system-level proxying that does not cover UDP, with split-tunnel VPN configs, and with libraries that have a separate DNS path from their HTTP path. The HTTP request looks perfectly proxied while the DNS query quietly exits the real interface.
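As a rough illustration of the scheme distinction described above, the sketch below maps a proxy URL to the side that is expected to resolve hostnames. It assumes the scheme conventions used by curl and by requests with SOCKS support; the function name `dns_resolution_side` is hypothetical, and a given client may behave differently, so treat the result as a first guess rather than proof.

```python
from urllib.parse import urlsplit

# Assumption: curl-style scheme conventions. Other clients may differ.
REMOTE_DNS_SCHEMES = {"http", "https", "socks5h", "socks4a"}
LOCAL_DNS_SCHEMES = {"socks5", "socks4"}


def dns_resolution_side(proxy_url: str) -> str:
    """Return 'proxy', 'client', or 'unknown' for the expected DNS resolver side."""
    scheme = urlsplit(proxy_url).scheme.lower()
    if scheme in REMOTE_DNS_SCHEMES:
        return "proxy"   # hostname is handed to the proxy
    if scheme in LOCAL_DNS_SCHEMES:
        return "client"  # hostname is typically resolved locally first
    return "unknown"


if __name__ == "__main__":
    for url in ("socks5://127.0.0.1:1080", "socks5h://127.0.0.1:1080"):
        print(url, "->", dns_resolution_side(url))
```

A check like this only covers the proxy URL itself; it says nothing about UDP paths or libraries with a separate DNS path.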
Why a DNS leak deanonymizes a scraper
A DNS leak causes two distinct harms. First, disclosure: whoever runs your DNS resolver (your ISP, a public resolver, or a corporate network) now has a log of every hostname you scraped, even though the page content went through the proxy. For anyone relying on the proxy to keep their scraping activity separate from their real network identity, that defeats the purpose.
Second, and more relevant to anti-bot detection, geo incoherence: the resolver that performed the lookup has its own geolocation. If your proxy exit is in Brazil but your DNS resolver belongs to a German ISP, the target's authoritative nameserver sees the query arrive from Germany while the connection arrives from Brazil, and an observer who correlates the two can see the mismatch. This compounds the timezone/IP mismatch family of signals: the story your network tells stops being internally consistent.
Closing the leak
The fixes, in order of preference:
- Use socks5h:// not socks5:// - the h forces hostname resolution at the proxy. This one-character change fixes the most common leak in curl, Python requests/httpx, and most scraping stacks.
- Use an HTTP/HTTPS proxy - HTTP proxies receive the hostname from the client (in the CONNECT line for HTTPS targets, or in the absolute request URL for plain HTTP), so resolution happens proxy-side by design.
- Run a controlled local resolver bound to the tunnel, so even client-side lookups go through the proxy. Some anti-detect browsers ship a built-in resolver for exactly this.
- Contain UDP - QUIC/HTTP3 and WebRTC can perform their own out-of-band lookups; disable them or tunnel UDP (SOCKS5 UDP ASSOCIATE) so nothing escapes (see WebRTC leaks).
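The first fix in the list can be applied mechanically. The following is a minimal sketch, assuming proxy URLs are stored as strings: it rewrites socks5:// to socks5h:// (and socks4:// to socks4a://, the SOCKS4 variant that carries hostnames) and builds the `proxies` mapping that Python requests accepts. The helper names `force_remote_dns` and `build_proxies` and the host `proxy.example` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Schemes that resolve locally, mapped to the variant that resolves at the proxy.
_REMOTE_DNS_UPGRADES = {"socks5": "socks5h", "socks4": "socks4a"}


def force_remote_dns(proxy_url: str) -> str:
    """Rewrite a SOCKS proxy URL so hostname resolution happens at the proxy."""
    parts = urlsplit(proxy_url)
    scheme = _REMOTE_DNS_UPGRADES.get(parts.scheme.lower(), parts.scheme)
    return urlunsplit(parts._replace(scheme=scheme))


def build_proxies(proxy_url: str) -> dict:
    """Build a requests-style proxies mapping that routes both schemes via the proxy."""
    safe_url = force_remote_dns(proxy_url)
    return {"http": safe_url, "https": safe_url}


if __name__ == "__main__":
    proxies = build_proxies("socks5://user:pass@proxy.example:1080")
    print(proxies)
    # With requests installed with SOCKS support (pip install "requests[socks]"):
    #   import requests
    #   requests.get("https://example.com", proxies=proxies, timeout=10)
```

Normalizing the URL in one place means a stray socks5:// in a config file cannot silently reintroduce local lookups.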
Verify with a DNS-leak test that reports which resolver answered, run through the same proxy configuration the scraper uses. If the reported resolver is in the same country as your proxy exit, the tunnel is likely clean; if it is your real ISP's resolver, you are leaking.
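Interpreting the test result can be automated. The sketch below assumes you have already collected the resolver IPs reported by a leak test and that you know the address ranges of your real ISP or local network; both inputs, and the function name `leaked_resolvers`, are hypothetical. It complements the country comparison by flagging any resolver that falls inside your own networks. The addresses shown come from the ranges reserved for documentation.

```python
import ipaddress


def leaked_resolvers(observed_resolvers, own_networks):
    """Return the observed resolver IPs that fall inside your own (real) networks."""
    nets = [ipaddress.ip_network(n) for n in own_networks]
    leaked = []
    for resolver in observed_resolvers:
        addr = ipaddress.ip_address(resolver)
        # Membership is False when the address and network IP versions differ.
        if any(addr in net for net in nets):
            leaked.append(resolver)
    return leaked


if __name__ == "__main__":
    observed = ["203.0.113.7", "198.51.100.20"]  # as reported by a leak test
    mine = ["198.51.100.0/24"]                   # assumed real ISP range
    hits = leaked_resolvers(observed, mine)
    print("leaking via:" if hits else "no leak detected", hits)
```

An empty result only means none of the observed resolvers match the ranges you supplied, so the check is as good as that list.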
