How the leak happens
When a scraper requests https://example.com through a proxy, two things need to travel through the tunnel: the TCP/TLS connection (the encrypted link behind https) and the DNS lookup that turns example.com into an IP. With an HTTP proxy or a socks5h:// proxy, the client hands the hostname to the proxy and lets the proxy resolve it - so nothing leaks. With a plain socks5:// proxy, many clients resolve the hostname locally first, then ask the proxy to connect to the resulting IP. That local lookup goes to your machine's own DNS resolver, over your real connection.
The same problem shows up with system-level proxying that doesn't cover UDP, with split-tunnel VPN configs (where some traffic skips the tunnel), and with libraries whose DNS path is separate from their HTTP path. The HTTP request looks perfectly proxied while the DNS query quietly slips out the real network interface.
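The socks5:// versus socks5h:// difference described above is visible in the bytes of the SOCKS5 connect request. The sketch below builds both variants following the RFC 1928 request layout. The function names are hypothetical, and 203.0.113.10 is a documentation-range address standing in for whatever a local lookup would have returned.

```python
import ipaddress
import struct

SOCKS_VERSION = 0x05
CMD_CONNECT = 0x01
RESERVED = 0x00
ATYP_IPV4 = 0x01    # address field is a raw 4-byte IPv4 address
ATYP_DOMAIN = 0x03  # address field is a length-prefixed hostname


def connect_request_remote_dns(host: str, port: int) -> bytes:
    """socks5h-style request: the hostname itself is sent to the proxy."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("hostname too long for a SOCKS5 request")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, RESERVED, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)


def connect_request_local_dns(ipv4: str, port: int) -> bytes:
    """socks5-style request: the client resolved the name already, so only
    an IP is sent. The DNS query that produced it never touched the proxy."""
    packed = ipaddress.IPv4Address(ipv4).packed
    header = bytes([SOCKS_VERSION, CMD_CONNECT, RESERVED, ATYP_IPV4])
    return header + packed + struct.pack("!H", port)


if __name__ == "__main__":
    print(connect_request_remote_dns("example.com", 443).hex(" "))
    print(connect_request_local_dns("203.0.113.10", 443).hex(" "))
```

In the second request the hostname is absent, which is the tell: the name was resolved somewhere before the proxy was involved.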
Why a DNS leak deanonymizes a scraper
There are two distinct harms. First, disclosure: whoever runs your DNS resolver (your ISP, a public resolver, or a corporate network) now has a log of every hostname you scraped, even though the page content itself went through the proxy. If you were relying on the proxy to hide your targets from your own network, the leak defeats that purpose.
Second, and more relevant to anti-bot detection, is geo incoherence: the resolver that did the lookup has its own geographic location. If your proxy exit is in Brazil but your DNS resolver is a German ISP, anyone who can see both the lookup and the connection (for example, a site that operates its own authoritative DNS) can spot the mismatch. This stacks on top of the timezone/IP mismatch family of signals: the story your network tells stops being internally consistent.
Closing the leak
The fixes, in order of preference:
- Use socks5h://, not socks5:// - the h forces the hostname to be resolved at the proxy. This one-character change fixes the most common leak in curl, Python requests/httpx, and most scraping stacks.
- Use an HTTP/HTTPS proxy - HTTP proxies receive the hostname (in the CONNECT line for https targets, in the request URL for plain http), so resolution happens proxy-side by design.
- Run a controlled local resolver bound to the tunnel, so even client-side lookups go through the proxy. Some anti-detect browsers ship a built-in resolver for exactly this.
- Contain UDP - QUIC/HTTP3 and WebRTC can do their own out-of-band lookups; disable them or tunnel UDP (SOCKS5 UDP ASSOCIATE) so nothing escapes (see WebRTC leaks).
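The first fix in the list can be enforced in code rather than left to whoever writes the config. This is a minimal sketch, assuming proxy URLs arrive as strings. The helper names force_remote_dns and build_proxies are hypothetical, and the returned dict follows the shape of the proxies argument in Python requests (which needs its SOCKS extra installed to use it).

```python
from urllib.parse import urlsplit, urlunsplit


def force_remote_dns(proxy_url: str) -> str:
    """Upgrade socks5:// to socks5h:// so the proxy resolves hostnames.

    Other schemes (http, https, socks5h, ...) are returned unchanged.
    """
    parts = urlsplit(proxy_url)
    if parts.scheme == "socks5":
        parts = parts._replace(scheme="socks5h")
    return urlunsplit(parts)


def build_proxies(proxy_url: str) -> dict:
    """Build a proxies mapping that sends both http and https via the proxy."""
    safe_url = force_remote_dns(proxy_url)
    return {"http": safe_url, "https": safe_url}


if __name__ == "__main__":
    # 127.0.0.1:1080 is a placeholder address for a local SOCKS endpoint.
    print(build_proxies("socks5://user:secret@127.0.0.1:1080"))
    # Hypothetical usage with requests:
    #   requests.get("https://example.com", proxies=build_proxies(url))
```

Normalizing the scheme at one choke point means a stray socks5:// in an environment variable or config file cannot silently reintroduce local resolution.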
Verify with a DNS-leak test that reports which resolver answered, run through the same proxy configuration the scraper uses. If the resolver's country matches your proxy exit, the DNS path is consistent with the tunnel; if the resolver belongs to your real ISP, you are leaking.
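The comparison can be automated once a leak test has told you the resolver's country. The sketch below only covers the decision step. It assumes you already have three ISO country codes from elsewhere (resolver, proxy exit, and your real connection), and the function name and return labels are hypothetical.

```python
def classify_dns_path(resolver_country: str, exit_country: str, home_country: str) -> str:
    """Classify a DNS-leak test result from three ISO country codes.

    Returns one of:
      "inconclusive" - exit and home are the same country, so country
                       alone cannot separate tunnel from leak
      "leaking"      - resolver sits where your real connection is
      "consistent"   - resolver sits where the proxy exit is
      "mismatch"     - resolver is in a third country; not your real
                       network, but still incoherent with the exit
    """
    resolver = resolver_country.strip().upper()
    exit_ = exit_country.strip().upper()
    home = home_country.strip().upper()

    if exit_ == home:
        return "inconclusive"
    if resolver == home:
        return "leaking"
    if resolver == exit_:
        return "consistent"
    return "mismatch"


if __name__ == "__main__":
    # Brazilian exit, German home network, resolver reported as German.
    print(classify_dns_path("DE", "BR", "DE"))
```

The "mismatch" case is kept separate on purpose: a third-country public resolver does not expose your ISP, but it still produces the geo incoherence described earlier.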
