What Is a DNS Leak in Web Scraping?

A DNS leak is when your computer looks up website names through its own DNS resolver instead of through the proxy, which exposes the real network hiding behind that proxy. DNS (Domain Name System) is the phonebook that turns a name like example.com into an IP address. Even if your proxy is correctly carrying your HTTP traffic, if that name lookup goes out over your real connection instead, your ISP and the DNS server can see exactly which sites you visit - and the resolver's location can give away your true region. The classic cause is using socks5:// (the client does the lookup itself) when you meant socks5h:// (the lookup happens inside the tunnel).

Quick facts

What leaks: The hostname lookup, sent to your real DNS resolver outside the proxy
socks5 vs socks5h: socks5 resolves DNS client-side (leaks); socks5h resolves at the proxy (safe)
Why it matters: Reveals target hosts and a resolver geolocation that can mismatch the exit IP
Also affects: WebRTC and QUIC paths can do their own lookups outside the proxy
Fix: Force tunnel-side resolution, or run a controlled resolver bound to the proxy

How the leak happens

When a scraper requests https://example.com through a proxy, two things need to travel through the tunnel: the TCP/TLS connection (the encrypted link behind https) and the DNS lookup that turns example.com into an IP. With an HTTP proxy or a socks5h:// proxy, the client hands the hostname to the proxy and lets the proxy resolve it - so nothing leaks. With a plain socks5:// proxy, many clients resolve the hostname locally first, then ask the proxy to connect to the resulting IP. That local lookup goes to your machine's own DNS resolver, over your real connection.
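
To make the difference concrete, here is a minimal sketch of the SOCKS5 CONNECT request a client sends after the handshake, following the message layout in RFC 1928. The function name build_connect_request and the address 192.0.2.10 (a documentation-only IP standing in for a locally resolved result) are illustrative assumptions, not part of any particular client.

```python
import ipaddress
import struct

SOCKS_VERSION = 0x05
CMD_CONNECT = 0x01
ATYP_IPV4 = 0x01    # address field carries a raw IPv4 address
ATYP_DOMAIN = 0x03  # address field carries a length-prefixed hostname


def build_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request (hypothetical helper, RFC 1928 layout)."""
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00])  # VER, CMD, RSV
    try:
        # socks5:// style: the client already resolved the name locally,
        # so only an IP address reaches the proxy.
        addr = bytes([ATYP_IPV4]) + ipaddress.IPv4Address(host).packed
    except ipaddress.AddressValueError:
        # socks5h:// style: the hostname itself is sent and the proxy
        # performs the lookup.
        name = host.encode("idna")
        addr = bytes([ATYP_DOMAIN, len(name)]) + name
    return header + addr + struct.pack("!H", port)


# Illustrative only: 192.0.2.10 stands in for whatever the local resolver returned.
local_dns = build_connect_request("192.0.2.10", 443)
remote_dns = build_connect_request("example.com", 443)
print(local_dns[3], remote_dns[3])  # address-type byte: 1 (IPv4) vs 3 (domain)
```

In the first request the proxy never sees the name example.com, which means something else on the client's side had to resolve it. In the second, the name travels inside the tunnel.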

The same problem shows up with system-level proxying that doesn't cover UDP, with split-tunnel VPN configs (where some traffic skips the tunnel), and with libraries whose DNS path is separate from their HTTP path. The HTTP request looks perfectly proxied while the DNS query quietly slips out the real network interface.

Why a DNS leak deanonymizes a scraper

There are two distinct harms. First, disclosure: whoever runs your DNS resolver (your ISP, a public resolver, or a corporate network) now has a log of every hostname you scraped, even though the page content itself went through the proxy. If you were relying on the proxy to keep those apart, that defeats the purpose.

Second, and more relevant to anti-bot detection, is geo incoherence: the resolver that did the lookup has its own geographic location. If your proxy exit is in Brazil but your DNS resolver is a German ISP, anyone correlating where the lookup came from with the connection can spot the mismatch. This stacks on top of the timezone/IP mismatch family of signals: the story your network tells stops being internally consistent.

Closing the leak

The fixes, in order of preference:

  1. Use socks5h:// not socks5:// - the h forces the hostname to be resolved at the proxy. This one-character change fixes the most common leak in curl and Python requests. Other clients can treat the two schemes differently, so confirm the behaviour in your own stack.
  2. Use an HTTP/HTTPS proxy - the client normally sends the hostname to the proxy (in the request line and Host header for plain HTTP, in the CONNECT line for HTTPS), so resolution happens proxy-side by design.
  3. Run a controlled local resolver bound to the tunnel, so even client-side lookups go through the proxy. Some anti-detect browsers ship a built-in resolver for exactly this.
  4. Contain UDP - QUIC/HTTP3 and WebRTC run over UDP, which a TCP-only proxy does not carry, so their traffic and any lookups tied to it can bypass the tunnel; disable them or tunnel UDP (SOCKS5 UDP ASSOCIATE) so nothing escapes (see WebRTC leaks).
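
For the first fix, one way to avoid shipping a socks5:// URL by accident is to normalise proxy URLs before they reach the HTTP client. The sketch below is an assumption about how that could be done, not a feature of any library; force_remote_dns is a hypothetical helper name, and the proxy host and credentials are the same placeholders used elsewhere on this page.

```python
from urllib.parse import urlsplit, urlunsplit

# Scheme that resolves locally in curl and requests, mapped to its
# proxy-side-resolution equivalent.
REMOTE_DNS_SCHEMES = {"socks5": "socks5h"}


def force_remote_dns(proxy_url: str) -> str:
    """Rewrite socks5:// to socks5h://; leave every other scheme untouched."""
    parts = urlsplit(proxy_url)
    safe_scheme = REMOTE_DNS_SCHEMES.get(parts.scheme.lower(), parts.scheme)
    return urlunsplit(parts._replace(scheme=safe_scheme))


# The result can be passed as a proxy URL, for example in the proxies
# mapping that Python requests accepts.
proxy = force_remote_dns("socks5://user:pass@proxy:1080")
proxies = {"http": proxy, "https": proxy}
print(proxies["https"])
```

HTTP proxy URLs pass through unchanged, since they already resolve proxy-side.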

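Verify with a DNS-leak test that reports which resolver made the lookup. If the resolver's country matches your proxy exit, the result is consistent with a clean tunnel; if the resolver belongs to your real ISP, you are leaking. A resolver in a third country is not a leak to your ISP, but it still creates the geo mismatch described above.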
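
The comparison a leak test asks you to make can be written down as a small decision function. This is a sketch under stated assumptions: the three country codes are supplied by you (from a leak-test result and an IP geolocation lookup of your choice), and LeakCheck and verdict are hypothetical names. Country alone is a coarse signal, which is why one branch returns "inconclusive".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeakCheck:
    resolver_country: str  # country of the resolver the leak test observed
    exit_country: str      # country of the proxy exit IP
    home_country: str      # country of your real ISP connection


def verdict(check: LeakCheck) -> str:
    resolver = check.resolver_country.upper()
    exit_ = check.exit_country.upper()
    home = check.home_country.upper()
    if resolver == exit_ and resolver == home:
        # Proxy exit and real connection share a country, so a country
        # comparison cannot tell the two resolvers apart.
        return "inconclusive"
    if resolver == exit_:
        return "coherent"
    if resolver == home:
        return "leaking"
    # Neither home nor exit: no disclosure to your ISP, but the lookup
    # still comes from a different region than the connection.
    return "mismatch"


print(verdict(LeakCheck(resolver_country="DE", exit_country="BR", home_country="DE")))
```

With a Brazilian exit and a German resolver on a German home connection, as in the earlier example, the function reports a leak.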

Code example

```bash
# The one-character fix: socks5h instead of socks5

# LEAKS - curl resolves example.com locally, then proxies the IP
curl --proxy socks5://user:pass@proxy:1080 https://example.com

# SAFE - the 'h' forces resolution inside the proxy tunnel
curl --proxy socks5h://user:pass@proxy:1080 https://example.com

# Python requests - same rule (needs the SOCKS extra: pip install "requests[socks]")
#   proxies={'https': 'socks5h://user:pass@proxy:1080'}   # safe
#   proxies={'https': 'socks5://user:pass@proxy:1080'}    # leaks DNS
# Other clients (httpx, for example) may treat the two schemes differently; check their docs.

# Verify which resolver did the lookup (country should match the proxy exit).
# dnsleaktest.example is a placeholder; substitute a real leak-test endpoint.
curl --proxy socks5h://user:pass@proxy:1080 https://dnsleaktest.example/api
```

Frequently asked questions

What is the difference between socks5 and socks5h?

With socks5, clients such as curl and Python requests resolve the hostname to an IP locally and ask the proxy to connect to that IP - so the DNS lookup leaks to your real resolver. With socks5h the client sends the hostname to the proxy and the proxy resolves it, so nothing leaks. For scraping behind a proxy, use socks5h unless you have a specific reason not to.

Does an HTTP proxy leak DNS like socks5 does?

No. HTTP and HTTPS proxies receive the target hostname directly (in the request line or the CONNECT request), so the proxy does the resolution by design. The leak is specific to SOCKS5 clients that resolve locally, plus side channels like WebRTC and QUIC.

Can a DNS leak get my scraper blocked, or just expose me?

Both. The immediate harm is disclosure - your resolver sees the hostnames. For anti-bot detection, the leak can create a geolocation mismatch between the DNS resolver and the proxy exit IP, which adds to the timezone/IP coherence signals that anti-bot systems already score.

Last updated: 2026-05-31