Entropy as bits of identifying information
The idea comes from information theory and was popularised for the browser by the EFF’s Panopticlick study. The self-information of an observed attribute value is -log₂(p), where p is the fraction of the population that shares that value. A value shared by half the population carries 1 bit; a value shared by 1 in 1,000 carries about 10 bits.
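As a minimal sketch of the formula above, the helper below computes -log₂(p) for a given population share. The function name `self_information_bits` is made up for this illustration and is not part of any fingerprinting library.

```python
import math


def self_information_bits(share: float) -> float:
    """Bits revealed by a value that a fraction `share` of the population has."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)


half = self_information_bits(0.5)       # value shared by half the population
rare = self_information_bits(1 / 1000)  # value shared by 1 in 1,000

print(f"half the population: {half:.2f} bits")
print(f"1 in 1,000:          {rare:.2f} bits")
```

The rarer the value, the more bits it contributes, which is why a single unusual attribute can do most of the identifying work.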
To single out one browser among a population of size N, you need log₂(N) bits. For ~16 million visitors that is about 24 bits; for the whole web, around 33. Panopticlick found the average browser already leaked roughly 18 bits, which narrows it down to about 1 in 260,000 and is enough to make most browsers unique within a large site’s traffic. Crucially, the bits from statistically independent attributes add up, so combining even a few mid-entropy signals crosses the uniqueness threshold fast; correlated attributes contribute less than the simple sum suggests.
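The arithmetic can be sketched as follows. The four attribute shares are invented numbers chosen only to illustrate the addition, and the sum is valid only under the assumption that the attributes are statistically independent.

```python
import math


def bits_to_single_out(population: int) -> float:
    """Bits needed to pick one member out of `population`."""
    return math.log2(population)


def combined_bits(shares: list[float]) -> float:
    """Total bits from several attribute values, ASSUMING independence."""
    return sum(-math.log2(p) for p in shares)


needed = bits_to_single_out(2**24)  # about 16.8 million visitors

# Hypothetical population shares for four mid-entropy attribute values.
shares = [1 / 500, 1 / 200, 1 / 50, 1 / 8]
total = combined_bits(shares)
is_unique = total >= needed

print(f"needed: {needed:.1f} bits, observed: {total:.1f} bits, unique: {is_unique}")
```

None of the four values is rare on its own, yet together they exceed the 24-bit threshold.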
Which signals carry the most bits
Not all attributes are equal. High-entropy signals carry many bits because they vary widely across devices:
- Canvas and WebGL render hashes — tied to GPU, driver, and OS.
- Installed font list — highly variable per machine.
- The exact combination of User-Agent string, screen resolution, and plugin list.
Low-entropy signals carry few bits because most people share them: timezone, primary language, OS family, colour depth. Individually weak, they still add to the total and, more importantly, are used in coherence checks — a low-entropy value that contradicts a high-entropy one is a tell.
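A coherence check of this kind might look like the sketch below. The field names (`os_family`, `webgl_renderer`, `fonts`) and both rules are assumptions made up for illustration; real detection systems use their own, much larger rule sets.

```python
def coherence_violations(fp: dict) -> list[str]:
    """Return human-readable contradictions found in one fingerprint."""
    violations = []
    os_family = fp.get("os_family", "")
    renderer = fp.get("webgl_renderer", "")
    fonts = set(fp.get("fonts", []))

    # Illustrative rule: an Apple GPU string is not expected under Windows.
    if os_family == "Windows" and "Apple" in renderer:
        violations.append("Windows OS family with an Apple GPU renderer")

    # Illustrative rule: Segoe UI ships with Windows, so its absence is odd.
    if os_family == "Windows" and "Segoe UI" not in fonts:
        violations.append("Windows OS family without Segoe UI")

    return violations


# Hypothetical fingerprints: one consistent, one self-contradictory.
consistent = {
    "os_family": "Windows",
    "webgl_renderer": "ANGLE (Intel, Intel(R) UHD Graphics)",
    "fonts": ["Arial", "Segoe UI"],
}
contradictory = {
    "os_family": "Windows",
    "webgl_renderer": "Apple M1",
    "fonts": ["Helvetica Neue"],
}

print(coherence_violations(consistent))
print(coherence_violations(contradictory))
```

The low-entropy field (OS family) identifies almost nobody by itself, but it gives the high-entropy fields something to contradict.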
This is why clustering works: real devices occupy a finite set of high-entropy combinations, and an entropy budget tells the vendor how confidently a given fingerprint pins to one device.
Why being too unique is a problem for bots
Naive evasion tries to randomise high-entropy signals: a fresh canvas hash, a shuffled font list, new WebGL noise on every request. This maximises entropy, and that is exactly the mistake. Two anomalies follow:
1. Impossible uniqueness. A canvas hash never seen on any real device sits outside every cluster: it has high self-information, but it lies in a region of the space that no real hardware produces. That is exactly the kind of combination clustering rejects.
2. Zero spread across IPs, or too much. A real fingerprint typically appears on one device behind one IP, or a handful as the device moves between networks. A scraper that reuses one fingerprint across thousands of IPs shows no fingerprint variation across its whole IP pool: the entropy of the fingerprints seen over those IPs is abnormally low, which marks an obvious farm. A scraper that randomises per request does the opposite and makes the population-level entropy spike unnaturally. Either way the distribution does not match real traffic.
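One way to make the second anomaly concrete is to measure the Shannon entropy of the fingerprints seen across a slice of traffic, for example all requests from one IP pool. This is a sketch under that assumption; the function name and the three sample slices are invented for illustration.

```python
import math
from collections import Counter


def fingerprint_entropy_bits(fingerprints: list[str]) -> float:
    """Shannon entropy (in bits) of the fingerprints seen in a traffic slice."""
    counts = Counter(fingerprints)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# 1,024 requests each, from three hypothetical traffic slices.
farm = ["fp_reused"] * 1024                     # one fingerprint everywhere
randomised = [f"fp_{i}" for i in range(1024)]   # new fingerprint per request
mixed = ["fp_common"] * 512 + ["fp_b"] * 256 + ["fp_c"] * 256

print(f"farm:       {fingerprint_entropy_bits(farm):.1f} bits")
print(f"randomised: {fingerprint_entropy_bits(randomised):.1f} bits")
print(f"mixed:      {fingerprint_entropy_bits(mixed):.1f} bits")
```

The farm sits at the floor and the randomiser at the ceiling of log₂(1024); ordinary traffic, where a few common configurations dominate, falls somewhere in between.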
The blend-in target
The winning move is not maximum entropy or minimum entropy — it is to land in a high-population, low-self-information bucket: present the same common configuration that millions of real users share (a popular Intel/Chrome/Windows profile, the default font set, a mainstream screen size) and keep it stable per session. Low self-information means the fingerprint blends into a crowd; stability means it behaves like one real device over time.
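The relationship between self-information and crowd size can be sketched numerically. The population size and the 1-in-64 share below are made-up figures, and `crowd_size` is a name invented for this example.

```python
import math


def crowd_size(share: float, population: int) -> int:
    """Expected number of users in `population` presenting the same profile."""
    return round(share * population)


population = 2**24  # about 16.8 million visitors (illustrative)

# Hypothetical profiles: one common configuration, one unique to a single user.
for label, share in [("common", 1 / 64), ("unique", 1 / population)]:
    bits = -math.log2(share)
    print(f"{label}: {bits:.0f} bits, crowd of {crowd_size(share, population):,}")
```

A 6-bit profile hides among hundreds of thousands of users, while a 24-bit profile is a crowd of one.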
Achieving this by hand is hard because the attributes constrain each other, which is the same coherence problem behind lie detection and clustering. Tools that serve a coherent, common, real-hardware profile, such as Camoufox, aim for exactly this low-self-information, high-stability target rather than assembling random values at runtime. Browser vendors are pushing in the same direction defensively: Chrome’s User-Agent reduction and its Privacy Budget proposal were both designed to shrink the entropy a page can read.
