Entropy as bits of identifying information
The concept comes from information theory and was popularised for browsers by the EFF’s Panopticlick study. The key term is self-information: how surprising a particular value is. Its formula is -log₂(p), where p is the share of people who have that same value. The rarer the value, the higher the number. A value that half the population shares is worth 1 bit; a value only 1 in 1,000 people have is worth about 10 bits.
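The formula above is small enough to check by hand. Here is a minimal Python sketch; the function name `self_information` is chosen for illustration and does not come from any particular library.

```python
import math

def self_information(p: float) -> float:
    """Bits of identifying information in a value shared by a fraction p of users."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the range (0, 1]")
    return -math.log2(p)

half = self_information(0.5)              # a value half the population shares
one_in_thousand = self_information(1 / 1000)

print(half)                       # 1.0
print(round(one_in_thousand, 2))  # 9.97
```

The rarer value costs roughly ten times as many bits, matching the two examples in the text.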
To single out one browser from a group of size N, you need log₂(N) bits. For ~16 million visitors (2²⁴) that works out to about 24 bits; for the entire web, around 33. Panopticlick found that the average browser already leaked roughly 18 bits, enough to be unique among a few hundred thousand visitors (2¹⁸ ≈ 262,000). And because the bits from statistically independent attributes add together, combining even a few medium-strength signals crosses the uniqueness line quickly; correlated attributes contribute less than their individual totals suggest.
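To make the arithmetic concrete, the sketch below compares the bits needed for a population with the bits supplied by a handful of attributes. The attribute shares are invented for illustration, and the sum is only valid under the assumption that the attributes are statistically independent.

```python
import math

def bits_to_single_out(population: int) -> float:
    """Bits needed to pick one member out of a population of the given size."""
    return math.log2(population)

def combined_bits(shares: list[float]) -> float:
    """Total bits from several attributes, ASSUMING they are independent."""
    return sum(-math.log2(p) for p in shares)

# Hypothetical shares: fraction of users having the same value for each attribute.
shares = {
    "canvas_hash": 1 / 1000,
    "font_list": 1 / 200,
    "screen_size": 1 / 50,
    "timezone": 1 / 10,
}

needed = bits_to_single_out(2**24)              # ~16 million visitors
available = combined_bits(list(shares.values()))

print(needed)                # 24.0
print(round(available, 2))   # 26.58
print(available >= needed)   # True
```

With these made-up numbers, four attributes are already enough to exceed the 24-bit threshold.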
Which signals carry the most bits
Not all attributes reveal the same amount. High-entropy signals carry many bits because they differ a lot from one device to the next:
- Canvas and WebGL render hashes — these depend on your GPU, graphics driver, and OS, so tiny differences produce different results.
- Your installed font list — varies a lot from machine to machine.
- The exact combination of User-Agent, screen resolution, and plugins.
Low-entropy signals carry few bits because large groups of users share them: timezone, primary language, OS family, colour depth. On their own they reveal little, but they still add to the running total and, more importantly, they feed consistency checks. A low-entropy value that contradicts a high-entropy one (say, an OS family reported as macOS alongside a font list that only a Windows machine would have) is itself a giveaway.
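A consistency check of this kind can be sketched in a few lines. The font-to-OS table and the fingerprint field names below are assumptions made for the example; a real detector would rely on far larger, empirically derived rules.

```python
# Hypothetical hints: fonts that normally ship with only one OS family.
FONT_OS_HINTS = {
    "Segoe UI": "Windows",
    "Menlo": "macOS",
    "DejaVu Sans": "Linux",
}

def inconsistencies(fingerprint: dict) -> list[str]:
    """Return a description of each font that contradicts the claimed OS family."""
    claimed_os = fingerprint["os_family"]
    problems = []
    for font in fingerprint["fonts"]:
        expected_os = FONT_OS_HINTS.get(font)
        # Fonts missing from the table carry no OS hint and are ignored.
        if expected_os is not None and expected_os != claimed_os:
            problems.append(
                f"font {font!r} suggests {expected_os}, but os_family is {claimed_os}"
            )
    return problems

spoofed = {"os_family": "macOS", "fonts": ["Segoe UI", "Arial"]}
genuine = {"os_family": "macOS", "fonts": ["Menlo", "Arial"]}

print(inconsistencies(spoofed))  # one problem reported for 'Segoe UI'
print(inconsistencies(genuine))  # []
```

The low-entropy field does not identify anyone here; it only exposes that the high-entropy field was not produced by the claimed platform.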
This is why clustering works: real devices only ever land on a limited set of high-entropy combinations, and an entropy budget tells the anti-bot vendor how confidently a given fingerprint maps to a single device.
Why being too unique is a problem
A common but naive approach is to randomise the high-entropy signals — a fresh canvas hash, a shuffled font list, random WebGL noise on every request. This maximises entropy, and that is exactly the wrong move. Two problems follow:
1. Impossible uniqueness. A canvas hash that has never appeared on any real device sits outside every known cluster. It carries very high self-information, but it lives in a region of the space that no real hardware can produce — and that is precisely the kind of combination clustering rejects.
2. Wrong spread across IP addresses. A real device’s fingerprint normally shows up behind one IP, or a handful, at a time. A scraper that reuses a single fingerprint across thousands of IPs spreads that fingerprint over an abnormally wide set of addresses (very high IP entropy for one fingerprint), a clear sign of a bot farm. A scraper that randomises on every request does the opposite: fingerprint entropy spikes, with each IP showing far more distinct fingerprints than real traffic ever does. Either way, the distribution does not match real users.
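One way to quantify the second problem is to compute Shannon entropy within groups of requests: the entropy of IPs seen per fingerprint, and the entropy of fingerprints seen per IP. The sketch below uses an invented request log and documentation-range IP addresses; it illustrates the measurement and is not a description of any vendor's actual detector.

```python
import math
from collections import Counter, defaultdict

def entropy_by_group(pairs):
    """Shannon entropy (bits) of the values observed within each group.

    pairs is an iterable of (group_key, value) tuples.
    """
    values_by_group = defaultdict(Counter)
    for group, value in pairs:
        values_by_group[group][value] += 1

    result = {}
    for group, counter in values_by_group.items():
        total = sum(counter.values())
        result[group] = sum(
            -(n / total) * math.log2(n / total) for n in counter.values()
        )
    return result

# Hypothetical log of (fingerprint_hash, ip) pairs.
log = [
    ("fp_home_user", "203.0.113.7"), ("fp_home_user", "203.0.113.7"),
    # One fingerprint reused across four IPs.
    ("fp_reused", "198.51.100.1"), ("fp_reused", "198.51.100.2"),
    ("fp_reused", "198.51.100.3"), ("fp_reused", "198.51.100.4"),
    # One IP presenting a fresh fingerprint on every request.
    ("fp_rand_1", "192.0.2.9"), ("fp_rand_2", "192.0.2.9"),
    ("fp_rand_3", "192.0.2.9"), ("fp_rand_4", "192.0.2.9"),
]

ip_entropy = entropy_by_group(log)                           # IPs per fingerprint
fp_entropy = entropy_by_group((ip, fp) for fp, ip in log)    # fingerprints per IP

print(ip_entropy["fp_home_user"], ip_entropy["fp_reused"])   # 0.0 2.0
print(fp_entropy["203.0.113.7"], fp_entropy["192.0.2.9"])    # 0.0 2.0
```

The ordinary user scores 0 bits on both measures, while each scraper strategy stands out on one of them.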
The blend-in target
The winning move is neither maximum nor minimum entropy — it is to land in a high-population, low-self-information bucket. In plain terms: present a common configuration that millions of real users already share (a popular Intel/Chrome/Windows profile, the default font set, a mainstream screen size) and keep it stable for the whole session. Low self-information means your fingerprint blends into the crowd; stability means it behaves like one consistent real device over time.
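As a rough illustration of that target, the sketch below keeps only profiles whose self-information is under a chosen budget, picks one in proportion to its share, and then holds it fixed for the session. The profile names, shares, the 5-bit budget, and the `Session` class are all hypothetical; real population shares would have to come from measured traffic.

```python
import math
import random

# Hypothetical population shares for complete, internally consistent profiles.
PROFILE_SHARES = {
    "win10-chrome-intel-1920x1080": 0.08,
    "win11-chrome-intel-1536x864": 0.05,
    "linux-firefox-amd-2560x1440": 0.002,
}

def common_profiles(shares: dict, max_bits: float) -> dict:
    """Keep only profiles whose self-information is at or below max_bits."""
    return {name: p for name, p in shares.items() if -math.log2(p) <= max_bits}

class Session:
    """Chooses one common profile at creation and never changes it."""

    def __init__(self, shares: dict, max_bits: float = 5.0, rng=None):
        rng = rng or random.Random()
        candidates = common_profiles(shares, max_bits)
        if not candidates:
            raise ValueError("no profile fits within the entropy budget")
        names = list(candidates)
        weights = [candidates[name] for name in names]
        # Sample in proportion to real-world share so sessions mirror the crowd.
        self.profile = rng.choices(names, weights=weights, k=1)[0]

    def fingerprint(self) -> str:
        # Same answer on every request: stability over the whole session.
        return self.profile

session = Session(PROFILE_SHARES, rng=random.Random(0))
print(session.fingerprint() == session.fingerprint())  # True
```

Weighted sampling is a design choice here: always picking the single most common profile would itself distort the population-wide distribution once many sessions are running.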
Doing this by hand is hard, because the attributes have to agree with each other, the same consistency problem behind lie detection and clustering. Tools that serve a coherent, common, real-hardware profile, such as Camoufox, aim for exactly this low-self-information, high-stability target instead of stitching together random values at runtime. Browser vendors have pushed in the same direction for privacy reasons: Chrome’s User-Agent reduction and its Privacy Budget proposal were both designed to shrink the entropy a page is allowed to read.
