Anti-Bot

What Is Fingerprint Entropy?

Fingerprint entropy measures how much identifying information a browser attribute carries, expressed in bits. A signal that splits the population in half is worth one bit; one that uniquely identifies a browser among ~16 million carries about 24 bits. When the signals are roughly independent, the entropy of a combined fingerprint is close to the sum of its parts, which is why a handful of high-entropy signals (canvas hash, font list, WebGL renderer) is enough to single out one device. For scrapers the lesson is counter-intuitive: a fingerprint that is too unique is as much of a tell as one that is inconsistent.

Quick facts

Unit: bits of information, where bits = -log₂(probability of the value)
Uniqueness threshold: log₂(population) bits identifies one browser (~24 bits for ~16M users)
High-entropy signals: canvas/WebGL hash, font list, plugin set, screen + UA combination
Low-entropy signals: timezone, language, OS family (common, so little information each)
Scraper trap: both too-unique and seen-on-many-IPs fingerprints are suspicious

Entropy as bits of identifying information

The idea comes from information theory and was popularised for the browser by the EFF’s Panopticlick study. The self-information of an observed attribute value is -log₂(p), where p is the fraction of the population that shares that value; the entropy of the attribute is the average of that quantity over the whole population. A value that half the population shares carries 1 bit; a value that 1 in 1,000 share carries about 10 bits.
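The distinction between the self-information of one value and the entropy of a whole attribute can be sketched in a few lines. The value shares below are made up for illustration and are not measured data.

```python
import math

def self_information(p: float) -> float:
    """Bits carried by one observed value shared by fraction p of the population."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

def shannon_entropy(shares: dict[str, float]) -> float:
    """Average self-information over an attribute's whole value distribution."""
    return sum(p * self_information(p) for p in shares.values() if p > 0)

# Hypothetical shares for a 'language' attribute; they sum to 1.
language_shares = {"en-US": 0.5, "de-DE": 0.25, "fr-FR": 0.125, "other": 0.125}

for value, p in language_shares.items():
    print(f"{value:6} {self_information(p):4.1f} bits")
print(f"entropy {shannon_entropy(language_shares):4.2f} bits")  # 1.75 for these shares
```

Rare values carry more bits individually, but they contribute to the attribute's entropy only in proportion to how often they occur.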

To single out one browser among a population of size N, you need log₂(N) bits. For ~16 million visitors that is about 24 bits; for the whole web (several billion browsers), around 33. Panopticlick found the average browser already leaked roughly 18 bits, enough to make most browsers unique within a large site’s traffic. Crucially, the bits from independent attributes add up, so combining even a few mid-entropy signals crosses the uniqueness threshold fast. Correlated attributes contribute less than their sum: a font list that is largely determined by the OS adds few bits beyond the OS itself.
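The threshold arithmetic can be checked directly. This is a rough sketch that assumes fingerprints are spread uniformly over the population, which real traffic is not; the function names are illustrative.

```python
import math

def bits_to_identify(population: int) -> float:
    """Bits needed to single out one browser among `population` browsers."""
    return math.log2(population)

def expected_anonymity_set(population: int, leaked_bits: float) -> float:
    """Rough count of browsers sharing a fingerprint that leaks `leaked_bits`."""
    return population / (2 ** leaked_bits)

site = 2 ** 24  # 16,777,216 visitors, the "~16 million" case
print(bits_to_identify(site))              # 24.0
print(expected_anonymity_set(site, 18.0))  # 64.0
```

Under the same uniform assumption, 18 leaked bits make a browser unique only once the population drops below 2^18, about 262,000.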

Which signals carry the most bits

Not all attributes are equal. High-entropy signals carry many bits because they vary widely across devices:

  • Canvas and WebGL render hashes — tied to GPU, driver, and OS.
  • Installed font list — highly variable per machine.
  • The exact User-Agent + screen resolution + plugin combination.

Low-entropy signals carry few bits because most people share them: timezone, primary language, OS family, colour depth. Individually weak, they still add to the total and, more importantly, are used in coherence checks — a low-entropy value that contradicts a high-entropy one is a tell.
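A coherence check of this kind can be sketched as a lookup from a low-entropy claim to the high-entropy values that plausibly go with it. The rule table and the renderer strings below are hypothetical and far simpler than what a real detector would use.

```python
# Hypothetical coherence rules: each claimed OS family maps to substrings
# that a plausible WebGL renderer string might contain. Illustrative only.
PLAUSIBLE_RENDERERS = {
    "Windows": ("Direct3D", "D3D11"),
    "macOS": ("Apple", "Metal"),
    "Linux": ("Mesa", "OpenGL"),
}

def is_coherent(os_family: str, webgl_renderer: str) -> bool:
    """Return True when the low-entropy OS claim fits the high-entropy renderer."""
    hints = PLAUSIBLE_RENDERERS.get(os_family)
    if hints is None:
        return False  # unknown OS family: treated as suspicious in this sketch
    return any(hint in webgl_renderer for hint in hints)

# Example renderer strings, invented for the demonstration.
print(is_coherent("Windows", "ANGLE (Intel, Intel(R) UHD Graphics Direct3D11 vs_5_0 ps_5_0)"))  # True
print(is_coherent("Windows", "Apple M1"))  # False
```

The OS family carries very few bits on its own, yet it is enough to expose the second fingerprint because it contradicts the renderer.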

This is why clustering works: real devices occupy a finite set of high-entropy combinations, and the total number of bits in a fingerprint tells the anti-bot vendor how confidently it pins to one device.

Why being too unique is a problem for bots

Naive evasion tries to randomise high-entropy signals on every request: a fresh canvas hash, a shuffled font list, random WebGL noise. This maximises entropy, and that is exactly the mistake. Two anomalies follow:

1. Impossible uniqueness. A canvas hash never seen on any real device sits outside every cluster — high self-information, but in a region of the space no real hardware produces. That is the combination clustering rejects.

2. Too much spread across IPs, or none at all. A real device’s fingerprint shows up behind one IP, or a small number of them. A scraper that reuses one fingerprint across thousands of IPs gives that fingerprint an abnormally high IP-entropy, which marks it as an obvious farm. A scraper that randomises per request produces fingerprints that are each seen once and never again, which makes the population-level entropy spike unnaturally. Either way the distribution does not match real traffic.
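Both anomalies show up as simple counts in request logs. The sketch below assumes a log of (IP, fingerprint hash) pairs; the log format, addresses, and hash names are hypothetical.

```python
from collections import defaultdict

def spread(requests):
    """Count distinct IPs per fingerprint and distinct fingerprints per IP."""
    ips_per_fp = defaultdict(set)
    fps_per_ip = defaultdict(set)
    for ip, fp in requests:
        ips_per_fp[fp].add(ip)
        fps_per_ip[ip].add(fp)
    return (
        {fp: len(ips) for fp, ips in ips_per_fp.items()},
        {ip: len(fps) for ip, fps in fps_per_ip.items()},
    )

# Hypothetical log of (ip, fingerprint_hash) pairs.
log = [
    ("10.0.0.1", "fp_a"), ("10.0.0.2", "fp_a"), ("10.0.0.3", "fp_a"),  # one fingerprint, many IPs
    ("10.0.0.9", "fp_x"), ("10.0.0.9", "fp_y"), ("10.0.0.9", "fp_z"),  # new fingerprint per request
    ("10.0.0.5", "fp_b"), ("10.0.0.5", "fp_b"),                        # stable device
]

ips_per_fp, fps_per_ip = spread(log)
print(ips_per_fp["fp_a"], fps_per_ip["10.0.0.9"], ips_per_fp["fp_b"])  # 3 3 1
```

A detector would compare these counts against what real traffic looks like rather than against fixed thresholds, since a very common configuration legitimately appears behind many IPs.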

The blend-in target

The winning move is not maximum entropy or minimum entropy — it is to land in a high-population, low-self-information bucket: present the same common configuration that millions of real users share (a popular Intel/Chrome/Windows profile, the default font set, a mainstream screen size) and keep it stable per session. Low self-information means the fingerprint blends into a crowd; stability means it behaves like one real device over time.
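One way to picture "common and stable" is a profile choice that is weighted by population share and fixed for the lifetime of a session. This is a minimal sketch under stated assumptions: the profile names and shares are invented, and it only selects a label; it does not produce a coherent real-hardware fingerprint.

```python
import hashlib
import random

# Hypothetical profiles with made-up population shares (not measured data).
PROFILES = [
    ("Windows / Chrome / 1920x1080", 0.20),
    ("Windows / Chrome / 1366x768", 0.10),
    ("macOS / Safari / 1440x900", 0.05),
    ("Linux / Firefox / 2560x1440", 0.005),
]

def profile_for_session(session_id: str) -> str:
    """Pick a profile weighted by share, deterministically per session."""
    # Seed from the session ID so the same session always gets the same profile.
    seed = int.from_bytes(hashlib.sha256(session_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    names = [name for name, _ in PROFILES]
    weights = [share for _, share in PROFILES]
    return rng.choices(names, weights=weights, k=1)[0]

first = profile_for_session("session-42")
again = profile_for_session("session-42")
print(first, first == again)  # the same profile both times
```

Weighting by share keeps most sessions in the large buckets, and seeding from the session ID gives the stability that per-request randomisation lacks.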

Achieving this by hand is hard because the attributes constrain each other, which is the same coherence problem behind lie detection and clustering. Tools that serve a coherent, common, real-hardware profile, such as Camoufox, aim for exactly this low-self-information, high-stability target rather than assembling random values at runtime. Browser vendors are pushing in the same direction defensively: Chrome’s User-Agent reduction and its Privacy Budget proposal were both designed to shrink the entropy a page can read.

Code example

python
import math

# Self-information of an attribute value: rarer value -> more bits.
def bits(p):                       # p = fraction of population with this value
    return -math.log2(p)

# A few illustrative per-attribute shares and their information content:
attrs = {
    "timezone = Europe/Amsterdam": 0.02,    # ~5.6 bits  (common-ish)
    "language = en-US":            0.30,    # ~1.7 bits  (very common)
    "font list (this machine)":    0.001,   # ~10  bits  (high entropy)
    "canvas hash (this machine)":  0.0005,  # ~11  bits  (high entropy)
}

total = sum(bits(p) for p in attrs.values())
for name, p in attrs.items():
    print(f"{name:32} {bits(p):5.1f} bits")
print(f"{'COMBINED':32} {total:5.1f} bits")

# The combined total here is ~28.3 bits, above the ~24 bits that uniquely
# identify one browser among ~16 million.
# Independent signals ADD, so a few high-entropy values cross that line.
# Randomising them doesn't hide you -- it pushes you OUTSIDE every real
# cluster, which is the anomaly clustering is built to catch.

Frequently asked questions

How many bits does it take to uniquely identify a browser?

About log₂ of the population size. For a site with ~16 million distinct visitors that is roughly 24 bits; across the whole web it is around 33. The EFF’s Panopticlick study found the average browser already leaks about 18 bits, which is enough to be unique within most large sites — and because independent signals add together, a few high-entropy values (canvas, fonts, WebGL) push well past the threshold.

If high entropy identifies me, should I minimise my fingerprint entropy?

You should minimise your *self-information* — present common values millions of others share — but not by stripping or randomising signals. A missing or randomised canvas is high-entropy in the wrong way: it lands outside every real-device cluster and gets flagged. The target is a common, coherent, stable profile that blends into a crowd, not a unique or empty one.

How does entropy relate to fingerprint clustering?

They are two views of the same statistics. Entropy measures how much a single attribute narrows down the population; clustering measures whether the *combination* of attributes falls inside the region real devices actually occupy. A fingerprint can be low-entropy on each field yet still sit outside every cluster if the fields are combined in a way no real hardware produces.
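The difference between the two views can be shown with a toy population. The marginal shares and the set of observed combinations below are hypothetical; summing per-field bits also assumes the fields are independent, which is exactly the assumption the second fingerprint violates.

```python
import math

# Hypothetical marginal shares for each field (made-up numbers).
MARGINALS = {
    "os": {"Windows": 0.7, "macOS": 0.2},
    "gpu": {"Intel": 0.5, "Apple": 0.2},
}
# Hypothetical set of (os, gpu) combinations seen on real devices.
OBSERVED = {("Windows", "Intel"), ("macOS", "Apple")}

def per_field_bits(fp: dict) -> float:
    """Sum of per-field self-information, assuming independent fields."""
    return sum(-math.log2(MARGINALS[field][value]) for field, value in fp.items())

def in_known_cluster(fp: dict) -> bool:
    """True when the combination matches one seen on real hardware."""
    return (fp["os"], fp["gpu"]) in OBSERVED

for fp in ({"os": "Windows", "gpu": "Intel"}, {"os": "Windows", "gpu": "Apple"}):
    print(fp, f"{per_field_bits(fp):.2f} bits", in_known_cluster(fp))
```

Both fingerprints score under 3 bits field by field, but only the first matches a combination real hardware produces.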

Last updated: 2026-05-28