What Is Fingerprint Entropy?

By the Scrappey Research Team

Fingerprint entropy is a way to measure how much a browser attribute gives away about who you are, counted in bits. Think of entropy as "how much this value narrows down the crowd." A signal that splits everyone into two equal halves is worth one bit; a signal that picks out a single browser from ~16 million carries about 24 bits. When you combine several attributes, their entropy roughly adds up — which is why just a handful of revealing signals (a canvas hash, your font list, your WebGL renderer) is enough to pin down one device. For scrapers the takeaway is counter-intuitive: a fingerprint that is too unique gives you away just as much as one that is internally inconsistent.

Quick facts

Unit: bits of information, where bits = -log₂(probability of the value)
Uniqueness threshold: log₂(population) bits identify one browser; ~24 bits for ~16M users
High-entropy signals: canvas/WebGL hash, font list, plugin set, screen + UA combination
Low-entropy signals: timezone, language, OS family; each is common, so it carries little information
Scraper trap: both too-unique and seen-on-many-IPs fingerprints are suspicious

Entropy as bits of identifying information

The concept comes from information theory and was popularised for browsers by the EFF’s Panopticlick study. The key term is self-information: how surprising a particular value is. Its formula is -log₂(p), where p is the share of people who have that same value. The rarer the value, the higher the number. A value that half the population shares is worth 1 bit; a value only 1 in 1,000 people have is worth about 10 bits.
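
In practice the share p is not given; it has to be estimated from observed traffic. The following is a minimal sketch of that step, assuming a small made-up sample of screen widths. The function name surprisal_by_value and the sample data are illustrative, not taken from any real dataset or tool.

```python
import math
from collections import Counter

def surprisal_by_value(observations):
    """Map each observed value to its self-information in bits: -log2(share)."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {value: -math.log2(n / total) for value, n in counts.items()}

# Hypothetical sample: one attribute (screen width) seen across 8 visitors.
sample = [1920, 1920, 1920, 1920, 1366, 1366, 2560, 3440]

# Print the values from most common (fewest bits) to rarest (most bits).
for value, b in sorted(surprisal_by_value(sample).items(), key=lambda kv: kv[1]):
    print(f"{value:>5}  {b:.1f} bits")
```

The value held by half the sample comes out at 1 bit, matching the rule of thumb above, and each halving of the share adds one more bit.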

To single out one browser from a group of size N, you need log₂(N) bits. For ~16 million visitors that works out to about 24 bits; for the entire web, around 33. Panopticlick found the average browser already leaked roughly 18 bits, enough to single out one browser among roughly 260,000 (2¹⁸). And because the bits from independent attributes add together, combining even a few medium-strength signals crosses the uniqueness line quickly; correlated attributes contribute less than the sum of their individual bits.
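
The threshold arithmetic can be checked directly. This is a rough sketch, assuming every browser is equally likely; the 8 billion figure is an assumed stand-in for "the entire web", and both function names are made up for illustration.

```python
import math

def bits_to_single_out(population):
    """Bits needed to pick one member out of `population` equally likely ones."""
    return math.log2(population)

def anonymity_set_size(population, leaked_bits):
    """Expected number of browsers that share a fingerprint leaking `leaked_bits`."""
    return population / 2 ** leaked_bits

# Assumed population sizes, for illustration only.
for label, n in [("~16M-visitor site", 16_000_000), ("~8B people", 8_000_000_000)]:
    print(f"{label:18} needs {bits_to_single_out(n):4.1f} bits")

# How many browsers would still share an 18-bit fingerprint on a 16M-visitor site?
print(f"{anonymity_set_size(16_000_000, 18):.0f} browsers share 18 bits among 16M")
```

Every extra bit halves the group a fingerprint could belong to, which is why a few additional independent signals are enough to shrink that group to one.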

Which signals carry the most bits

Not all attributes reveal the same amount. High-entropy signals carry many bits because they differ a lot from one device to the next:

  • Canvas and WebGL render hashes — these depend on your GPU, graphics driver, and OS, so tiny differences produce different results.
  • Your installed font list — varies a lot from machine to machine.
  • The exact combination of User-Agent, screen resolution, and plugins.

Low-entropy signals carry few bits because almost everyone shares them: timezone, primary language, OS family, colour depth. On their own they reveal little, but they still add to the running total and, more importantly, they are used in consistency checks. A low-entropy value that contradicts a high-entropy one (say, a Windows-only font list reported alongside a macOS OS family) is itself a giveaway.
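
A consistency check of this kind can be sketched as a comparison between the claimed OS family and the fonts the browser reports. The marker-font table below is a small assumption made for illustration; a real check would need a much larger, curated list and would have to tolerate users who install extra fonts.

```python
# Hypothetical marker fonts per OS family (illustrative, not exhaustive).
OS_MARKER_FONTS = {
    "Windows": {"Segoe UI", "Segoe UI Emoji"},
    "macOS": {"Helvetica Neue", "Menlo"},
}

def os_font_conflicts(claimed_os, fonts):
    """Return reported fonts that are markers for a different OS than the claimed one."""
    reported = set(fonts)
    conflicts = set()
    for os_name, markers in OS_MARKER_FONTS.items():
        if os_name != claimed_os:
            conflicts |= markers & reported
    return sorted(conflicts)

# A profile that claims macOS but reports a Windows marker font.
print(os_font_conflicts("macOS", ["Arial", "Segoe UI", "Menlo"]))
```

Neither field is rare on its own here; the mismatch between them is the signal.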

This is why clustering works: real devices only ever land on a limited set of high-entropy combinations, and an entropy budget tells the anti-bot vendor how confidently a given fingerprint maps to a single device.
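
The idea that real devices occupy only some combinations can be illustrated with a toy lookup. This is a simplified sketch, not how any particular vendor clusters fingerprints: the observed counts are invented, and real systems work with many more attributes and fuzzier matching.

```python
from collections import Counter

# Hypothetical counts of (os_family, gpu_vendor) pairs seen in real traffic.
observed = Counter({
    ("Windows", "Intel"): 500,
    ("Windows", "NVIDIA"): 300,
    ("macOS", "Apple"): 200,
})

def marginal_share(index, value):
    """Share of traffic whose attribute at position `index` equals `value`."""
    total = sum(observed.values())
    return sum(n for combo, n in observed.items() if combo[index] == value) / total

def combo_seen(combo):
    """True if this exact combination was observed at least once."""
    return observed[combo] > 0

candidate = ("macOS", "NVIDIA")
print("macOS share: ", marginal_share(0, "macOS"))
print("NVIDIA share:", marginal_share(1, "NVIDIA"))
print("combination seen on real devices:", combo_seen(candidate))
```

Each value in the candidate is common by itself, yet the pair never occurs in the observed data, so per-field entropy alone would not flag it while a combination check would.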

Why being too unique is a problem

A common but naive approach is to randomise the high-entropy signals — a fresh canvas hash, a shuffled font list, random WebGL noise on every request. This maximises entropy, and that is exactly the wrong move. Two problems follow:

1. Impossible uniqueness. A canvas hash that has never appeared on any real device sits outside every known cluster. It carries very high self-information, but it lives in a region of the space that no real hardware can produce — and that is precisely the kind of combination clustering rejects.

2. Wrong spread across IP addresses. A real fingerprint normally shows up on one device behind one IP. A scraper that reuses a single fingerprint across thousands of IPs produces abnormally low fingerprint diversity across those IPs: thousands of supposedly separate visitors that all look like the same device, a clear sign of a bot farm. A scraper that randomises on every request makes the population-wide entropy spike in a way real traffic never does. Either way, the distribution does not match real users.
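
Both distribution problems show up in simple per-fingerprint counts. The sketch below assumes a request log of (fingerprint_id, ip) pairs; the log contents, identifiers, and function names are all hypothetical, and the IPs come from documentation-only address ranges.

```python
from collections import Counter, defaultdict

def distinct_ips_per_fingerprint(requests):
    """Count how many different IPs each fingerprint was seen on."""
    ips = defaultdict(set)
    for fingerprint, ip in requests:
        ips[fingerprint].add(ip)
    return {fingerprint: len(seen) for fingerprint, seen in ips.items()}

def single_use_share(requests):
    """Fraction of fingerprints that appear in exactly one request."""
    counts = Counter(fingerprint for fingerprint, _ in requests)
    return sum(1 for n in counts.values() if n == 1) / len(counts)

# Hypothetical traffic patterns.
normal = [("fp-home", "192.0.2.10")] * 20                              # one device, one IP
reused = [("fp-reused", f"198.51.100.{i}") for i in range(1, 51)]      # one fingerprint, 50 IPs
randomised = [(f"fp-random-{i}", "203.0.113.9") for i in range(50)]    # new fingerprint per request

print(distinct_ips_per_fingerprint(normal + reused))
print("single-use share, normal:    ", single_use_share(normal))
print("single-use share, randomised:", single_use_share(randomised))
```

The reused fingerprint stands out by its IP count, and the randomised traffic stands out because every fingerprint is seen once and never again.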

The blend-in target

The winning move is neither maximum nor minimum entropy — it is to land in a high-population, low-self-information bucket. In plain terms: present a common configuration that millions of real users already share (a popular Intel/Chrome/Windows profile, the default font set, a mainstream screen size) and keep it stable for the whole session. Low self-information means your fingerprint blends into the crowd; stability means it behaves like one consistent real device over time.
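
The "high-population, low-self-information bucket" can be expressed as a ranking over whole profiles. This is a minimal sketch under stated assumptions: the profile names and shares are invented, not measured, and the Session class is a hypothetical illustration of pinning one profile for a session's lifetime.

```python
import math

# Hypothetical population shares for complete profiles (not measured data).
PROFILE_SHARE = {
    "win-chrome-intel-1920x1080": 0.08,
    "mac-safari-apple-1440x900": 0.03,
    "linux-firefox-amd-3440x1440": 0.0002,
}

def self_information(share):
    """Bits of identifying information carried by a profile with this share."""
    return -math.log2(share)

def most_common_profile(shares):
    """Profile with the lowest self-information, i.e. the largest share."""
    return min(shares, key=lambda name: self_information(shares[name]))

class Session:
    """Chooses one profile at creation and never changes it afterwards."""
    def __init__(self, shares):
        self.profile = most_common_profile(shares)

for name, share in PROFILE_SHARE.items():
    print(f"{name:30} {self_information(share):5.1f} bits")

session = Session(PROFILE_SHARE)
print("pinned for this session:", session.profile)
```

Choosing the profile once per session, rather than per request, is what keeps the fingerprint looking like one consistent device over time.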

Doing this by hand is hard, because the attributes have to agree with each other — the same consistency problem behind lie detection and clustering. Tools that serve a coherent, common, real-hardware profile — such as Camoufox — aim for exactly this low-self-information, high-stability target instead of stitching together random values at runtime. Browser vendors are moving the same way for privacy reasons: Chrome’s User-Agent reduction and privacy-budget work deliberately shrink the entropy a page is allowed to read.

Code example

```python
import math

# Self-information of an attribute value: rarer value -> more bits.
def bits(p):                       # p = fraction of population with this value
    return -math.log2(p)

# A few illustrative per-attribute shares and their information content:
attrs = {
    "timezone = Europe/Amsterdam": 0.02,    # ~5.6 bits  (common-ish)
    "language = en-US":            0.30,    # ~1.7 bits  (very common)
    "font list (this machine)":    0.001,   # ~10  bits  (high entropy)
    "canvas hash (this machine)":  0.0005,  # ~11  bits  (high entropy)
}

total = sum(bits(p) for p in attrs.values())
for name, p in attrs.items():
    print(f"{name:32} {bits(p):5.1f} bits")
print(f"{'COMBINED':32} {total:5.1f} bits")

# ~24 bits uniquely identifies one browser among ~16 million.
# Independent signals ADD, so a few high-entropy values cross that line.
# Randomising them doesn't hide you -- it pushes you OUTSIDE every real
# cluster, which is the anomaly clustering is built to catch.
```

Frequently asked questions

How many bits does it take to uniquely identify a browser?

Roughly log₂ of the population size. For a site with ~16 million distinct visitors, that is about 24 bits; across the whole web it is around 33. The EFF’s Panopticlick study found the average browser already leaks about 18 bits, enough to be unique among roughly 260,000 browsers (2¹⁸). And because independent signals add together, a few high-entropy values (canvas, fonts, WebGL) push you well past the threshold.

If high entropy identifies me, should I minimise my fingerprint entropy?

You should minimise your *self-information* — present common values that millions of others also have — but not by stripping out or randomising signals. A missing or randomised canvas is high-entropy in the wrong way: it lands outside every real-device cluster and gets flagged. The goal is a common, coherent, stable profile that blends into the crowd, not a unique or empty one.

How does entropy relate to fingerprint clustering?

They are two views of the same statistics. Entropy measures how much a single attribute narrows down the population; clustering measures whether the *combination* of attributes falls inside the region real devices actually occupy. A fingerprint can look ordinary on every individual field yet still sit outside every cluster if those fields are combined in a way no real hardware produces.

Last updated: 2026-05-31