
What Is Speech Synthesis Voice Fingerprinting?


Speech synthesis fingerprinting reads the list of text-to-speech voices exposed by window.speechSynthesis.getVoices(). The available voices are bundled by the operating system, so the list is a strong OS-and-version signal: Windows ships specific Microsoft voices, macOS/iOS ship Apple voices, and Android ships Google voices, each with characteristic names, languages, and voiceURI values. A headless Linux server typically returns an empty voice list, which both flags automation and contradicts any Windows or macOS User-Agent.

Quick facts

API: window.speechSynthesis.getVoices()
Reveals: OS and version, via bundled TTS voice names, langs, voiceURIs, and the localService flag
Headless tell: Empty array on a typical headless Linux server
Coherence trap: Voice set must match the claimed OS (Windows voices on a Windows UA)
Quirk: Often returns [] on the first call until the voiceschanged event fires

Why the voice list is an OS fingerprint

Text-to-speech voices come from the platform, not the browser, so the set is tightly bound to the OS. Windows 11 exposes voices such as Microsoft David and Microsoft Zira, plus further voices for installed language packs; macOS exposes Apple voices such as Samantha and Daniel with com.apple.* voiceURIs; Android exposes Google voices. Each voice carries a name, lang, voiceURI, a localService boolean, and a default flag. The full list (which voices, in what order, with which languages) is a recognisable per-OS, and often per-version, signature.

This makes it a useful corroborating signal for platform claims and a relatively high-entropy one, because the exact voice set varies with OS version and installed language packs.
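To make "which voices, in what order" concrete, here is a minimal sketch that serialises every voice field listed above, keeps the list order, and reduces the result to a short ID. The names voiceSignature and fnv1a32 are hypothetical, and 32-bit FNV-1a is an assumed stand-in for whatever hash a real script uses. It hashes UTF-16 code units, which matches byte-wise FNV-1a only for ASCII input.

```javascript
// Hypothetical helper: serialise the voice list, preserving order,
// using the fields each SpeechSynthesisVoice exposes.
function voiceSignature(voices) {
  return voices
    .map(v => [v.name, v.lang, v.voiceURI, v.localService ? 'L' : 'R', v.default ? 'D' : ''].join('|'))
    .join('\n');
}

// Assumption: 32-bit FNV-1a, chosen only because it is tiny and dependency-free.
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);                // mix in one UTF-16 code unit
    h = Math.imul(h, 0x01000193) >>> 0;    // multiply by FNV prime, keep unsigned 32-bit
  }
  return h.toString(16).padStart(8, '0');
}

// In a browser: fnv1a32(voiceSignature(speechSynthesis.getVoices()))
```

Because order and every field feed the hash, a reordered list or one extra language-pack voice almost always yields a different ID, which is where the entropy described above comes from.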

The empty list and the coherence trap

Headless browsers on Linux servers usually have no speech engine installed, so getVoices() returns []. An empty voice list is unusual for a real desktop user and, more damning, it contradicts a Windows or macOS User-Agent that should ship a known voice set. So the probe does double duty: empty list suggests headless, and a mismatched list (macOS voices under a Windows UA, or vice versa) suggests platform spoofing.
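Both halves of the probe can be folded into one coherence check. This is an illustrative sketch only: checkVoiceCoherence, voiceFamily, and claimedOs are hypothetical names, and the matching rules (a "Microsoft " name prefix for Windows; a com.apple.* voiceURI or the Samantha/Daniel names for Apple) are assumed heuristics taken from the examples on this page, not any vendor's rules. Android is left out for simplicity. Remote voices (localService false) are skipped because they come from a network service rather than the OS; desktop Chrome, for example, typically lists remote voices such as "Google US English" on every platform.

```javascript
// Assumed heuristics: map a local voice to the OS family that bundles it.
const APPLE_NAMES = new Set(['Samantha', 'Daniel']); // illustrative, not exhaustive
function voiceFamily(voice) {
  if (!voice.localService) return null; // remote voices say little about the OS
  if (voice.name.startsWith('Microsoft ')) return 'windows';
  if ((voice.voiceURI || '').startsWith('com.apple.') || APPLE_NAMES.has(voice.name)) return 'apple';
  return null;
}

// Rough OS claim parsed from the User-Agent string (assumed patterns).
function claimedOs(userAgent) {
  if (/Windows NT/.test(userAgent)) return 'windows';
  if (/Mac OS X|iPhone|iPad/.test(userAgent)) return 'apple';
  return 'other';
}

function checkVoiceCoherence(userAgent, voices) {
  const os = claimedOs(userAgent);
  if (voices.length === 0) {
    // Empty list: a headless hint, and incoherent with a Windows/macOS claim
    return os === 'other' ? 'empty' : 'empty-contradicts-ua';
  }
  const families = new Set(voices.map(voiceFamily).filter(Boolean));
  if (families.size === 0) return 'unknown';      // nothing OS-specific recognised
  if (families.size === 1 && families.has(os)) return 'consistent';
  return 'mismatch';                              // e.g. Apple voices under a Windows UA
}
```

A real system would weigh these verdicts alongside fonts, audio, and other platform signals rather than block on any single one.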

There is a well-known timing quirk: in Chrome the first synchronous getVoices() call frequently returns [] until the engine populates asynchronously and fires the voiceschanged event. Detection scripts account for this by waiting for the event, and a spoofing tool that returns a static list without the realistic asynchronous population can itself look off. The robust fix, as with fonts and audio, is to expose a coherent real voice list for the claimed OS rather than a generic or empty one.
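Because timing matters as well as content, a detection script may record how the list arrived, not just what it contains. In the sketch below, traceVoicePopulation is a hypothetical name and the 2-second default timeout is an arbitrary assumption. It accepts any object with getVoices() and addEventListener(), which window.speechSynthesis provides, and reports the synchronous count, the count after the event or timeout, and whether voiceschanged fired.

```javascript
// Hypothetical helper: trace how the voice list is populated.
// `synth` is window.speechSynthesis in a browser, or any EventTarget with getVoices().
function traceVoicePopulation(synth, timeoutMs = 2000) {
  return new Promise(resolve => {
    const syncCount = synth.getVoices().length; // first, synchronous read
    let done = false;
    const finish = eventFired => {
      if (done) return; // settle only once (event or timeout, whichever comes first)
      done = true;
      clearTimeout(timer);
      resolve({ syncCount, asyncCount: synth.getVoices().length, eventFired });
    };
    const timer = setTimeout(() => finish(false), timeoutMs);
    synth.addEventListener('voiceschanged', () => finish(true), { once: true });
  });
}

// Usage in a browser:
// traceVoicePopulation(speechSynthesis).then(trace => console.log(trace));
```

In Chrome, the pattern described above is a syncCount of 0 followed by a populated list with eventFired true, while 0, 0, and false matches a server with no speech engine. Other browsers may return voices synchronously, so how to score a list that is present immediately is a per-browser judgment this sketch does not make.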

Code example

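```javascript
// Voice-list fingerprint (handles the async population quirk)
function getVoiceFingerprint(timeoutMs = 2000) { // timeout value is an arbitrary choice
  // Serialise each voice as name|lang|L (local) or R (remote); sorted, so list order is ignored
  const summ = voices =>
    voices.map(x => x.name + '|' + x.lang + '|' + (x.localService ? 'L' : 'R'))
          .sort().join(',');
  return new Promise(resolve => {
    if (typeof speechSynthesis === 'undefined') return resolve(null); // API not exposed
    const v = speechSynthesis.getVoices();
    if (v.length) return resolve(summ(v));
    // First call is often empty: wait for voiceschanged, but give up after timeoutMs
    // so a list that never populates resolves to '' instead of hanging forever.
    const timer = setTimeout(() => resolve(summ(speechSynthesis.getVoices())), timeoutMs);
    speechSynthesis.addEventListener('voiceschanged', () => {
      clearTimeout(timer);
      resolve(summ(speechSynthesis.getVoices()));
    }, { once: true }); // does not clobber any existing onvoiceschanged handler
  });
}
// '' (no voices) on a typical headless Linux server.
// Windows UA with macOS 'com.apple.*' voices = spoof tell.
```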

Related terms

What Is Font Fingerprinting?
What Is Browser Fingerprinting?
What Is Headless Browser Detection?
What Is AudioContext Fingerprinting?
What Is Fingerprint Clustering?
What Is Camoufox?
What Is Anti-Bot Detection?
What Is Browser Fingerprinting Evasion?
Anti-Bot Vendor Detection Cheatsheet
How Do Websites Detect Web Scrapers?


Frequently asked questions

Why does speechSynthesis.getVoices() reveal my operating system?

Because the voices are bundled by the OS, not the browser. Windows, macOS, Android, and Linux each ship a characteristic set of voice names, languages, and voiceURIs. The list is therefore a strong OS-and-version signal that must agree with the User-Agent: mismatched voices indicate platform spoofing.

Why did getVoices() return an empty array even on a real browser?

Chrome populates the voice list asynchronously, so the first synchronous call often returns [] until the voiceschanged event fires. Real detection scripts wait for that event. An empty result that never populates (common on headless Linux with no speech engine) is the actual bot tell.

Is speech synthesis a major detection vector?

It is a secondary, corroborating one rather than a primary block, but it is high-signal: empty lists catch headless servers and mismatched lists catch OS spoofing. It is most valuable as part of the broader coherence check across fonts, voices, and platform claims.

Last updated: 2026-05-30