What Is Speech Synthesis Voice Fingerprinting?

By the Scrappey Research Team


Speech synthesis fingerprinting reads the list of text-to-speech voices exposed by window.speechSynthesis.getVoices(). "Text-to-speech" means the built-in feature that reads text aloud, and most of the voices it offers are installed by your operating system rather than your browser (Chrome also adds a few network-backed Google voices of its own, flagged localService: false). That makes the list a strong clue about which OS and version you are running: Windows ships specific Microsoft voices, macOS/iOS ship Apple voices, and Android ships Google voices, each with characteristic names, languages, and voiceURI values (an ID string per voice). A headless browser on a Linux server - one running with no screen and usually no speech engine - typically returns an empty voice list, which both flags automation and contradicts any Windows or macOS User-Agent (the text a browser uses to say what it is).

API	window.speechSynthesis.getVoices()
Reveals	OS + version via bundled TTS voice names, langs, voiceURIs, localService flag
Headless tell	Empty array on a typical headless Linux server
Coherence trap	Voice set must match the claimed OS (Windows voices on a Windows UA)
Quirk	Often returns [] on first call until the voiceschanged event fires

Why the voice list is an OS fingerprint

Text-to-speech voices come from the platform, not the browser, so the set is tightly tied to the operating system. Windows 11 exposes voices like Microsoft David, Microsoft Zira, and language-specific additions; macOS exposes Apple voices such as Samantha and Daniel with com.apple.* voiceURIs; Android exposes Google voices. Each voice carries a name, a lang (its language), a voiceURI (its ID), a localService boolean (true if the voice runs on the device rather than in the cloud), and a default flag. The full list - which voices, in what order, with which languages - forms a recognisable per-OS, often per-version signature.

That makes it a useful way to double-check what platform a browser claims to be, and a relatively high-entropy signal - meaning it narrows down who you are quite a lot - because the exact voice set changes with OS version and installed language packs.
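The per-voice fields described above can be captured in a small, order-preserving record. The following is a minimal sketch, not a reference implementation: the helper names describeVoices and voiceSignature are hypothetical, and it accepts any array of voice-like objects so it can be tried outside a browser.

```javascript
// Hypothetical helper: copy the five standard SpeechSynthesisVoice fields
// into plain objects, keeping the order in which the platform listed them.
function describeVoices(voices) {
  return Array.from(voices, v => ({
    name: v.name,
    lang: v.lang,
    voiceURI: v.voiceURI,
    localService: Boolean(v.localService),
    isDefault: Boolean(v.default),
  }));
}

// Hypothetical helper: serialise the records. Order is kept on purpose,
// because the ordering of the list is part of the per-OS signature.
function voiceSignature(voices) {
  return JSON.stringify(describeVoices(voices));
}
```

In a browser this could be called as voiceSignature(speechSynthesis.getVoices()) once the list has populated. Keeping the order is a design choice: sorting makes the string more stable across runs but discards the ordering information the text mentions.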

The empty list and the coherence trap

Headless browsers on Linux servers usually have no speech engine installed, so getVoices() returns [] (an empty list). An empty voice list is unusual for a real desktop user and, worse, it contradicts a Windows or macOS User-Agent, which should come with a known set of voices. So this one check does two jobs: an empty list hints at a headless bot, and a mismatched list (macOS voices under a Windows User-Agent, or the reverse) hints at platform spoofing - lying about which OS you run.
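A rough sketch of how such a two-job check could look is shown here. Everything in it is an assumption for illustration: the function names are hypothetical, the User-Agent tests are deliberately crude, and the only voice patterns used are the ones named in this article (a "Microsoft " name prefix and a "com.apple." voiceURI prefix). It also assumes the voice list was collected after the browser finished loading it.

```javascript
// Hypothetical: which platform does the User-Agent string claim?
function claimedPlatform(userAgent) {
  if (/Windows NT/.test(userAgent)) return 'windows';
  if (/Mac OS X|iPhone|iPad/.test(userAgent)) return 'apple';
  return 'other';
}

// Hypothetical: which platforms do the on-device voices point to?
// Remote voices are skipped because a browser may add those itself.
function voicePlatforms(voices) {
  const seen = new Set();
  for (const v of voices) {
    if (!v.localService) continue;
    if (String(v.name).startsWith('Microsoft ')) seen.add('windows');
    if (String(v.voiceURI).startsWith('com.apple.')) seen.add('apple');
  }
  return seen;
}

// Returns 'empty-list', 'mismatch', 'consistent', or 'inconclusive'.
function checkVoiceCoherence(userAgent, voices) {
  const claimed = claimedPlatform(userAgent);
  if (voices.length === 0) {
    // No voices under a Windows or Apple User-Agent is the headless hint.
    return claimed === 'other' ? 'inconclusive' : 'empty-list';
  }
  const seen = voicePlatforms(voices);
  if (claimed === 'other' || seen.size === 0) return 'inconclusive';
  for (const platform of seen) {
    if (platform !== claimed) return 'mismatch'; // platform spoofing hint
  }
  return 'consistent';
}
```

The function returns a label rather than a boolean so that a caller can treat it as one weak signal among several instead of a verdict.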

There is a well-known timing quirk: in Chrome the first call to getVoices() often returns [] because the engine loads the voices in the background and only signals it is ready by firing a voiceschanged event. Detection scripts know this and wait for that event, so a spoofing tool that just hands back a fixed list - without the realistic background loading - can itself look suspicious. The reliable fix, as with fonts and audio fingerprinting, is to expose a coherent, real voice list for the OS you are claiming, rather than a generic or empty one.
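To make the timing side concrete, here is a hedged sketch of how a script might record how the list arrived, not just what it contains. The function name observeVoiceLoading and the 2000 ms default are assumptions, and synth stands for any object shaped like window.speechSynthesis (a getVoices() method plus EventTarget methods). How to interpret the result is left open, since browsers differ in whether the first call is already populated.

```javascript
// Hypothetical helper: note whether the first getVoices() call was empty
// and whether a voiceschanged event later delivered a non-empty list.
function observeVoiceLoading(synth, timeoutMs = 2000) {
  return new Promise(resolve => {
    const first = synth.getVoices();
    if (first.length > 0) {
      // The list was already available on the very first call.
      resolve({ firstCallEmpty: false, sawEvent: false, count: first.length });
      return;
    }
    let timer = null;
    const onChange = () => {
      const voices = synth.getVoices();
      if (voices.length === 0) return; // still empty, keep waiting
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', onChange);
      resolve({ firstCallEmpty: true, sawEvent: true, count: voices.length });
    };
    // Give up after timeoutMs (an assumed value): the list never filled in.
    timer = setTimeout(() => {
      synth.removeEventListener('voiceschanged', onChange);
      resolve({ firstCallEmpty: true, sawEvent: false, count: 0 });
    }, timeoutMs);
    synth.addEventListener('voiceschanged', onChange);
  });
}
```

In a page this would be called as observeVoiceLoading(window.speechSynthesis). A result with firstCallEmpty true and sawEvent false corresponds to the empty list that never fills in.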

Code example

javascript

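// Voice-list fingerprint (handles the async population quirk)
// timeoutMs is an assumed default, not a value from any specification.
function getVoiceFingerprint(timeoutMs = 2000) {
  // Summarise each voice as name|lang|L (local) or R (remote), sorted for stability.
  const summarize = voices =>
    voices.map(v => v.name + '|' + v.lang + '|' + (v.localService ? 'L' : 'R'))
          .sort().join(',');
  return new Promise(resolve => {
    const voices = speechSynthesis.getVoices();
    if (voices.length) return resolve(summarize(voices));
    // First call was empty: wait for voiceschanged, but give up after timeoutMs
    // so the promise still settles when no speech engine is installed.
    const finish = () => {
      clearTimeout(timer);
      speechSynthesis.removeEventListener('voiceschanged', finish);
      resolve(summarize(speechSynthesis.getVoices()));
    };
    const timer = setTimeout(finish, timeoutMs);
    speechSynthesis.addEventListener('voiceschanged', finish);
  });
}
// On headless Linux getVoices() stays [], so this resolves to ''.
// Windows UA with macOS 'com.apple.*' voices = spoof tell.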

Frequently asked questions

Why does speechSynthesis.getVoices() reveal my operating system?

Because the voices are installed by the OS, not the browser. Windows, macOS, and Android each ship a characteristic set of voice names, languages, and voiceURIs (the per-voice ID strings), while a Linux server often ships none at all. The list is therefore a strong clue about your OS and version, and it must agree with your User-Agent - voices that do not match the claimed OS point to platform spoofing.

Why did getVoices() return an empty array even on a real browser?

Chrome loads the voice list in the background, so the very first call often returns [] until the voiceschanged event fires to say the list is ready. Real detection scripts wait for that event. The actual bot giveaway is an empty result that never fills in - common on headless Linux with no speech engine installed.

Is speech synthesis a major detection vector?

It is a secondary, supporting check rather than a primary blocker, but it carries a lot of signal: empty lists catch headless servers, and mismatched lists catch OS spoofing. It is most useful as one part of a broader consistency check across fonts, voices, and the platform a browser claims to be.

Last updated: 2026-05-31