Why the voice list is an OS fingerprint
Text-to-speech voices come mainly from the operating system rather than the browser (Chrome on desktop, for instance, adds a handful of its own cloud-based Google voices), so the set is tightly tied to the OS. Windows 11 exposes voices such as Microsoft David and Microsoft Zira, plus language-specific additions; macOS exposes Apple voices such as Samantha and Daniel, often with com.apple.* voiceURIs; Android exposes Google voices. Each voice carries a name, a lang (a BCP 47 language tag such as en-US), a voiceURI (its identifier), a localService boolean (true if the voice runs on the device rather than in the cloud), and a default flag. The full list (which voices appear, in what order, and with which languages) forms a recognisable per-OS, and often per-version, signature.
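To see how the list becomes a single comparable value, here is a minimal sketch that serialises the voices in their original order (order is part of the signature) and hashes the result. The VoiceLike interface, the "|" field separator and the choice of 32-bit FNV-1a are illustrative assumptions, not any particular fingerprinting library's method.

```typescript
// Hypothetical minimal shape of a voice, mirroring the five
// SpeechSynthesisVoice fields described above.
interface VoiceLike {
  readonly name: string;
  readonly lang: string;
  readonly voiceURI: string;
  readonly localService: boolean;
  readonly default: boolean;
}

// Serialise the list in its original order, one line per voice, so that the
// same voices in a different order produce a different signature.
function voiceListSignature(voices: readonly VoiceLike[]): string {
  return voices
    .map((v) =>
      [v.name, v.lang, v.voiceURI, v.localService ? "L" : "R", v.default ? "D" : "-"].join("|"),
    )
    .join("\n");
}

// 32-bit FNV-1a over UTF-16 code units: a short, non-cryptographic ID for
// the signature string (an assumed choice; any stable hash would do).
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash >>> 0;
}
```

In a browser this would be called as fnv1a32(voiceListSignature(speechSynthesis.getVoices())), ideally only after the voices have finished loading.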
That makes the voice list a useful cross-check on the platform a browser claims to run on, and a relatively high-entropy signal (one that narrows down who you are considerably), because the exact voice set varies with OS version and installed language packs.
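"High-entropy" can be made concrete with surprisal: if a fraction p of visitors share exactly your voice list, observing it reveals -log2 p bits about you. The shares below are purely hypothetical, chosen only to illustrate the arithmetic.

```latex
I(x) = -\log_2 p(x)
\qquad
p(x) = \tfrac{1}{1024} \;\Rightarrow\; I(x) = \log_2 1024 = 10 \text{ bits},
\qquad
p(x) = \tfrac{1}{4} \;\Rightarrow\; I(x) = 2 \text{ bits}
```

A rare list (unusual language packs, an uncommon OS build) sits at the high end; a stock default install shared by many users sits at the low end.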
The empty list and the coherence trap
Headless browsers on Linux servers usually have no speech engine installed, so speechSynthesis.getVoices() returns [] (an empty list). An empty voice list is unusual for a real desktop user and, worse, it contradicts a Windows or macOS User-Agent, which should come with a known set of voices. So this one check does two jobs: an empty list hints at a headless bot, and a mismatched list (macOS voices under a Windows User-Agent, or the reverse) hints at platform spoofing, that is, lying about which OS you run.
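Both jobs can be sketched as one check. This is a rough heuristic under stated assumptions: the function names, the tiny set of Apple voice names, and the rule that only local "Microsoft …" voices indicate Windows (since Edge is assumed to offer cloud Microsoft voices on other OSes) are all illustrative, and real detection scripts are considerably more thorough.

```typescript
// Hypothetical minimal voice shape (subset of SpeechSynthesisVoice).
interface VoiceLike {
  readonly name: string;
  readonly lang: string;
  readonly voiceURI: string;
  readonly localService: boolean;
}

type Platform = "windows" | "apple" | "android" | "unknown";
type Verdict = "empty-list" | "platform-mismatch" | "consistent" | "inconclusive";

// Very rough User-Agent parsing. Order matters: Android UAs also mention
// Linux, and iPhone/iPad UAs contain "like Mac OS X".
function platformFromUserAgent(ua: string): Platform {
  if (/Android/.test(ua)) return "android";
  if (/Windows NT/.test(ua)) return "windows";
  if (/Macintosh|iPhone|iPad/.test(ua)) return "apple";
  return "unknown";
}

// Illustrative only: two Apple voice names mentioned in the text.
const APPLE_NAMES = new Set(["Samantha", "Daniel"]);

// Guess the OS from the voices. Google voices are ignored because desktop
// Chrome ships them too, so they say little about the OS.
function platformFromVoices(voices: readonly VoiceLike[]): Platform {
  let windows = 0;
  let apple = 0;
  for (const v of voices) {
    if (v.localService && v.name.startsWith("Microsoft ")) windows++;
    if (v.voiceURI.startsWith("com.apple.") || APPLE_NAMES.has(v.name)) apple++;
  }
  if (windows > 0 && apple === 0) return "windows";
  if (apple > 0 && windows === 0) return "apple";
  return "unknown";
}

// Call only after waiting for voiceschanged: Chrome's first getVoices()
// call can legitimately return [] while voices are still loading.
function checkVoiceCoherence(voices: readonly VoiceLike[], userAgent: string): Verdict {
  if (voices.length === 0) return "empty-list";
  const claimed = platformFromUserAgent(userAgent);
  const observed = platformFromVoices(voices);
  if (claimed === "unknown" || observed === "unknown") return "inconclusive";
  return claimed === observed ? "consistent" : "platform-mismatch";
}
```

Returning "inconclusive" rather than guessing keeps the check from flagging lists it cannot interpret, such as a pure set of Google voices.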
There is a well-known timing quirk: in Chrome, the first call to getVoices() often returns [] because the voices load asynchronously, and the browser only signals that they are ready by firing a voiceschanged event on speechSynthesis. Detection scripts know this and wait for that event, so a spoofing tool that simply hands back a fixed list straight away, without the realistic asynchronous loading, can itself look suspicious. The more dependable fix, as with font and audio fingerprinting, is to expose a real, coherent voice list for the OS you are claiming, loaded the way that browser normally loads it, rather than a generic or empty one.
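The wait-then-read pattern that detection scripts rely on can be sketched as follows. VoiceSource, whenVoicesReady and the 1000 ms default timeout are hypothetical names and values; the timeout is there because on a machine with no speech engine the voiceschanged event may never fire.

```typescript
// Hypothetical minimal voice shape.
interface VoiceLike {
  readonly name: string;
  readonly lang: string;
}

// The subset of the SpeechSynthesis interface this sketch relies on;
// window.speechSynthesis is expected to satisfy it in a browser.
interface VoiceSource {
  getVoices(): readonly VoiceLike[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

// Calls onReady exactly once: immediately if voices are already loaded,
// otherwise on the first voiceschanged event, or after timeoutMs with
// whatever getVoices() returns at that point (possibly []).
function whenVoicesReady(
  source: VoiceSource,
  onReady: (voices: readonly VoiceLike[]) => void,
  timeoutMs = 1000,
): void {
  const initial = source.getVoices();
  if (initial.length > 0) {
    onReady(initial);
    return;
  }
  let done = false;
  const finish = (voices: readonly VoiceLike[]): void => {
    if (done) return; // voiceschanged can fire more than once
    done = true;
    clearTimeout(timer);
    source.removeEventListener("voiceschanged", onChange);
    onReady(voices);
  };
  const onChange = (): void => finish(source.getVoices());
  const timer = setTimeout(() => finish(source.getVoices()), timeoutMs);
  source.addEventListener("voiceschanged", onChange);
}
```

In a page this might be used as whenVoicesReady(window.speechSynthesis, (voices) => { /* inspect voices */ }). A list that is non-empty on the very first call, in a browser where it normally is not, is exactly the kind of timing detail the paragraph above warns about.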
