Why the voice list is an OS fingerprint
Text-to-speech voices come from the platform, not the browser, so the set is tightly bound to the OS. Windows 11 exposes voices like Microsoft David and Microsoft Zira, plus language-specific additions; macOS exposes Apple voices such as Samantha and Daniel, whose voiceURIs carry a com.apple. prefix; Android exposes Google voices. Each voice carries a name, a lang, a voiceURI, a localService boolean, and a default flag. The full list (which voices, in what order, with which languages) is a recognisable per-OS, and often per-version, signature.
This makes the voice list a useful corroborating signal for platform claims, and a relatively high-entropy one, because the exact voice set varies with OS version and installed language packs.
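To make the idea of a list-level signature concrete, here is a minimal sketch in TypeScript. It assumes the voices have already been collected (for example from speechSynthesis.getVoices()) and reduces them to an order-preserving string plus a short hash. The names VoiceLike, voiceSignature and fingerprintVoices are hypothetical, and the hash is an FNV-1a-style function over UTF-16 code units, chosen only for brevity; it is not what any particular detection product is known to use.

```typescript
// Structural subset of SpeechSynthesisVoice, so the sketch also runs outside a browser.
interface VoiceLike {
  name: string;
  lang: string;
  voiceURI: string;
  localService: boolean;
  default: boolean;
}

// Serialise the list in its original order: the order is part of the signature.
function voiceSignature(voices: readonly VoiceLike[]): string {
  return voices
    .map((v) =>
      [v.name, v.lang, v.voiceURI, v.localService ? "1" : "0", v.default ? "1" : "0"].join("|")
    )
    .join("\n");
}

// FNV-1a-style 32-bit hash over UTF-16 code units (illustrative, not cryptographic).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Left-pad to eight hex digits without relying on String.prototype.padStart.
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Hypothetical helper: one short token per voice list.
function fingerprintVoices(voices: readonly VoiceLike[]): string {
  return fnv1a(voiceSignature(voices));
}
```

Because the serialisation keeps the list order and every field, two machines that differ only in an installed language pack, or in which voice is the default, produce different tokens.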
The empty list and the coherence trap
Headless browsers on Linux servers usually have no speech engine installed, so getVoices() returns []. An empty voice list is unusual for a real desktop user and, more damning, it contradicts a Windows or macOS User-Agent that should ship a known voice set. So the probe does double duty: an empty list suggests a headless browser, and a mismatched list (macOS voices under a Windows UA, or vice versa) suggests platform spoofing.
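The double-duty check can be sketched as a small classifier. This is an illustration under stated assumptions, not a description of any real detector: the function names and the Verdict labels are hypothetical, the User-Agent parsing is deliberately crude, and the voice heuristics (a local voice whose name starts with "Microsoft " hints at Windows, a local voice whose voiceURI starts with com.apple. hints at macOS) are simplifications of the patterns described above. Only local voices are considered, on the assumption that remote voices say less about the OS.

```typescript
type Platform = "windows" | "macos" | "android" | "unknown";
type Verdict = "empty-list" | "mismatch" | "consistent" | "inconclusive";

// Only the fields this check needs; SpeechSynthesisVoice satisfies it structurally.
interface VoiceHint {
  name: string;
  voiceURI: string;
  localService: boolean;
}

// Crude User-Agent parsing, for illustration only.
function platformFromUserAgent(ua: string): Platform {
  if (/Android/i.test(ua)) return "android";
  if (/Windows NT/i.test(ua)) return "windows";
  if (/Macintosh/i.test(ua)) return "macos";
  return "unknown";
}

// Assumed heuristics: look only at local voices and at two vendor markers.
function platformHintFromVoices(voices: readonly VoiceHint[]): Platform {
  const local = voices.filter((v) => v.localService);
  const apple = local.some((v) => v.voiceURI.indexOf("com.apple.") === 0);
  const microsoft = local.some((v) => v.name.indexOf("Microsoft ") === 0);
  if (apple && !microsoft) return "macos";
  if (microsoft && !apple) return "windows";
  return "unknown"; // no marker, or conflicting markers
}

function checkVoiceCoherence(ua: string, voices: readonly VoiceHint[]): Verdict {
  if (voices.length === 0) return "empty-list"; // headless suspicion
  const claimed = platformFromUserAgent(ua);
  const hinted = platformHintFromVoices(voices);
  if (claimed === "unknown" || hinted === "unknown") return "inconclusive";
  return claimed === hinted ? "consistent" : "mismatch"; // mismatch: spoofing suspicion
}
```

For example, a Windows User-Agent paired with a list containing only com.apple. voices yields "mismatch", while the same User-Agent with an empty list yields "empty-list". A real check would treat both as signals to weigh alongside others rather than as verdicts.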
There is a well-known timing quirk: in Chrome the first synchronous getVoices() call frequently returns [] until the engine populates asynchronously and fires the voiceschanged event. Detection scripts account for this by waiting for the event, and a spoofing tool that returns a static list without the realistic asynchronous population can itself look off. The robust fix, as with fonts and audio, is to expose a coherent real voice list for the claimed OS rather than a generic or empty one.
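The wait-for-the-event pattern might look like the sketch below. It is written against a small hypothetical interface, SynthLike, so it can be exercised outside a browser; the intent is that window.speechSynthesis can be passed in a page, since getVoices() and the voiceschanged event are standard Web Speech API members. The 2000 ms default timeout and the source field are assumptions added for illustration. Recording whether the list arrived synchronously, via the event, or not at all is one way a script could notice population behaviour that does not match the claimed browser.

```typescript
// Minimal shape of the object we need; assumed to match window.speechSynthesis.
interface SynthLike<V> {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

interface VoiceLoadResult<V> {
  voices: V[];
  // How the list was obtained: first call, voiceschanged event, or timeout fallback.
  source: "sync" | "event" | "timeout";
}

function loadVoices<V>(synth: SynthLike<V>, timeoutMs: number = 2000): Promise<VoiceLoadResult<V>> {
  return new Promise<VoiceLoadResult<V>>((resolve) => {
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve({ voices: initial, source: "sync" });
      return;
    }

    // Stable listener reference so it can be removed again.
    const onChange = (): void => finish("event");

    const finish = (source: "event" | "timeout"): void => {
      clearTimeout(timer);
      synth.removeEventListener("voiceschanged", onChange);
      // Re-read the list; it may still be empty after a timeout.
      resolve({ voices: synth.getVoices(), source });
    };

    // Fall back after timeoutMs so a missing speech engine cannot hang the probe.
    const timer = setTimeout(() => finish("timeout"), timeoutMs);
    synth.addEventListener("voiceschanged", onChange);
  });
}
```

In a page this would be called as loadVoices(window.speechSynthesis). A result with source "timeout" and an empty list is the case that corresponds to the headless suspicion described earlier.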
