Accuracy scores for text-to-speech voices - MS in 4th - but not for long
How does one do accuracy testing of different text-to-speech voices? That's a genuinely interesting question. A voice can sound great on one string and then come out as junk on 'tricky' text. There are all sorts of text normalization issues (e.g., should the string "read" be pronounced R EH D or R IY D?) that I'll save for a later discussion.

Having just posted a link to waveforms of all of the TTS voices, it's interesting to see that ASRNews independently rated all of the contenders, assigning its own accuracy scores based on a range of criteria. (Of course, the detailed results are for purchase.) Interestingly, Microsoft's desktop voice (MS Sam) was rated 4th, behind ScanSoft, Loquendo, and IBM. We'll be closing in on the lead with the Longhorn TTS engine, for sure.
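For concreteness, here's a minimal sketch of what a home-grown comparison harness could look like on Windows. To be clear, this is just an illustration, not ASRNews's methodology (their criteria sit behind the paywall), and it assumes a machine with the SAPI 5.1 SDK and at least a couple of voices installed. It enumerates every registered SAPI voice and speaks the same 'tricky' sentence through each one, using SAPI's inline PRON tag (SAPI phone set: "r eh d" for the past tense, "r iy d" for the present) so every voice gets identical, normalization-free input.

```cpp
// Minimal SAPI 5 voice-comparison sketch. Link with sapi.lib and ole32.lib.
#include <windows.h>
#include <sapi.h>
#include <sphelper.h>   // SpEnumTokens helper

int main()
{
    if (FAILED(::CoInitialize(NULL)))
        return 1;

    ISpVoice* pVoice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                    IID_ISpVoice, (void**)&pVoice);
    if (SUCCEEDED(hr))
    {
        // Enumerate every TTS voice registered on the machine.
        IEnumSpObjectTokens* pEnum = NULL;
        if (SUCCEEDED(SpEnumTokens(SPCAT_VOICES, NULL, NULL, &pEnum)))
        {
            ISpObjectToken* pToken = NULL;
            while (pEnum->Next(1, &pToken, NULL) == S_OK)
            {
                pVoice->SetVoice(pToken);   // switch to the next voice

                // Same sentence for every voice; the PRON tags force the
                // two pronunciations of "read", so the test isolates voice
                // quality from the engine's text normalization.
                pVoice->Speak(
                    L"I <PRON SYM=\"r eh d\"> read </PRON> it yesterday, "
                    L"and I will <PRON SYM=\"r iy d\"> read </PRON> it again.",
                    SPF_IS_XML, NULL);

                pToken->Release();
                pToken = NULL;
            }
            pEnum->Release();
        }
        pVoice->Release();
    }

    ::CoUninitialize();
    return 0;
}
```

Holding the input fixed like this keeps a listening comparison apples-to-apples; scoring the resulting audio is the hard part, and that's presumably what the paid report details.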
Comments
- Anonymous, June 07, 2005
  To me, speech-to-text is much, much more interesting. Do you know ratings of speech-to-text engines? I know that the Dragon/ScanSoft engine was pretty good, as was IBM ViaVoice, but I am not familiar with the accuracy of MS, nor with any others in the past couple of years.