

A short note about simulating unsupported engines

I have received a number of replies concerning my posts on how to simulate unsupported languages using phonemes. I apologize for not making this clearer: the intention of those posts was to show a hack for getting "some" support in cases where you need to recognize something in a language for which you do not have an engine. For the vast majority of enterprise deployments, this hack will not be sufficient to get an application working in that language. There are several reasons for this.

1) You will have no TTS support. You can partially work around this by combining recorded prompts in the prompt engine, but for some apps this is simply not feasible. Even the prompt engine has its limitations, since it uses the beginning and end phonemes of prompts to determine how to combine them.

2) Recognition will not be as accurate as it would be with an engine calibrated for that language. I do not have numbers, but the loss in accuracy is certainly significant.

3) Phonemes vary between languages. For instance, many of the phonemes present in Mandarin are not present in English and vice versa, so the best you can do is approximate them. In some languages these approximations can be decent, but in others they lead to misrecognition. For instance, Thai has aspirated and unaspirated versions of many consonants, and confusing one for the other can turn one word into another. Since English does not use aspiration to tell these consonants apart, an English engine has no way to keep them distinct and the chance of misrecognition is high.
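To make the third point concrete, here is a minimal sketch in Python of what approximation does to a word list. Everything in it is an assumption made for illustration: the phoneme labels are informal ("ph" for an aspirated p, and so on), the mapping table is hypothetical, and the words are placeholders rather than real Thai vocabulary. It is not tied to any engine's phone set. It only shows how two words that differ in the source language can collapse into the same phoneme string once they are approximated.

```python
# Hypothetical nearest-English stand-ins for a few aspirated/unaspirated stops.
# The labels are informal and chosen for this sketch, not an engine's phone set.
APPROXIMATION = {
    "ph": "p",  # aspirated p
    "p": "p",   # unaspirated p
    "th": "t",  # aspirated t
    "t": "t",   # unaspirated t
    "kh": "k",  # aspirated k
    "k": "k",   # unaspirated k
}


def approximate(phonemes):
    """Replace each source phoneme with its stand-in; unknown ones pass through."""
    return [APPROXIMATION.get(p, p) for p in phonemes]


def find_collisions(lexicon):
    """Return groups of words whose approximated pronunciations are identical."""
    groups = {}
    for word, phonemes in lexicon.items():
        key = tuple(approximate(phonemes))
        groups.setdefault(key, []).append(word)
    # Keep only pronunciations shared by more than one word.
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}


# Placeholder words: the first two differ only in aspiration.
LEXICON = {
    "word_aspirated": ["ph", "a"],
    "word_unaspirated": ["p", "a"],
    "word_other": ["k", "a"],
}

print(find_collisions(LEXICON))
# prints {('p', 'a'): ['word_aspirated', 'word_unaspirated']}
```

Any pair that shows up in the output is a pair the recognizer cannot separate by sound alone, so the grammar would have to avoid offering both words at the same point in the dialog.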

So where is approximation by phonemes useful? Probably the best use is for regional dialects, where the issues above are minimized. For instance:

1) The TTS provided by the engine can be used because it will be generally understood by the population.  For instance, American Standard English is understood well in Louisiana.

2) The phonemes are very close. This means that if you can spell a word in phonemes, there is a much higher chance that the engine will understand you.

Of course, the further the spoken language drifts from the base language, the lower the recognition success will be. For instance, it will not be too hard to use phonetic spellings to recognize speakers from rural Mississippi, but you will have much less success recognizing speakers from Delhi. The primary reason is that many speakers from Delhi pronounce English using Hindi phonemes, which makes the recognizer's job more difficult.
