Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Custom Pronunciations Support

Your application may use specialized vocabulary, such as proper names, fictitious names or words, or other custom terminology that the speech recognition engine does not interpret as well as expected. Typically, Microsoft speech recognition engines are quite adept at creating pronunciations for words that are uncommon in a language. However, you may find that creating custom pronunciations improves the accuracy of speech recognition for words in your application that have unusual spellings or that do not follow the typical pronunciation rules for the orthography of a language. It is important to test custom pronunciations that you create to verify that they provide an enhanced speech recognition experience for the intended audience of your application.

The Microsoft Grammar Development Tools support custom pronunciations that are included inline in speech recognition grammars and custom pronunciations that are listed in lexicons referenced by grammars.

The following table summarizes the support for custom pronunciations provided by each of the Grammar Development Tools.

| Tool | Custom Pronunciation Context | Tool Behavior |
| --- | --- | --- |
| GrammarValidator | Custom pronunciations inline in a grammar or in a linked lexicon. | Verifies syntax of inline pronunciations both in terms of schema and content. |
| PhraseGenerator | Custom pronunciations inline in a grammar or in a linked lexicon. | Does not include pronunciations in the output, by design. |
| Confusability | Custom pronunciations inline in a grammar, or in a linked lexicon, or in a list of phrases used as input to the /PhraseFile option. | Does NOT use the custom pronunciation (if specified) to create a pronunciation for a word. DOES use custom pronunciations (if specified) when searching for confusable phrases for pronunciations it created. Does not use custom pronunciations in a list of phrases used as input to the /PhraseFile option. |
| CheckPhrase | Custom pronunciations inline in a grammar or in a linked lexicon. | Returns the set of custom pronunciations associated with an emulated phrase, if matched. |
| Simulator | Custom pronunciations inline in a grammar or in a linked lexicon passed in with the utterances. | Uses custom pronunciations for recognition. |
| Simulator Results Analyzer | Only relevant if there is a pass-through of the custom pronunciations. | Outputs custom pronunciations in the RecoResultPronunciation element. |
| GrammarCompiler | Custom pronunciations inline in a grammar or in a linked lexicon. | Compiles grammars that contain custom pronunciations. |
| PrepareGrammar | Custom pronunciations inline in a grammar or in a linked lexicon. | Compiles grammars that contain custom pronunciations. |

When deciding which pronunciation to use for a word or phrase during speech recognition, a speech recognition engine looks for pronunciations in the following locations, in this order:

  • Inline in grammar documents

  • In lexicon files linked from a grammar document

  • In the speech recognition engine's internal lexicon

If there are custom pronunciations specified for the same word both inline in a grammar and in a linked lexicon, the speech recognition engine uses only the inline pronunciations. Similarly, if there are custom pronunciations specified in a lexicon linked from a grammar, the speech recognition engine uses those pronunciations instead of, not in addition to, the pronunciations given in the engine's internal lexicon.
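The lookup order described above can be sketched in a few lines of Python. This is an illustrative model only, not the engine's implementation: the function name `resolve_pronunciations`, the dictionary-based lexicons, the sample words, and the pronunciation labels are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical type: maps a word to its list of pronunciations.
Lexicon = Dict[str, List[str]]


def resolve_pronunciations(
    word: str,
    inline: Lexicon,
    linked_lexicon: Lexicon,
    engine_lexicon: Lexicon,
) -> List[str]:
    """Return the pronunciations from the first source that defines the word.

    Sources are checked in the documented order: inline in the grammar,
    then the linked lexicon, then the engine's internal lexicon. The first
    match wins outright; pronunciations from later sources are not merged in.
    """
    for source in (inline, linked_lexicon, engine_lexicon):
        pronunciations = source.get(word)
        if pronunciations:
            return list(pronunciations)
    # Simplification: this sketch returns nothing for an unknown word,
    # whereas a real engine would typically create a pronunciation.
    return []


# Placeholder labels stand in for real phone strings.
inline = {"zorbla": ["inline-1"]}
linked = {"zorbla": ["lexicon-1"], "quix": ["lexicon-2", "lexicon-3"]}
engine = {"quix": ["engine-1"], "hello": ["engine-2"]}

for word in ("zorbla", "quix", "hello"):
    print(word, resolve_pronunciations(word, inline, linked, engine))
```

In this model, "zorbla" resolves to its inline pronunciation only, "quix" resolves to both of its linked-lexicon pronunciations and ignores the engine entry, and "hello" falls through to the engine's lexicon.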

The Simulator tool detects custom pronunciations inline in grammars and in lexicons linked from grammars. The table below describes how Simulator selects custom pronunciations in a range of scenarios.

| Scenario | Behavior of Simulator |
| --- | --- |
| Phrase X is in the grammar and has NO custom pronunciations OR Phrase X is not in the grammar. | Finds no custom pronunciations, so uses the pronunciation from the engine's lexicon. |
| Phrase X is in the grammar and has ONE custom pronunciation specified inline in the grammar OR ONE custom pronunciation specified in a linked lexicon. | Uses the custom pronunciation instead of the pronunciation from the engine's lexicon. |
| Phrase X is in the grammar and has ONE custom pronunciation specified inline in the grammar AND ONE custom pronunciation specified in a linked lexicon. | Uses the inline custom pronunciation instead of the pronunciation in the linked lexicon or any pronunciations from the engine's lexicon. |
| Phrase X is in the grammar and has MULTIPLE custom pronunciations inline in the grammar OR MULTIPLE custom pronunciations in a linked lexicon. | Uses the custom pronunciations for simulation instead of any pronunciations from the engine's lexicon. |
| Phrase X is in the grammar and has MULTIPLE custom pronunciations inline in the grammar AND MULTIPLE custom pronunciations in a linked lexicon. | Uses the inline custom pronunciations instead of the pronunciations in the linked lexicon or any pronunciations from the engine's lexicon. |

When deciding whether to implement custom pronunciations inline in a grammar or in a linked lexicon, consider the following:

  • Custom pronunciations specified inline in grammars apply only to the single occurrence of a word in the grammar.

  • Custom pronunciations specified in lexicons apply to all occurrences of a word in a grammar.

  • A lexicon linked from a grammar is only active while the grammar is active for recognition.
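The difference in scope between inline and lexicon pronunciations can be illustrated with a small Python model. It is a sketch under stated assumptions: the token tuples, the function name `effective_pronunciations`, the sample words, and the pronunciation labels are hypothetical and do not correspond to any Grammar Development Tools API. The activation behavior in the last consideration is not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each grammar token is (word, inline pronunciation or None).
Token = Tuple[str, Optional[str]]


def effective_pronunciations(
    tokens: List[Token], linked_lexicon: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair every token occurrence with the pronunciation it would use."""
    result = []
    for word, inline_pron in tokens:
        if inline_pron is not None:
            # An inline pronunciation affects only this occurrence.
            result.append((word, inline_pron))
        elif word in linked_lexicon:
            # A lexicon entry covers every occurrence that has no inline one.
            result.append((word, linked_lexicon[word]))
        else:
            # Placeholder for whatever the engine's own lexicon provides.
            result.append((word, "engine-default"))
    return result


# The same word appears twice; only the first occurrence has an inline entry.
tokens = [("zorbla", "inline-A"), ("zorbla", None), ("hello", None)]

without_lexicon = effective_pronunciations(tokens, {})
with_lexicon = effective_pronunciations(tokens, {"zorbla": "lexicon-A"})

print(without_lexicon)
print(with_lexicon)
```

Without a lexicon, the second occurrence of "zorbla" falls back to the engine default because the inline pronunciation does not carry over. With a lexicon entry, that second occurrence picks up the lexicon pronunciation, while the first still uses its inline one.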

For information about how to incorporate custom pronunciations in your application, see Using Custom Pronunciations.