Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

token Element (Microsoft.Speech)

Contains a string that a speech recognizer can use for recognition. Optionally, it also specifies the display form of the string and the precise pronunciation that triggers recognition.

Syntax

<token
  sapi:display = "string"  
  sapi:pron = "string">
</token>

Attributes

Attribute

Description

sapi:display

Optional. Specifies the form of the word or phrase contained by the token element that should be displayed in the graphical user interface. The token element contains the lexical form of a word, which is used for recognition unless a custom pronunciation is specified by the sapi:pron attribute. The display form of a word is often the same as its lexical form.

When using sapi:display in a token element, the grammar Element (Microsoft.Speech) must include the following declaration: xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"

sapi:pron

Optional. Specifies an inline, custom pronunciation that the speech recognition engine can use to recognize the contents of the token element. The value of sapi:pron must use phones from the phonetic alphabet specified in the sapi:alphabet attribute of the grammar element.

When using sapi:pron in a token element, the grammar Element (Microsoft.Speech) must include the sapi:alphabet attribute, and must also contain the following declaration: xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions"

Remarks

A token element typically contains a word or short phrase in the language being recognized. For example, although the city name San Francisco consists of two character strings separated by a space, English speakers recognize the name as a single entity, so the whole name can be placed in one token element. A token element must not be empty.
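As a minimal sketch of this point, the fragment below places a multi-word city name in a single token element alongside a one-word name. The city list is illustrative only, and the fragment would sit inside a rule in a complete grammar.

```xml
<one-of>
  <!-- Two words, one token: the whole name is treated as a single unit -->
  <item> <token> San Francisco </token> </item>
  <!-- A one-word name -->
  <item> <token> Seattle </token> </item>
</one-of>
```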

The token element allows you to specify three forms of a word: the display form, the lexical form, and a custom pronunciation for the word. Possible uses for the display form include cardinal numbers and acronyms. In the following example, the phrases "United States of America" and "fifty" would be used for recognition, but the graphical user interface would display "USA" and "50".

<item> The <token sapi:display="USA"> United States of America </token> has <token sapi:display="50"> fifty </token> states. </item>

Phones are letters or symbols that describe the sounds of speech. The Microsoft Speech Platform SDK 11 supports three phonetic alphabets for speech recognition: the Universal Phone Set (UPS), the Speech API (SAPI) phone set, and the International Phonetic Alphabet (IPA). The phones specified in sapi:pron must match the phonetic alphabet specified in the sapi:alphabet attribute of the grammar element. If the phones are not space-delimited or the specified string contains an unrecognized phone, the recognition engine does not recognize the specified pronunciation as a valid pronunciation of the word contained by the token element.

If sapi:pron is specified, the speech recognition engine does not use the string contained by the token element for recognition, but returns the string as the recognition result if the speech input matches the pronunciation specified in the sapi:pron attribute.
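To illustrate the delimiting rule, the following fragment reuses the UPS pronunciation from the example in this topic and contrasts it with a run-together form. It is a sketch rather than a tested grammar, and it assumes the enclosing grammar element declares sapi:alphabet="x-microsoft-ups" and the sapi namespace. The second token is shown only for contrast and should not be used.

```xml
<!-- Phones separated by spaces: a candidate custom pronunciation.
     On a match, the result text is the token content, "whatchamacallit". -->
<token sapi:pron="W AX T CH AX M AX K AA L IH T"> whatchamacallit </token>

<!-- Phones run together (contrast only): per the rule above, this is
     not accepted as a valid pronunciation of the word -->
<token sapi:pron="WAXTCHAXMAXKAALIHT"> whatchamacallit </token>
```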

Pronunciations specified in token elements in speech recognition grammar documents take precedence over pronunciations specified in lexicons associated with a grammar or a recognition engine. Also, the pronunciation in a token element applies only to the single occurrence of the word or phrase contained by the token element.
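A sketch of the scoping rule: in the hypothetical rule below, only the first occurrence of the word carries the inline pronunciation. The rule id and the surrounding phrases are assumptions made for illustration. The pronunciation is the UPS string from the example in this topic, and the enclosing grammar is assumed to declare sapi:alphabet and the sapi namespace.

```xml
<rule id="request">
  <one-of>
    <!-- Inline pronunciation: applies to this occurrence only and takes
         precedence over any lexicon entry for the word -->
    <item> find my <token sapi:pron="W AX T CH AX M AX K AA L IH T"> whatchamacallit </token> </item>
    <!-- No sapi:pron here: this occurrence relies on the pronunciation
         supplied by a lexicon or by the recognition engine -->
    <item> where is my whatchamacallit </item>
  </one-of>
</rule>
```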

Unlike the Speech Recognition Grammar Specification (SRGS) Version 1.0, the Speech Platform SDK 11 does not support the xml:lang attribute on the token element. A grammar in the Speech Platform SDK 11 can contain only one language, which must be declared in the grammar Element (Microsoft.Speech).
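A minimal sketch of where the language declaration belongs: xml:lang appears once, on the grammar element, and the token elements carry no language attribute. The rule id and city names are illustrative.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The single grammar language is declared here -->
<grammar xml:lang="en-US" root="city" version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar">

  <rule id="city">
    <one-of>
      <!-- No xml:lang on token: it is not supported on this element -->
      <item> <token> Los Angeles </token> </item>
      <item> <token> Denver </token> </item>
    </one-of>
  </rule>

</grammar>
```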

Example

The grammar in the following example contains slang words and an uncommon word: "whatchamacallit". Adding a custom, inline pronunciation using the sapi:pron attribute can improve the accuracy of recognition for the word "whatchamacallit" as well as for the entire phrase that contains it. The example uses phones from the Microsoft Universal Phone Set (UPS) to define the custom pronunciation.

<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="en-US" root="slang" 
tag-format="semantics/1.0" sapi:alphabet="x-microsoft-ups" 
version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">

  <rule id="slang">

    <one-of>
      <item> give me </item>
      <item> gimme </item>
      <item> hand me </item>
      <item> ha'me </item>
    </one-of>

    <one-of>
      <item> the </item>
      <item> duh </item>
    </one-of>

    <one-of>
      <item> thingamajig </item>
      <item> <token sapi:pron="W AX T CH AX M AX K AA L IH T"> whatchamacallit </token> 
      </item>
    </one-of>

  </rule>

</grammar>

See Also

Concepts

grammar Element (Microsoft.Speech)

Lexicons and Phonetic Alphabets (Microsoft.Speech)

Other Resources

Using Custom Pronunciations