

Object Tokens and Registry Settings SAPI 5.4

Microsoft Speech API 5.4

Object Tokens and Registry Settings

 

 

1          Contents

1      Contents

2      Summary

3      Overview: Tokens, Categories and the Registry

3.1       Tokens

3.2       Categories

3.3       TokenIDs and CategoryIDs

3.4       User Defaults

3.5       Token Enumerators

4      Using Tokens and Categories

4.1       Helper Function Examples

4.2       Enumerating tokens

4.3       Instantiating an Object from a Token

5      Tokens and Categories For Engine Developers

5.1       Making Resources Available Through SAPI

5.2       Associating Files with Tokens

5.3       Inspecting Underlying Keys of a Token

5.4       Creating New Keys in the Registry

6      Registry Settings

6.1       Category: Voices

6.2       Category: Recognizers

6.3       Category: RecoProfiles

6.4       Category: AudioInput

6.5       Category: AudioOutput

6.6       Category: AppLexicons

6.7       Category: PhoneConverters

6.8       UserLexicons

7      Index of Tables

 

 

2         Summary

This document is intended to help developers of speech-enabled applications discover and use resources (Voices/Recognizers) on a computer that has SAPI installed. A speech-enabled application is one that attempts to either recognize or synthesize speech. It is also intended to help developers of speech recognition (SR) and speech synthesis (Text to Speech or TTS) engines make their resources available to applications.

This spec answers the following questions:

·         What are Tokens and Categories in SAPI?

·         Where is information about tokens stored in the Registry?

·        How does an application find tokens and initialize resources (i.e., Voices or Recognizers) from them? 

·         What are the SAPI-defined attributes that engines should document in the registry?

·         How are files associated with tokens?

Note:

The Speech SDK documentation section on Object Tokens, which provides a complete description of the ISpObjectToken and ISpObjectTokenCategory and their methods, complements this document.

3         Overview: Tokens, Categories and the Registry

3.1       Tokens

 

A token is an object representing a resource that is available on a computer, such as a voice, a recognizer, or an audio input device. A token gives an application an easy way to inspect the various attributes of a resource without having to instantiate it. The Vendor of a Recognizer and the Gender of a Voice are examples of attributes of resources. In many cases, applications should use SAPI-provided helper functions for common scenarios. For example, an application can use the SpCreateBestObject helper function to rapidly create the object, given a certain type of resource. The application can also query for tokens meeting certain criteria without using the helper function. To do this, the application calls the EnumTokens method on the ISpObjectTokenCategory interface to get an enumerator, and inspects the tokens in the enumerator further if it chooses to. Finally, the application selects one of the tokens in the enumerator to instantiate a resource. Once the resource (such as an SR engine) is instantiated, if it implements the ISpObjectWithToken interface, it is handed a pointer to the token that was used to create it. This way, the resource contains a handle to more information about itself.

 

Conceptually, a token contains the following information:

·         The language-independent name. This is the name that should be displayed wherever the name of the token is displayed. It is marked as (Default) in the registry. The implementer of the token may also choose to provide a set of language-dependent names in several languages.

·         The CLSID used to instantiate the object from the token.

·         A set of Attributes, which are the only set of queriable values in a token. This means SAPI provides a mechanism to query for tokens whose attributes match certain values. Details on how to query for tokens that match a set of attributes are in Sections 4.1 and 4.2.

 

 

A token may also contain the following:

·         If a token has user interfaces (UIs) to display, such as the properties of a Recognizer or a wizard to customize a Voice, then the token will also contain the CLSID for the COM object used to instantiate each type of UI.

·         The set of Files from which SAPI returns the paths to all the associated files for the token. 

 

SAPI stores information about tokens in the registry. A token is represented in the registry by a key and the key's underlying keys and values. When an application queries SAPI for tokens of all the female voices on the computer, SAPI looks at the HKEY_LOCAL_MACHINE\Software\Microsoft\Speech\Voices area. This corresponds to a Category; categories are discussed in Section 3.2. SAPI searches for tokens that match the criteria (in this case, a voice with the Gender attribute set to female) and uses one of these matching tokens to initialize the voice. The application may also specify a different fully qualified registry path to make SAPI search for a token in a location that is non-standard from SAPI's point of view. In addition to the keys SAPI recommends, the entry for the token may contain any other information that the implementer of the token chooses to store there. In the registry, a token looks like this:

Table 1 Parts of a Token in the Registry

| RegKey | ValueName | Sample Value | Comments |
| --- | --- | --- | --- |
| SampleTokenKey | | | Required - This is the RegKey for the Token. |
| | (Default) | Joe | Required - Language Independent Name. |
| | 409 | Joe | Name in Hex LangID 409, which is English. There may be several of these rows, one for each LangID in which the Token has a name. Note, no leading 0x before the LangID. |
| | 809 | Joe | |
| | CLSID | {8021D50E-D93C-4075-8504-FC4E124D64E9} | Required - Sample CLSID for object which instantiates the token. |
| SampleTokenKey/Attributes | | | Attributes for the token are under this key. |
| | Language | 409;809 | There may be several of these rows, one for each attribute that is queriable. See Section 4 for an explanation of each of the attributes. |
| | Vendor | VoiceVendor | |

In the registry, this looks like:

Figure 1 A Token Key in the Registry


The Attributes key contains all the queriable values for the token. Section 4.2 discusses in detail how an application queries a token.

 

Figure 2 Attributes of a Token
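The parts of a token listed in Table 1 can be mirrored by a small in-memory model. The sketch below is portable C++ and does not call SAPI; `TokenModel`, `displayName` and `makeSampleToken` are hypothetical names, and falling back from a language-specific name to the (Default) name is an assumption made for illustration rather than documented SAPI behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory mirror of the registry layout in Table 1.
struct TokenModel {
    std::string defaultName;                         // the (Default) value
    std::map<std::string, std::string> namesByLang;  // hex LangID (no 0x) -> name
    std::string clsid;                               // the CLSID value
    std::map<std::string, std::string> attributes;   // values under Attributes
};

// Returns the name stored for the given hex LangID, or the
// language-independent name when no such value exists (assumed fallback).
std::string displayName(const TokenModel& token, const std::string& hexLangId) {
    const auto it = token.namesByLang.find(hexLangId);
    return it != token.namesByLang.end() ? it->second : token.defaultName;
}

// Builds the sample token shown in Table 1.
TokenModel makeSampleToken() {
    TokenModel t;
    t.defaultName = "Joe";
    t.namesByLang["409"] = "Joe";
    t.namesByLang["809"] = "Joe";
    t.clsid = "{8021D50E-D93C-4075-8504-FC4E124D64E9}";
    t.attributes["Language"] = "409;809";
    t.attributes["Vendor"] = "VoiceVendor";
    return t;
}
```

Only the entries of `attributes` correspond to values that a query can match; the names and the CLSID are stored directly under the token key.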


 

If the token is capable of displaying UI, then each UI has its own key under the token. Figure 3 shows the token for a Recognizer that supports four types of UI: AddWord, EngineProperties, MicTraining and UserTraining, as well as the CLSID underlying each UI type.

 

Figure 3 A Token that supports UI has a key for each UI type

 

SAPI provides a comprehensive set of helper functions for the common scenarios using tokens. Section 4.1 provides a number of examples. SAPI also provides a way for engines and applications to implement tokens in their own proprietary manner. See Section 3.5 on token enumerators for further discussion. Sections 4 and 5 explore common scenarios using these interfaces from application and engine coding perspectives.

 

3.2       Categories

An ObjectTokenCategory (hereafter referred to as a category) is the highest level of grouping of registry entries in SAPI. A category is a class of tokens (or of resources, since each token represents an actual resource on the computer). Intuitively, a category is a type of SAPI resource. It is represented in the registry by a key containing one or more token keys under it. It is created and manipulated using helper functions such as SpCreateDefaultObjectFromCategoryId or methods on the ISpObjectTokenCategory interface. Please refer to the SAPI documentation for details on either of these. Examples of categories are Recognizers and Voices. Figure 4 shows the default SAPI categories, with the category Voices selected.

Figure 4 The Category Voices

SAPI organizes tokens in the Registry under seven categories.

 

By default, the tokens for the following six SAPI categories are located under HKEY_LOCAL_MACHINE\Software\Microsoft\Speech (HKLMS). This is where all system-specific SAPI keys and values should be stored, as recommended by Windows application guidelines. Examples include settings and files for the Voices and the Recognizers (also known as speech recognition engines) installed on a computer, as shown in Figure 1.


1. Voices

2. Recognizers

3. AppLexicons

4. AudioInput

5. AudioOutput

6. PhoneConverters


The tokens for the other category, RecoProfiles, are located under HKEY_CURRENT_USER\Software\Microsoft\Speech (HKCUS). HKCUS also contains all other user-specific keys and values in the registry, such as user defaults for Voices and Recognizers, as well as the location of the user lexicon file.

 

Categories contain the following items:

 

·         A single key called Tokens, and the keys for the tokens that belong to that category under it. For example, the Voices category has a key for the voice called Manuel. All the keys and values for Manuel are located under HKLMS/Voices/Tokens/Manuel.

·         Keys for token enumerators. A token enumerator is a special type of token that generates other tokens for the same category. It gives vendors a way to produce tokens in a non-standard way, such as by reading data from a stored file. Engine vendors who follow the SAPI guidelines for registering resources (Sections 4 and 5) can safely ignore token enumerators and regard them as generators for another set of tokens. Section 3.5 explains token enumerators in more detail.

 

 

3.3       TokenIDs and CategoryIDs

 

A CategoryID uniquely identifies a category in the registry. For SAPI-defined categories, CategoryIDs take the form HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\{CategoryName}, for example HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\ for the Recognizers category. All SAPI CategoryIDs should be referenced using the constants defined in the sapi.idl file:

 

1.       SPCAT_AUDIOOUT

2.       SPCAT_AUDIOIN

3.       SPCAT_VOICES

4.       SPCAT_RECOGNIZERS

5.       SPCAT_APPLEXICONS

6.       SPCAT_PHONECONVERTERS

7.       SPCAT_RECOPROFILES

 

Similarly, TokenIDs uniquely identify tokens in the registry. For tokens located in SAPI defined categories, they take the form of:

 

·             CATID\Tokens\TokenKeyName - a static token from the registry. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\MSASREnglish

·             CATID\TokenEnums\TokenEnumKeyName - a static token from the registry that represents a token enumerator. This token instantiates a token enumerator used to enumerate dynamic tokens. SAPI uses this for its own implementation of audio input and output to list just the channels available on the computer at runtime. Token enumerators can also read tokens from other areas of the registry, or from remote computers. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioOutput\DSoundAudioIn

·             CATID\TokenEnums\TokenEnumKeyName\ - a dynamic token representing the default token that the specified token enumerator generates. For example, SPDSOUND_AUDIO_IN_TOKEN_ID creates the default DSound audio-in object: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioOutput\DSoundAudioIn\

·             CATID\TokenEnums\TokenEnumKeyName\EnumExtra… - a specific dynamic token from the specified token enumerator. For example: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioOutput\DSoundAudioIn\Direct Sound Crystal WDM Audio, which generates the Direct Sound Crystal WDM audio object.
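The four TokenID forms above differ only in what follows the CategoryID. As a rough illustration, the portable sketch below classifies a TokenID string by its shape. It does not touch the registry or call SAPI; `TokenIdKind` and `classifyTokenId` are hypothetical names, narrow strings are used for brevity, and the comparison is case-sensitive even though registry paths are not.

```cpp
#include <cassert>
#include <string>

enum class TokenIdKind { Invalid, StaticToken, Enumerator, DefaultDynamic, SpecificDynamic };

// Classifies a TokenID relative to its CategoryID using the four forms
// listed above (a simplified, case-sensitive model).
TokenIdKind classifyTokenId(const std::string& categoryId, const std::string& tokenId) {
    // The TokenID must start with "<CategoryID>\".
    const std::string prefix = categoryId + "\\";
    if (tokenId.compare(0, prefix.size(), prefix) != 0) return TokenIdKind::Invalid;
    const std::string rest = tokenId.substr(prefix.size());

    const std::string tokens = "Tokens\\";
    const std::string enums = "TokenEnums\\";
    if (rest.compare(0, tokens.size(), tokens) == 0) {
        // A static token is a single, non-empty key name under Tokens.
        const std::string key = rest.substr(tokens.size());
        return (!key.empty() && key.find('\\') == std::string::npos)
                   ? TokenIdKind::StaticToken
                   : TokenIdKind::Invalid;
    }
    if (rest.compare(0, enums.size(), enums) == 0) {
        const std::string tail = rest.substr(enums.size());
        const std::string::size_type slash = tail.find('\\');
        if (tail.empty() || slash == 0) return TokenIdKind::Invalid;
        if (slash == std::string::npos) return TokenIdKind::Enumerator;       // enumerator key only
        if (slash == tail.size() - 1) return TokenIdKind::DefaultDynamic;     // trailing backslash
        return TokenIdKind::SpecificDynamic;                                  // extra data follows
    }
    return TokenIdKind::Invalid;
}
```

The trailing backslash is what separates the enumerator's own static token from the default dynamic token it generates, so code that normalizes TokenIDs should take care not to strip it.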

 

3.4       User Defaults

In addition to the category defaults mentioned in Section 3.2, the categories Voices, Recognizers, AudioInput, AudioOutput and RecoProfiles also have user defaults and settings. As shown in Figure 5, these are located in the HKCUS area, under their respective category keys. Section 6 explains each category of tokens and lists the user-specific entries in HKCUS and the system-wide entries in HKLMS.

 

Figure 5 The User category for Recognizers


3.5       Token Enumerators

Note: This section is relevant only for Engine or Application developers who need to store tokens in a separate part of the registry or even on the file system, and dynamically enumerate them.

 

SAPI provides a way for third parties to store their registry settings without following any of the SAPI-recommended guidelines. SAPI can find these tokens as long as the third parties have implemented token enumerators. Token enumerators are COM objects that enumerate the tokens they are responsible for. All token enumerators are stored under CategoryName/TokenEnums. Each token enumerator key listed under a category must contain the CLSID of the COM object that implements it.

 

The token enumerator

·         Must implement the methods Next, Skip, Reset, Clone, Item, GetCount on the IEnumSpObjectTokens interface.

·         May choose to implement the methods SetObjectToken and GetObjectToken on the ISpObjectWithToken interface. As mentioned at the end of Section 3.1, these give a resource a handle to the token that was used to instantiate it.

 

These tokens can be located in a separate part of the registry or somewhere else (possibly on the file system). It is the responsibility of the token enumerator to return correct results from the above methods, so that an application cannot tell the difference between tokens coming from the token enumerator and tokens coming from the SAPI-specific part of the registry.
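To make the enumerator contract concrete, here is a minimal portable model of the six methods over an in-memory list. It is not a COM object and uses no SAPI types: `TokenEnumModel` is a hypothetical class, tokens are reduced to their names, and `bool` results stand in for the HRESULT values a real IEnumSpObjectTokens implementation would return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Portable model of the enumerator contract; tokens are represented by
// their names only.
class TokenEnumModel {
public:
    explicit TokenEnumModel(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)), pos_(0) {}

    // Fetches the next token; returns false once the end is reached.
    bool Next(std::string& out) {
        if (pos_ >= tokens_.size()) return false;
        out = tokens_[pos_++];
        return true;
    }
    // Advances past `count` tokens; returns false if fewer remained.
    bool Skip(std::size_t count) {
        const std::size_t remaining = tokens_.size() - pos_;
        pos_ += (count < remaining) ? count : remaining;
        return count <= remaining;
    }
    // Moves back to the first token.
    void Reset() { pos_ = 0; }
    // Returns an independent copy that keeps the current position.
    TokenEnumModel Clone() const { return *this; }
    // Random access by index; returns false when the index is out of range.
    bool Item(std::size_t index, std::string& out) const {
        if (index >= tokens_.size()) return false;
        out = tokens_[index];
        return true;
    }
    // Total number of tokens, regardless of the current position.
    std::size_t GetCount() const { return tokens_.size(); }

private:
    std::vector<std::string> tokens_;
    std::size_t pos_;
};
```

Whatever backing store a vendor's enumerator reads from, callers see only this sequential and indexed view, which is why an application cannot distinguish dynamic tokens from static ones.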

 

SAPI itself uses token enumerators only for the AudioInput and AudioOutput categories. Refer to Sections 6.4 and 6.5 for more details. Note that the token enumerator for the MMSYS audio object creates its tokens from keys that are under it.

 

The following is an example of what a TokenID for a token located under a token enumerator looks like: CategoryName/TokenEnums/TE1/XXX, where (i) TE1 is a sample token enumerator and (ii) XXX is a reference to one of the tokens generated by TE1. On a call to the helper function SpCreateNewToken given the TokenID above, the IEnumSpObjectTokens returned by the token enumerator TE1 to SAPI includes all of its tokens. SAPI then goes through each token (those returned by token enumerators and those under the Tokens key) to find the one whose token name matches XXX.

 

Table 2 Parts of the AudioInput token enumerator     

| RegKey | ValueName | Sample Value | Comments |
| --- | --- | --- | --- |
| HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\ | | | This is the category. |
| | DefaultDefaultTokenID | HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\ | This is the TokenID for the default token for this category. If the DefaultTokenID is present, it will supersede this default token for the category. Details in Section 4.2. |
| HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\TokenEnums | | | |
| HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\TokenEnums\MMSys | | | This is the MMSys token enumerator. |
| | CLSID | {GUID} | This is the CLSID for the COM object that implements the MMSound token enumerator. |

 

Figure 6 AudioInput token enumerator in the registry

Figure 6 illustrates how the AudioInput token enumerator looks in the registry.

 



4         Using Tokens and Categories 

4.1       Helper Function Examples

A SAPI 5 application needs to find tokens and instantiate objects that meet certain criteria from the resources available on a computer. Helper functions distributed in the sphelper.h file are the recommended way for applications to interact with tokens and categories whenever possible. Table 3 provides a list of helper functions and the scenarios they address. The helper functions have been broken up into Common Helper Functions and Engine Developer Helper Functions based on likelihood of use.  If the specific helper function is not found in either section, refer to the SAPI documentation for the comprehensive listing.

 

Table 3 Common Helper Functions

Helper Function

Action

Example Helper Function Call 

SpGetDefaultTokenFromCategoryId

Creates the default token from a CategoryID. The optional last argument (omitted here) tells SAPI to create the category if it does not currently exist.

CComPtr<ISpObjectToken> m_cpEngineToken;

hr = SpGetDefaultTokenFromCategoryId(SPCAT_RECOGNIZERS, &m_cpEngineToken);

SpFindBestToken

Finds the most appropriate token given a set of required and optional criteria. For details on attribute matching see Section 4.2

CComPtr<ISpObjectToken> cpTokenEng;

hr = SpFindBestToken(SPCAT_RECOGNIZERS, L"Language=409", L"VendorPreferred", &cpTokenEng);

       

SpEnumTokens

Returns a token enumerator containing all tokens meeting a set of required and optional attributes. Tokens in the enumerator are sorted in the order specified in Section 4.2.

CComPtr<IEnumSpObjectTokens> cpEnum;

hr = SpEnumTokens(SPCAT_VOICES, L"Gender=Female;Language=409", L"Vendor=VoiceVendor1;Age=Child", &cpEnum);

SpCreateDefaultObjectFromCategoryId

Creates the default object in a category, such as AudioInput or Recognizers.

CComPtr<ISpVoice> cpVoice;

hr = SpCreateDefaultObjectFromCategoryId(SPCAT_VOICES, &cpVoice);

SpCreateBestObject

Instantiates a resource that best matches a set of required and optional criteria. For details on attribute matching see Section 4.2

CComPtr<ISpVoice> cpVoice;

SpCreateBestObject(SPCAT_VOICES, L"Vendor=VoiceVendor1;Age=Child", L"Gender=Female", &cpVoice);

 

SpCreateObjectFromToken

Creates an object from a token.

CComPtr<ISpVoice> cpVoice;

CComPtr<ISpObjectToken> cpVoiceToken;

// Find the best matching token, as in the SpFindBestToken example

SpFindBestToken(SPCAT_VOICES, L"Language=409", L"VendorPreferred", &cpVoiceToken);

// Now create the object from that token

SpCreateObjectFromToken(cpVoiceToken, &cpVoice);

 

 

Table 4 Engine Developer Helper Functions

Helper Function

Action

Example Helper Function Call 

SpCreateNewToken

Creates a new object token in the registry with CategoryID, but without specifying a keyname. This creates a token with a GUID as its registry key.

CComPtr<ISpObjectToken> cpUserToken;

hr = SpCreateNewToken(SPCAT_RECOPROFILES, L"", &cpUserToken);


SpGetTokenFromId

Creates a token from a TokenID of an enumerator or a new token if the token does not already exist. The last argument of FALSE tells SAPI not to create the token if it does not already exist.

CComPtr<ISpObjectToken>      cpAudioInTok;

hr = SpGetTokenFromId(SPCAT_AUDIOIN, &cpAudioInTok, FALSE);

SpCreateObjectFromSubToken

Creates an object from a subtoken of a token. In this case, the engine token m_cpEngineToken has the Lts key under it, which in turn has a CLSID value under it. This CLSID is used to instantiate the object.

CComPtr<ISpObjectToken> m_cpEngineToken;

hr = SpGetDefaultTokenFromCategoryId(SPCAT_RECOGNIZERS, &m_cpEngineToken);

ISpLexicon *        m_pLtsLex = NULL;

hr = SpCreateObjectFromSubToken(m_cpEngineToken, L"Lts", &m_pLtsLex);

SpGetSubTokenFromToken

Creates a subtoken under a token. This is useful, for example, when an Engine vendor would like to create a subtoken for custom data under its Recognizer token.

CComPtr<ISpObjectToken> cpSubSubToken;

hr = SpGetSubTokenFromToken(m_cpEngineToken, L"EngineProperties", &cpSubSubToken, TRUE);

 

4.2       Enumerating tokens

The principal tasks related to tokens and categories that an application needs to accomplish are:

·         Enumerating tokens

·         Inspecting and instantiating tokens

 

The two primary ways to enumerate tokens are the helper function SpEnumTokens and the method ISpObjectTokenCategory::EnumTokens. Both allow the caller to specify a category and a set of required and optional attributes: SpEnumTokens takes the CategoryID as its first argument, while a category object is bound to its CategoryID with ISpObjectTokenCategory::SetId before EnumTokens is called. The call then returns a token enumerator containing all the tokens matching those criteria. The method is defined as:

 

HRESULT EnumTokens(

        [in, string] const WCHAR *pReqAttrs,

        [in, string] const WCHAR *pOptAttrs,

        [out] IEnumSpObjectTokens **ppEnum);

 

When identifying matching tokens in a category, an application needs to specify a fully qualified category identifier (FQCID). An FQCID is the full registry path to a category, such as HKEY_CURRENT_USER\Software\Microsoft\Speech\Voices. It is recommended that these categories be referenced using the constants defined in the sapi.idl file, and not using the full string, to minimize typos in commonly used registry paths. SAPI maps the constant to the correct hive in the registry and returns matching tokens from the category. For instance, the SAPI-defined AudioOutput constant (from the sapi.idl file) is:

 

//--- Categories for speech resource management

const WCHAR SPCAT_AUDIOOUT[]    = L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech\\AudioOutput";

 

Similarly, there are constants for the AudioInput, Voices, Recognizers, AppLexicons, PhoneConverters, and RecoProfiles categories.

 

An application may also specify a non-standard registry location by simply providing its FQCID, such as HKEY_CURRENT_USER\Software\TTSVendor1\Speech\Voices.

 

In both SpEnumTokens and ISpObjectTokenCategory::EnumTokens the following clauses are permitted in the ReqAttrs and OptAttrs strings, separated by semicolons.

 Table 5 Query Operators

| Condition | Example | Explanation |
| --- | --- | --- |
| Exists | Telephony;Dictation | The value names Telephony and Dictation exist in the list of attributes for this token. |
| One of | Language=409 | At least one of the values of the value name Language is 409. There may be other values, like 809, 512 as well. |
| Not Equals | Age!=Child;Age!=Teen | Values of Age that are neither "Child" nor "Teen". |
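The three clause types can be illustrated with a small, portable matcher. This is only a sketch of the semantics described in Table 5, not SAPI's implementation: `Attributes`, `splitList`, `matchesClause` and `matchesAll` are hypothetical names, attribute values are treated as semicolon-separated lists, and a token that lacks an attribute altogether is assumed to satisfy a Not Equals clause on it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Attributes = std::map<std::string, std::string>;

// Splits a semicolon-separated list and trims spaces around each item.
std::vector<std::string> splitList(const std::string& text) {
    std::vector<std::string> items;
    std::string::size_type start = 0;
    while (start <= text.size()) {
        std::string::size_type end = text.find(';', start);
        if (end == std::string::npos) end = text.size();
        const std::string item = text.substr(start, end - start);
        const std::string::size_type first = item.find_first_not_of(' ');
        const std::string::size_type last = item.find_last_not_of(' ');
        if (first != std::string::npos) items.push_back(item.substr(first, last - first + 1));
        start = end + 1;
    }
    return items;
}

// Returns true when `value` is one of the values stored for `name`.
bool hasValue(const Attributes& attrs, const std::string& name, const std::string& value) {
    const auto it = attrs.find(name);
    if (it == attrs.end()) return false;
    for (const std::string& v : splitList(it->second))
        if (v == value) return true;
    return false;
}

// Evaluates one clause: "Name" (exists), "Name=Value" (one of),
// or "Name!=Value" (not equals).
bool matchesClause(const Attributes& attrs, const std::string& clause) {
    const std::string::size_type ne = clause.find("!=");
    if (ne != std::string::npos)
        return !hasValue(attrs, clause.substr(0, ne), clause.substr(ne + 2));
    const std::string::size_type eq = clause.find('=');
    if (eq != std::string::npos)
        return hasValue(attrs, clause.substr(0, eq), clause.substr(eq + 1));
    return attrs.count(clause) != 0;
}

// A token satisfies a query string when every clause in it matches.
bool matchesAll(const Attributes& attrs, const std::string& query) {
    for (const std::string& clause : splitList(query))
        if (!matchesClause(attrs, clause)) return false;
    return true;
}
```

Checking for "!=" before "=" matters, because every Not Equals clause also contains an equals sign.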

 

The tokens are sorted "best matches" first using the following intuitive rules:

 

1.       Only tokens matching the required attributes are returned.

2.       Tokens that also match the optional attributes are placed before those that match only the required attributes.

3.       If there are no required or optional attributes (i.e., both are set to NULL), the first token is the default token for that category. If there is a valid DefaultTokenID in HKLMS/Category, that is returned as the default TokenID. If not, and there is a default TokenID in HKCUS/Category, that is returned. If neither exists, SAPI searches for a DefaultDefaultTokenID in HKLMS/CategoryName, and that is returned.

4.       Matching Rules: If a token matches an optional attribute, it gets a score of 1 for that attribute; otherwise it gets 0. Optional attributes mentioned earlier in the query string are more significant. These scores are concatenated as shown in Table 7. The tokens are then placed in descending order. This is illustrated in Tables 6 and 7.

5.       Tokens having the same score are returned in random order in the enumerator.

 

A call to EnumTokens could look like:

 

CComPtr<IEnumSpObjectTokens>  cpEnum;
CComPtr<ISpObjectTokenCategory> cpVoiceCat;

HRESULT hr = cpVoiceCat.CoCreateInstance(CLSID_SpObjectTokenCategory);

// Bind the category object to the Voices category; SPCAT_VOICES is defined in sapi.idl
if (SUCCEEDED(hr))
    hr = cpVoiceCat->SetId(SPCAT_VOICES, FALSE);

const WCHAR ReqAttrs[] = L"LanguagesSupported=409";
const WCHAR OptAttrs[] = L"Vendor=VoiceVendor1;Age=Child;Gender=Female";

// Enumerate the matching tokens, best matches first
if (SUCCEEDED(hr))
    hr = cpVoiceCat->EnumTokens(ReqAttrs, OptAttrs, &cpEnum);

 

If the following voices are installed on a computer as shown in Table 6:

Table 6 Voices installed on a computer

| Voice | Vendor | Age | LanguagesSupported | Gender |
| --- | --- | --- | --- | --- |
| Michelle | VoiceVendor1 | Child | 409; 411 | Female |
| Mary | VoiceVendor1 | Adult | 409 | Female |
| Jane | VoiceVendor2 | Child | 409 | Female |
| Frank | VoiceVendor2 | Adult | 411 | Male |
| Anna | VoiceVendor2 | Adult | 411 | Female |

 

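Then Frank and Anna are excluded, because they do not support language 409 and so fail the required attribute, and the remaining Voices are returned in cpEnum in the order shown in Table 7: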

Table 7 Scoring of tokens matching optional criteria

| Optional Criteria -> | Vendor | Age | Gender | Net Score |
| --- | --- | --- | --- | --- |
| Michelle | 1 | 1 | 1 | 111 |
| Mary | 1 | 0 | 1 | 101 |
| Jane | 0 | 1 | 1 | 011 |

 

The final order is:

1.       Michelle (meets all required criteria, scores 111 on optional criteria)

2.       Mary (meets all required criteria, scores 101 on optional criteria)

3.       Jane (meets all required criteria, scores 011 on optional criteria)
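The scoring in Tables 6 and 7 can be reproduced with a short portable sketch. It models the rules stated above rather than SAPI's actual code: `Voice`, `Clause`, `score`, `rank` and `table6Voices` are hypothetical names, every clause is of the "Name=Value" kind, and ties are kept in input order only because this sketch uses a stable sort.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a voice token: a name plus its Attributes values.
struct Voice {
    std::string name;
    std::map<std::string, std::string> attrs;  // a value may be a list such as "409; 411"
};

using Clause = std::pair<std::string, std::string>;  // attribute name, wanted value
using Scored = std::pair<std::string, std::string>;  // concatenated score, voice name

// True when `value` is one of the semicolon-separated values stored for `key`.
bool hasValue(const Voice& v, const std::string& key, const std::string& value) {
    const auto it = v.attrs.find(key);
    if (it == v.attrs.end()) return false;
    std::istringstream list(it->second);
    std::string item;
    while (std::getline(list, item, ';')) {
        item.erase(0, item.find_first_not_of(' '));  // trim leading spaces
        item.erase(item.find_last_not_of(' ') + 1);  // trim trailing spaces
        if (item == value) return true;
    }
    return false;
}

// Concatenated score: one character per optional clause, '1' on a match.
// Earlier clauses come first, so they are the more significant digits.
std::string score(const Voice& v, const std::vector<Clause>& optional) {
    std::string digits;
    for (const Clause& c : optional)
        digits += hasValue(v, c.first, c.second) ? '1' : '0';
    return digits;
}

// Keeps voices that match every required clause, best optional score first.
std::vector<std::string> rank(const std::vector<Voice>& voices,
                              const std::vector<Clause>& required,
                              const std::vector<Clause>& optional) {
    std::vector<Scored> scored;
    for (const Voice& v : voices) {
        bool ok = true;
        for (const Clause& c : required) ok = ok && hasValue(v, c.first, c.second);
        if (ok) scored.push_back(Scored(score(v, optional), v.name));
    }
    // Equal-length digit strings compare like binary numbers.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const Scored& a, const Scored& b) { return a.first > b.first; });
    std::vector<std::string> names;
    for (const Scored& s : scored) names.push_back(s.second);
    return names;
}

// The five voices from Table 6.
std::vector<Voice> table6Voices() {
    const char* rows[5][5] = {
        {"Michelle", "VoiceVendor1", "Child", "409; 411", "Female"},
        {"Mary", "VoiceVendor1", "Adult", "409", "Female"},
        {"Jane", "VoiceVendor2", "Child", "409", "Female"},
        {"Frank", "VoiceVendor2", "Adult", "411", "Male"},
        {"Anna", "VoiceVendor2", "Adult", "411", "Female"}};
    std::vector<Voice> voices;
    for (const auto& r : rows) {
        Voice v;
        v.name = r[0];
        v.attrs["Vendor"] = r[1];
        v.attrs["Age"] = r[2];
        v.attrs["LanguagesSupported"] = r[3];
        v.attrs["Gender"] = r[4];
        voices.push_back(v);
    }
    return voices;
}
```

Calling `rank` with the required clause LanguagesSupported=409 and the optional clauses Vendor=VoiceVendor1, Age=Child and Gender=Female yields Michelle, Mary, Jane, matching the final order above.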

 

If the call to EnumTokens is changed to:

 

HRESULT hr = cpVoiceCat->EnumTokens(NULL, NULL, &cpEnum);

 

and the user's default token in HKCUS\Voices\DefaultTokenID is set to: HKEY_LOCAL_MACHINE\Software\Microsoft\Speech\Voices\Tokens\Jane

then the enumerator cpEnum will contain all the tokens, with Jane being the first token.
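The effect of a default token on an unfiltered enumeration can be pictured as moving one entry to the front. The portable sketch below assumes the default TokenID has already been looked up; `defaultFirst` is a hypothetical name, tokens are reduced to ID strings, and keeping the other tokens in their original relative order is a choice made for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Moves the token whose ID equals defaultTokenId to the front. If no token
// has that ID, the list is returned unchanged.
std::vector<std::string> defaultFirst(std::vector<std::string> tokenIds,
                                      const std::string& defaultTokenId) {
    const auto it = std::find(tokenIds.begin(), tokenIds.end(), defaultTokenId);
    if (it != tokenIds.end()) {
        // Rotating [begin, it + 1) makes *it the first element and shifts
        // the preceding elements one place to the right.
        std::rotate(tokenIds.begin(), it, it + 1);
    }
    return tokenIds;
}
```

With the voices of Table 6 and Jane as the default, the result starts with Jane and is followed by Michelle, Mary, Frank and Anna.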

 

What does SAPI do when ISpObjectTokenCategory::EnumTokens is called?

 

Consider a fictitious category that has both tokens and token enumerators under it. When an application calls the SAPI method ISpObjectTokenCategory::EnumTokens, the following things happen:

 

1.       SAPI creates an IEnumSpObjectTokens enumerator that can enumerate all the matching tokens from the keys under CategoryName/Tokens (for example, HKLMS/Voices/Tokens).

2.       Token enumerators step (skip this step if not using token enumerators).

a.       SAPI searches for a CategoryName/TokenEnums key. If found, it instantiates a token enumerator from each of the tokens under this key, one by one.

b.       Each of the token enumerators returns an IEnumSpObjectTokens containing the matching tokens under it, which is merged with the IEnumSpObjectTokens created in step 1.

3.       SAPI applies the required attributes so that the IEnumSpObjectTokens enumerator contains only those tokens that match these attributes, then sorts them according to how well they match the optional attributes (the exact rules appear earlier in Section 4.2).

4.       The application searches for an appropriate token. Until one is found, it steps through each token and further checks the attributes and strings of each token with the ISpObjectToken methods GetData, GetStringValue, and GetDWORD (inherited from ISpDataKey).

5.       The application identifies the token it is interested in and calls ISpObjectToken::CreateInstance. SAPI queries the newly created object for the ISpObjectWithToken interface. If the object supports it, SAPI calls ISpObjectWithToken::SetObjectToken to give the newly instantiated object a pointer to the token it was instantiated from.
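Steps 1 to 3 amount to a merge followed by a filter. The portable sketch below models that flow without COM or the registry: `TokenInfo`, `EnumeratorFn` and `enumTokensModel` are hypothetical names, a token enumerator is represented by any callable that returns tokens, a single Language attribute stands in for the full set of required attributes, and the sort by optional attributes is left out.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct TokenInfo {
    std::string id;        // TokenID
    std::string language;  // one attribute is enough for the illustration
};

// Stand-in for a token enumerator: any callable that produces tokens.
using EnumeratorFn = std::function<std::vector<TokenInfo>()>;

std::vector<TokenInfo> enumTokensModel(const std::vector<TokenInfo>& staticTokens,
                                       const std::vector<EnumeratorFn>& tokenEnums,
                                       const std::string& requiredLanguage) {
    // Step 1: start from the tokens stored under CategoryName/Tokens.
    std::vector<TokenInfo> merged = staticTokens;

    // Step 2: run each token enumerator and merge what it returns.
    for (const EnumeratorFn& produce : tokenEnums) {
        const std::vector<TokenInfo> dynamic = produce();
        merged.insert(merged.end(), dynamic.begin(), dynamic.end());
    }

    // Step 3: keep only tokens that satisfy the required attribute.
    // An empty requirement keeps every token.
    std::vector<TokenInfo> result;
    for (const TokenInfo& t : merged)
        if (requiredLanguage.empty() || t.language == requiredLanguage)
            result.push_back(t);
    return result;
}
```

Because the filter runs on the merged list, tokens supplied by an enumerator are subject to the same required attributes as tokens read from the Tokens key.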

 

4.3          Instantiating an Object from a Token

Continuing with this example, the application now has a pointer to the enumerator IEnumSpObjectTokens. An application may choose to step through the enumerator with the methods Next, Skip or Reset to find an ISpObjectToken that best meets its needs. Assume that the application is searching for a voice that sounds clear over a telephone. Also assume that such voices typically have a ValueName called SupportsTelephony, which is set to 1. There is no such protocol in SAPI; this is for illustration only. Because this is not a value under Attributes, it cannot be picked up by the standard query mechanism of required attributes. The variable pCurVoiceToken represents the selected token for that category. In the example below, the loop steps through the tokens in cpEnum until a voice is found that also supports Telephony.

 

 

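ISpObjectToken *pCurVoiceToken = NULL;   // the selected Voice token
ISpObjectToken *pToken = NULL;           // the token currently being inspected
DWORD           dwFeature = 0;

while (cpEnum->Next(1, &pToken, NULL) == S_OK)
{
    // At this point, all we know is that pToken is a pointer to a Voice token.
    dwFeature = 0;
    hr = pToken->GetDWORD(L"SupportsTelephony", &dwFeature);
    // Note, ISpObjectToken inherits from ISpDataKey

    if (SUCCEEDED(hr) && dwFeature == 1)
    {
        // This is the token for the Voice we want; keep its reference
        pCurVoiceToken = pToken;
        break;
    }

    // Not a match: release this token before fetching the next one
    pToken->Release();
    pToken = NULL;
}

At this point, the selected Voice token is stored in pCurVoiceToken (or pCurVoiceToken is NULL if no voice matched). Now create the voice object from this token, so that Speak and other methods on it may be called. To create a voice object, an instance of SpVoice, which exposes ISpVoice, must be created.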

 

EXTERN_C const CLSID CLSID_SpVoice;

 

CComPtr<ISpVoice>      cpVoice;

 

// The Application may want to check to see if the token has any associated UI that it needs to display
BOOL fSupported = FALSE;
hr = pCurVoiceToken->IsUISupported(SPDUI_EngineProperties, NULL, 0, NULL, &fSupported);

   

// The Application calls the UI, or maybe enables a button in its own UI so the user can call the UI

 

// Next, CoCreate an instance of SpVoice called cpVoice

hr = cpVoice.CoCreateInstance(CLSID_SpVoice);

 

    if( SUCCEEDED( hr ) )

    {

      // set cpVoice to our selected voice token

      hr = cpVoice->SetVoice(pCurVoiceToken);

    }

 

 

At this point the cpVoice object (of type ISpVoice) has been instantiated and is ready to speak, with a call such as:

 

hr = cpVoice->Speak( L"This audio file was created using SAPI five text to speech.", 0, NULL);

5         Tokens and Categories For Engine Developers

In addition to enumerating and instantiating tokens, an engine vendor also needs to be able to:

·         Create new tokens

·         Associate files with tokens

 

5.1       Making Resources Available Through SAPI

There are several straightforward steps for an SR or TTS engine to be discoverable by SAPI:

 

1.       Make an appropriate entry under the correct CategoryID/Tokens in the registry (details in Section 6).

2.       Make an entry under CategoryID/TokenEnums if the vendor prefers dynamic tokens (i.e., the engine registry information is already stored in some other registry location or file). The enumerator should implement the interfaces outlined in Section 3.5.

3.       Look at the standard attributes for a category in SAPI and identify the characteristics of the engines so that applications can query the engine for these properties.

4.       Hand the SR engine a pointer to the recognition profile token once the RecoInstance has been created.

 

5.2       Associating Files with Tokens

One of the key issues for an engine vendor is associating files with tokens in the registry, such as the language model files for a Recognizer or a RecoProfile token. The paths of the files under a token's Files key can be obtained with the ISpObjectToken::GetStorageFileName method. SAPI searches for the file in a number of known locations. Because of the possibility of roaming, SAPI does not store fully qualified file paths in the registry (such as C:/Documents and Settings/JoeUser/Local Settings/Application Data), but stores paths such as %1c%\Microsoft\Speech\Files\MSASR\SP_81738BE4B81F42F0BFC4BB98B72EB81A.spz instead. SAPI calls the shell function SHGetFolderPath to find the user's non-roaming directory on the individual computer. The calling application can specify (i) the specific name of the file, if any, and (ii) the subdirectory to put the file in. Refer to the GetStorageFileName documentation for the exact interfaces. The engine may append any additional vendor-identifying directory names to indicate engine-specific data. Deleting the tokens with which the files are associated, by calling ISpObjectToken::RemoveStorageFileName, removes the files from the file system as well.
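The stored-path convention can be sketched as a simple prefix expansion. The example below is portable C++ and does not call the shell: `FolderResolver` and `expandStoredPath` are hypothetical names, and the code assumes that the text between the percent signs is a hexadecimal folder identifier of the kind SHGetFolderPath accepts. The resolver is injected, so on Windows it could wrap SHGetFolderPath while a test can supply a fixed directory.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Resolves a numeric folder identifier to a directory on this computer.
using FolderResolver = std::function<std::string(unsigned long)>;

// Expands a leading "%<hex>%" placeholder in a stored path. Paths without a
// well-formed placeholder are returned unchanged.
std::string expandStoredPath(const std::string& stored, const FolderResolver& resolve) {
    if (stored.size() < 3 || stored[0] != '%') return stored;
    const std::string::size_type close = stored.find('%', 1);
    if (close == std::string::npos || close == 1) return stored;

    const std::string hex = stored.substr(1, close - 1);
    // Accept only short, purely hexadecimal identifiers so that stoul
    // cannot throw on malformed or oversized input.
    if (hex.size() > 8 ||
        hex.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos)
        return stored;

    const unsigned long folderId = std::stoul(hex, nullptr, 16);
    return resolve(folderId) + stored.substr(close + 1);
}
```

Keeping the placeholder in the registry and expanding it at run time is what lets the same registry value point at different physical directories on different computers.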

Caveat: If roaming is enabled, the user's RecoProfiles in the HKCUS hive of the registry will roam (because the entire HKCUS hive roams); the associated files, situated in a non-roaming directory will not. This causes two unexpected effects:

 

1.       When the Recognizer is initiated on the second computer, the Recoprofiles are likely to be missing. The Recognizer needs to be able to handle this and copy the necessary new-profile files.Known issue: Upon roaming the Microsoft SR Engine currently creates a new set of files, but these have entirely different names from the names on the original computer. As a result, when the registry is roamed back to the original computer, the original profile files become orphaned.

2.       Subsequently, upon deleting the RecoProfiles from one computer, all the associated files and registry entries on that computer will be deleted. The files on the other computers will become orphans, that is, files without pointers to them.
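The stored path form described above (%1c%\Microsoft\Speech\Files\...) can be illustrated with a small, portable sketch. It assumes that the text between the percent signs is a hexadecimal shell folder identifier (0x1c matches CSIDL_LOCAL_APPDATA in the Windows shell headers) and that a caller-supplied resolver stands in for SHGetFolderPath. ExpandStoragePath and FolderResolver are hypothetical names used only for illustration; this is not SAPI's own implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical resolver type: maps a numeric folder identifier to a directory.
// On Windows it could wrap SHGetFolderPath; here it is supplied by the caller.
using FolderResolver = std::function<std::wstring(unsigned long)>;

// Hypothetical helper, not part of SAPI: expands a leading "%<hex>%" folder
// placeholder. Paths that do not start with a well-formed placeholder are
// returned unchanged.
std::wstring ExpandStoragePath(const std::wstring& stored,
                               const FolderResolver& resolve)
{
    assert(resolve);  // a resolver must be supplied

    if (stored.size() < 3 || stored[0] != L'%')
        return stored;

    const std::wstring::size_type close = stored.find(L'%', 1);
    if (close == std::wstring::npos || close == 1)
        return stored;

    // Parse the text between the percent signs as a hexadecimal number.
    const std::wstring digits = stored.substr(1, close - 1);
    if (digits.size() > 8)
        return stored;

    unsigned long folderId = 0;
    for (wchar_t ch : digits) {
        unsigned long v = 0;
        if (ch >= L'0' && ch <= L'9')      v = ch - L'0';
        else if (ch >= L'a' && ch <= L'f') v = ch - L'a' + 10;
        else if (ch >= L'A' && ch <= L'F') v = ch - L'A' + 10;
        else return stored;  // not a hexadecimal digit
        folderId = folderId * 16 + v;
    }

    // Substitute the resolved directory for the placeholder.
    return resolve(folderId) + stored.substr(close + 1);
}
```

Keeping the resolver separate from the parsing means the same stored value can resolve to a different directory on each computer, which is the reason the specification gives for not storing fully qualified paths.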

 

5.3       Inspecting Underlying Keys of a Token

 

Besides using the helper functions, you can inspect the keys under a token directly. For example, start from a recognizer token and open the Attributes key under it as a data key (ISpDataKey). All the ISpDataKey methods are then available to inspect the values under the Attributes key. The sample below goes from the Recognizer token to the Attributes key under it, and finally to the "Desktop" and "Telephony" strings under that.

 

  hr = SpGenericSetObjectToken(pToken, m_cpEngineObjectToken);
  if(FAILED(hr))
  {
    return hr;
  }

  // Read attribute information
  CComPtr<ISpDataKey> cpAttribKey;
  hr = pToken->OpenKey(L"Attributes", &cpAttribKey);

  if(SUCCEEDED(hr))
  {
    WCHAR *psz = NULL;
    hr = cpAttribKey->GetStringValue(L"Desktop", &psz);
    // Only the presence of the value matters, so release the string right away
    ::CoTaskMemFree(psz);
    psz = NULL;
    if(SUCCEEDED(hr))
    {
      // This instance of the engine is for doing desktop recognition
    }
    else if(hr == SPERR_NOT_FOUND)
    {
      // No "Desktop" value, so check for "Telephony" instead
      hr = cpAttribKey->GetStringValue(L"Telephony", &psz);
      ::CoTaskMemFree(psz);
      psz = NULL;
      if(SUCCEEDED(hr))
      {
        // This instance of the engine is for doing telephony recognition
      }
    }
  }

 

 

5.4       Creating New Keys in the Registry

 

Below is another snippet of code, in which the Microsoft Sample Engine creates a new entry under a recognition profile. If the key for the engine does not exist under the Recognition Profile (pszCLSID points to the Engine GUID), the engine needs to create it, as well as the Gender and Age values under it.

 

 

  // Read attribute information from the Engine key. pProfile is the RecoProfile token
  // obtained by calling GetRecoProfile on the Recognition Instance.
  hr = pProfile->OpenKey(pszCLSID, &dataKey);
  if(hr == SPERR_NOT_FOUND)
  {
    // This user profile has not been seen before, so create a new registry key to hold info for it
    hr = pProfile->CreateKey(pszCLSID, &dataKey);

    // Now set some default values
    if(SUCCEEDED(hr))
    {
      hr = dataKey->SetStringValue(L"GENDER", L"UNKNOWN");
    }
    if(SUCCEEDED(hr))
    {
      hr = dataKey->SetStringValue(L"AGE", L"UNKNOWN");
    }

    // Now create some temporary file storage for trained models.
    // This will create a value named SampleEngTrainingFile with a value such as
    // C:\Documents and Settings\username\application data\microsoft\speech\files\MSASR\LM7454901D23334AAF87707147726EC235.dat
    if(SUCCEEDED(hr))
    {
      hr = pProfile->GetStorageFileName(CLSID_SampleSREngine, L"SampleEngTrainingFile", L"MSASR\\LM%d.dat", CSIDL_FLAG_CREATE, &pszPath);
    }

    // and request a UI for user training or properties - SPDUI_RecoProfileProperties
    hr = AddEventString(SPEI_REQUEST_UI, 0, SPDUI_UserTraining);
  }

 

6         Registry Settings

This section documents, in some detail, the registry settings of each category of tokens in both the HKCUS and the HKLMS hives. Each token entry needs to have the required keys and values for a token as outlined in Table 1. So that an application can find the most suitable token on the computer in the Recognizers, Voices, and Phone Converters categories, SAPI defines a standard set of attributes that applications can query for. It is important for engine vendors to implement these keys exactly as specified, because the engines and voices must be discoverable by applications through SAPI.

 

It is important to note that in addition to the specified keys and values, a vendor may create any keys and values necessary to use as a resource in the registry. SAPI will ignore these values and not disturb them in any way, unless SAPI is uninstalled from the computer.

 

6.1       Category: Voices

 

The Voices category enumerates every voice installed on the computer by all TTS engines. The voice tokens should be located under the key:

HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Voices\Tokens


The requirements for a voice token are listed below, with some sample values.

·         Each voice token should meet the requirements for a standard token.

·         Voices should document the SAPI-specific attributes that describe them so applications can search for them. Table 8 contains a full listing of Voices attributes and their locations. All Voice attributes are required. Section 3.1 and Section 4.2 have more information about attributes and querying them.

·         Voices may have their own vendor-specific UI implemented by the TTS Engine rendering the voice. If such a UI is present, it needs a separate token in the location described in Table 8. The minimum requirement is that the token contain the CLSID of the COM object implementing the UI. Click Properties on the Text-to-Speech tab of the Speech Control Panel to access the vendor-specific UI. The Properties button will be unavailable if the EngineProperties token for the current default Voice is not supported.

 

Table 8 provides a detailed listing of the registry entries that constitute a sample voice token called VoiceToken1 under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Voices\Tokens 


Table 8 Voice Registry and Attributes

| RegKey | ValueName | Comments |
| --- | --- | --- |
| VoiceToken1 | | Required - This is the RegKey for the Token. |
| | (Default) | Required - language independent name. |
| | 409 | Name in Hex_LangID 409, which is English. There may be several of these rows, one for each LangID in which the token has a name. Note, no leading 0x before the LangID. |
| | 809 | |
| | CLSID | Required - Sample CLSID for object which instantiates the voice. |
| VoiceToken1\Attributes | | Attributes for the Token are under this key. |
| | Age | Required - Value should be "Child," "Teen," "Adult," or "Senior" depending on Age of TTS Voice. Senior indicates an elderly voice. Vendors may choose to classify some voices as both "Senior" and "Adult". |
| | Vendor | Required - TTS engine Vendor name. |
| | Language | Required - The LCID in hex of language this engine speaks. |
| | Gender | Required - Value should be "Male" if Male voice, "Female" if female. |
| | VendorPreferred | Required - If this is the Default voice for the vendor named in vendor. |
| | Name | Required - String representing language independent name |
| VoiceToken1\UI | | Required, if the Voice has UI - UI tokens for the voice token will be stored under this key. |
| VoiceToken1\UI\EngineProperties | | The only SAPI-specific UI token is EngineProperties. Called when the user clicks Properties on the Text-to-speech tab. |
| | CLSID | Required - Sample CLSID for object which instantiates engine-specific UI from Speech properties in Control Panel. |

Note: Please refer to the registry entries of the Microsoft recognizer and the Sample Engine, which ship in the SAPI 5 SDK, as an example of how the entries are created.
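Table 8 names each localized name value by the hexadecimal LangID with no leading 0x. As a minimal sketch of that convention, the hypothetical helper below (LangIdValueName is not a SAPI function) formats a language identifier as such a value name. Emitting uppercase hexadecimal digits is an assumption of the sketch; registry value names are not case-sensitive.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of SAPI: formats a language identifier as the
// registry value name used for a localized token name, that is, hexadecimal
// digits with no "0x" prefix and no zero padding.
std::string LangIdValueName(unsigned int langId)
{
    assert(langId <= 0xFFFF);  // language identifiers are 16-bit values

    char buffer[8];
    std::snprintf(buffer, sizeof(buffer), "%X", langId);
    return buffer;
}
```

With this helper, the identifier 0x409 maps to the value name 409 and 0x809 maps to 809, matching the sample rows in the table.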

There is also a Voices category in the HKCUS hive that stores the following:

·         The default TTS rate selected by the user using Speech properties in Control Panel.

·         The default voice selected by the user.

 

 

Table 9 provides a listing of the user registry entries that constitute a voice token in

HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Voices\

 

Table 9 Voices - User Registry Settings

 

| RegKey | ValueName | Value | Comments |
| --- | --- | --- | --- |
| VoiceToken1 | DefaultTokenID | HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\MSMary | This points to the default voice selected by the user in Speech properties in Control Panel. It is empty at installation time and gets populated when the user selects a voice. |
| | TTSRate | 5 | Scale is 0 to +10, with 0 being slowest. This applies to all voices on the system for the currently logged on user. |

 

Note: The TTS engine does not need to store any of these values; SAPI takes care of that.

 

Vendors may choose to store any additional keys and values in the same areas of the registry. Following is additional information relating to voice tokens:

·         User-specific entries for the voice (such as volume, pitch, rate, and any other information) should be stored in keys and values under HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Voices\Tokens\VoiceToken1

This creates a structure in the HKCUS hive parallel to the one in the HKLMS hive.

·         Entries applying to all the voices using an engine should be stored under HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Voices\Tokens\EngineGUID1

·         Non-user entries (pertaining to all users on the computer) for a voice should be stored in keys and values under the category HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Voices\Tokens
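The parallel structure described in the list above can be sketched as a simple path mapping: the per-user location of a voice's settings is the machine-wide token ID with the hive prefix swapped. PerUserKeyForToken is a hypothetical helper written for illustration, not a SAPI function, and it works on plain strings rather than on the registry itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of SAPI: maps a machine-wide token ID to the
// parallel key under the current user's hive by swapping the hive prefix.
std::wstring PerUserKeyForToken(const std::wstring& machineTokenId)
{
    static const std::wstring kMachine = L"HKEY_LOCAL_MACHINE\\";
    static const std::wstring kUser = L"HKEY_CURRENT_USER\\";

    // Callers are expected to pass a token ID rooted in the machine hive;
    // this is checked in debug builds.
    assert(machineTokenId.compare(0, kMachine.size(), kMachine) == 0);

    return kUser + machineTokenId.substr(kMachine.size());
}
```

An engine could use a mapping like this to decide where to write user-specific values such as volume or pitch, while leaving the machine-wide token untouched.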

 

 

6.2       Category: Recognizers

 

SAPI enumerates all the SR Engines installed on the computer from the tokens and token enumerators under

HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Recognizers

Below are the guidelines for registering recognizer tokens:

 

·         Each recognizer token should meet the requirements for a standard token (Table 1).

·         Each speech recognition engine installed on the computer should have a recognizer token. If vendors use a single recognizer for recognition in multiple languages (with different acoustic models), or a discrete and a continuous recognizer, they may choose to store the relevant data files and other initialization information under separate tokens, but use the same value for the CLSID. For example, a vendor may use the same recognition engine to recognize both Japanese and English. In this case, there are two tokens, both containing the CLSID of the same recognizer, but associated with different language and acoustic model files stored with the token.

·         Recognizers should document the SAPI-specific attributes shown in Table 10 so that applications can search for them. Required attributes are also indicated in Table 10.

·         Most speech applications written with SAPI will be tested for a specific engine, or a few specific engines, if the application has a clear need for multiple engines. Typically applications will query for and use this engine by default. Use attributes when the application cannot find its preferred engine (or doesn't have one), and needs to locate the most suitable engine installed on the computer for its needs.

·         Recognizer tokens may have an Alternate CLSID if they implement alternates.

·         Recognizer tokens may have a RecoExtension CLSID for objects that extend SAPI's recognition context.

·         The Recognizer may also have a number of engine-specific UIs that it exposes to SAPI. There should be a separate key under {Recognizer TokenID}/UI/ for each such UI supported. The keys are listed and documented in Table 10 below.

 

Table 10 provides a detailed listing of the registry entries that constitute a sample recognizer token called RecognizerToken1 under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\Recognizers\Tokens

 

Table 10 Sample Entries of a Recognizer token

| RegKey | ValueName | Comments |
| --- | --- | --- |
| RecognizerToken1 | | Required - This is the RegKey for the Token. |
| | (Default) | Required - Language Independent Name |
| | 409 | Name in Hex_LangID 409, which is English. There may be several of these rows, one for each LangID in which the token has a name. Note, no leading 0x before the LangID. |
| | AlternatesCLSID | CLSID for the object that implements alternates. |
| | RecoExtension | CLSID for the object that extends the recognition context provided by SAPI. |
| RecognizerToken1\Attributes | | |
| | Vendor | Required - Company Name |
| | Language | Required - The LCIDs in hex of the language(s) this engine recognizes. Typically this is only one per recognizer token. Recognizers recognizing multiple languages can expose that in the manner shown. In addition, a value of "409;9" indicates that the Recognizer recognizes generic English (American, British, Australian, etc.). |
| | SpeakingStyle | Required - Value should be "Discrete" if the engine requires pauses between words for recognition. Recognizers without this requirement should have "Discrete;Continuous" for this value. |
| | Dictation | Required - If the engine supports dictation, it must contain this value. |
| | CommandAndControl | Required - If the engine supports command and control, it must contain this value. |
| | Desktop | The engine supports desktop audio. |
| | Telephony | The engine is configured to recognize audio coming in from a telephony channel. |
| | VendorPreferred | This token is the vendor's default token. |
| | Alternates | Value is "CC" if the engine supports Command and Control alternates. Value is "Dictation" if it supports dictation alternates. If both types of alternates are supported, then the value should be "CC;Dictation". |
| | Hypotheses | The engine returns hypotheses before final recognition. |
| | WordSequences | Value is "Trailing" if the engine supports word sequence elements in CFGs at the end of a rule. Value should be "Anywhere;Trailing" if word sequence elements are supported anywhere in a rule. The presence of this attribute indicates the SR engine supports the Text-Buffer functionality. Applications can test for the presence of this attribute and, if it is available, ISpRecoGrammar::SetWordSequenceData and ISpRecoGrammar::SetTextSelection may be used. |
| | DictationInCFG | Value is "Trailing" if the engine supports the dictation element in CFGs, as defined by SAPI, at the end of a rule. Value should be "Anywhere;Trailing" if the dictation element is supported anywhere in a rule. A dictation element is an element in a CFG that loads an SLM and returns the recognition result to the application. Refer to SAPI documentation for additional information. |

 

There is also a Recognizers category in the HKCUS hive that stores the selected default Recognizer. This is done exactly as for Voices, as shown in Table 9. The CategoryID is:

HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\Recognizers\
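Several recognizer attributes in Table 10 hold semicolon-separated lists, such as "409;9" for Language or "Discrete;Continuous" for SpeakingStyle. The sketch below shows one way an application might test such a value once it has read the string from the Attributes key. AttributeHasValue and EqualsNoCase are hypothetical helpers, not SAPI functions, and comparing entries without regard to case is an assumption of this sketch rather than documented SAPI behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: case-insensitive comparison of two wide strings.
static bool EqualsNoCase(const std::wstring& a, const std::wstring& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::towlower(a[i]) != std::towlower(b[i]))
            return false;
    return true;
}

// Hypothetical helper, not part of SAPI: returns true if `wanted` is one of
// the semicolon-separated entries in `attributeValue`.
bool AttributeHasValue(const std::wstring& attributeValue,
                       const std::wstring& wanted)
{
    assert(wanted.find(L';') == std::wstring::npos);  // one entry at a time

    std::wstring::size_type start = 0;
    while (start <= attributeValue.size()) {
        std::wstring::size_type end = attributeValue.find(L';', start);
        if (end == std::wstring::npos)
            end = attributeValue.size();
        // Compare the whole entry, so "40" does not match "409".
        if (EqualsNoCase(attributeValue.substr(start, end - start), wanted))
            return true;
        start = end + 1;
    }
    return false;
}
```

Matching whole entries rather than substrings matters here: a search for language 9 should succeed against "409;9" because of the second entry, not because the digit 9 appears inside 409.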

 

6.3       Category: RecoProfiles

A RecoProfile (RP) is a user-specific, engine-specific data file in which an SR Engine stores the user-specific acoustic and language data. The RP can be thought of as a bag of information that only the engine knows about. The RP also stores attributes in a few keys under the RP's key in the registry (these correspond to the current Speech Recognition tab of Speech properties in Control Panel).

There are two key reasons for a user to have multiple acoustic profiles:

1.   In a shared login case (for example, with a Win98 or Millennium home computer where the users typically press cancel to the login dialog box to enter the computer), multiple files allow two or more users to keep languages and acoustic data separate. In this case, the user will need to manually change the profile to the correct one in Speech properties in Control Panel before starting recognition (or an application may provide its own UI to do this).

2.   On a laptop, to offer the user the choice of having different acoustic profiles for different acoustic settings, such as home and office.

A typical RP token is located in the user hive in the following location in the registry

HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\RecoProfiles\Tokens\{ProfileGUID 1}

Initially, SAPI creates only one RecoProfile token, called Default User. When the Recognizer is used for the first time, it should create a key named with its own GUID under this token. For instance, if the default recognizer has the GUID XXX, the key HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\RecoProfiles\Tokens\{ProfileGUID 1}\XXX is created. The RecoProfile's files and settings for that Recognizer are stored under this key. These settings may include paths to the acoustic and language model files for the profile that are modified during speaker enrollment and subsequently during recognition. The key may also contain additional data about the profile that may improve the recognizer's accuracy, such as age, gender, microphone gain setting, and so on.

Under the RecoProfile token, there is a key for the GUID of each engine that has a profile. Suppose a user keeps the same profile but switches the default engine (say, to YYY) in Speech properties in Control Panel. The new engine, on instantiation (or termination of the session), should create the key HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\RecoProfiles\Tokens\{ProfileGUID 1}\YYY

All engine-specific (YYY-specific for instance) settings for its RecoProfile should be stored under this key.

 

6.4       Category: AudioInput

 

The HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput category contains token enumerators that enumerate all the AudioInput devices present on the computer. There is a token enumerator for each class of AudioInput device. By default, SAPI 5 will have only a single token enumerator, for the MMSys technology. This token enumerator will create an audio token for each AudioInput device (microphone) on the computer and return it when an application or engine calls SpEnumTokens or IEnumSpObjectTokens.

 

The AudioInput category does not have standard attributes, and if multiple technologies are installed, an application needs to inspect each token to find the most suitable one.

 

Any additional AudioInput token enumerators must meet the requirements for a token enumerator laid out in Table 2. Table 11 shows an example of the AudioInput category at:

HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\

 

Table 11 AudioInput Category

| RegKey | ValueName | Comments |
| --- | --- | --- |
| TokenEnums\MMSys | | This is the category. |
| | DefaultTokenID | This Default can point to a token enumerator or token. |
| AudioInput1 | | This is the key for the audio input device. |
| AudioInput1\Attributes | | Attributes for the Token are under this key. |
| | Technology | This is the technology, for example, "MMSys" |
| | Vendor | This is the vendor name. |

 

6.5        Category: AudioOutput

 

The HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioOutput category contains token enumerators that enumerate all the audio output devices present on the computer. As in the AudioInput category, there is a token enumerator for each technology of audio output device. By default, there will be a single token enumerator for MMSys. Under this, there will be entries for each audio output device installed on the computer.

| RegKey | ValueName | Comments |
| --- | --- | --- |
| TokenEnums\MMSys | | This is the category. |
| | DefaultTokenID | This Default can point to a token enumerator or token. |
| AudioOutput1 | | This is the key for the audio output device. |
| AudioOutput1\Attributes | | Attributes for the Token are under this key. |
| | NoSerializeAccess | Optional: Override serialization of multiple voices. |
| | Technology | This is the technology, for example, "MMSys" |
| | Vendor | This is the vendor name. |

 

6.6       Category: AppLexicons

The AppLexicons category stores all the application lexicons SAPI knows about. As in other categories, the lexicons are located under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AppLexicons\Tokens. When called, the SpLexicon interface enumerates all the AppLexicons. AppLexicons have no attributes, and therefore there is no way to load only specific AppLexicons. These keys will be created by applications to make their own lexicons available through SAPI.

 

6.7       Category: PhoneConverters

The ISpPhoneConverter interface enables the application to convert from the SAPI character phoneset to the ID phoneset. Phone Converter keys should go under HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\PhoneConverters.

SAPI has a single phoneconverter for each language. An engine can query for the phoneconverter whose language attribute matches the application's language of interest.

 

6.8       Category: UserLexicons

SAPI stores the user lexicon keys under the HKEY_CURRENT_USER\SOFTWARE\MICROSOFT\Speech\CurrentUserShortcut key. SAPI 5.3 and later support a new feature called "user shortcuts". This is a UserLexicon that maps a spoken term to an expanded printed term. For example, a user may define "my home address" as "12345 NE 678 Av NE Someplace WA 98888" to save them having to dictate and correct the full address. SPCURRENT_USER_SHORTCUT_TOKEN_ID is the token ID used to access this lexicon.

 

 

7         Index of Tables

Table 1: Parts of a Token in the Registry

Table 2 Parts of the AudioInput token enumerator

Table 3 Common Helper Functions

Table 4 Engine Developer Helper Functions

Table 5 Query Operators

Table 6 Voices installed on a computer

Table 7 Scoring of tokens matching optional criteria

Table 8 Voice Registry and Attributes

Table 9: Voices - User Registry Settings

Table 10 Sample Entries of a Recognizer token

Table 11: AudioInput Category