
Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Phrase Generator Input and Output File Format

Phrase Generator accepts XML-based grammar files as input and generates a list of phrases in an XML file with the “.utt” (utterance) extension. You can generate phrases for either:

  • The root rule of the input file and all the rules referenced by the root rule.

  • A rule that you specify that is not a root rule, and all the rules that it references.

Input File Format

Use grammar documents in XML format that conform to the Speech Recognition Grammar Specification (SRGS) Version 1.0 as input for Phrase Generator, for example MyGrammar.grxml. It is an accepted convention to use the “.grxml” file extension for XML-based grammar documents that conform to the SRGS specification.

Required Attributes

A valid XML-format grammar document consists of a legal header followed by a body that contains a set of legal rule definitions. The header must include the XML declaration and may include an optional DOCTYPE declaration, followed by the root grammar element. Grammar files used as input to the PhraseGenerator command must contain the following required attributes in the root grammar element; otherwise, the tool generates an error:

| Attribute | Description | Example |
| --- | --- | --- |
| version | Attribute of the grammar element. The version of the specification implemented by the grammar. | version="1.0". A grammar that complies with the Speech Recognition Grammar Specification (SRGS) Version 1.0 must declare the version to be "1.0". |
| xmlns | Attribute of the grammar element. The URI of the namespace for the grammar. | xmlns="http://www.w3.org/2001/06/grammar" |
| xml:lang | Attribute of the grammar element. The primary language contained by the document and optionally a country or other variation. | xml:lang="en-US". The xml:lang attribute is required if <grammar mode="voice">, or if the mode attribute is omitted from the grammar element, in which case mode defaults to "voice". The xml:lang attribute is not required if <grammar mode="dtmf">. |

Note

Grammar files in the Augmented Backus-Naur Form (ABNF) format are not supported.
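The required-attribute rules above can be checked before a grammar is handed to the tool. The following is a minimal sketch in Python, not part of Phrase Generator; the helper name check_grammar_header and its messages are hypothetical, and it checks only the three attributes listed in the table.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def check_grammar_header(grxml_text):
    """Return a list of problems found in the root grammar element (hypothetical helper)."""
    root = ET.fromstring(grxml_text)
    problems = []
    # ElementTree folds the xmlns value into the tag name as "{namespace}grammar".
    if root.tag != "{%s}grammar" % SRGS_NS:
        problems.append("root element is not <grammar> in the SRGS namespace")
    if root.get("version") != "1.0":
        problems.append('version must be "1.0"')
    # mode defaults to "voice" when omitted, and xml:lang is required in voice mode.
    mode = root.get("mode", "voice")
    if mode == "voice" and root.get("{%s}lang" % XML_NS) is None:
        problems.append("xml:lang is required when mode is voice")
    return problems


minimal = (
    '<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US">'
    '<rule id="main"><item> hello </item></rule></grammar>'
)
print(check_grammar_header(minimal))  # prints []
```

An empty list means the header passes these three checks only; it does not validate the rest of the grammar against the SRGS specification.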

Example

The following is an example of a valid input grammar document that contains only the minimum required elements and attributes in the document’s header, and a simple rule in the body of the document.

<?xml version="1.0"?>   
<grammar version="1.0"      
xmlns="http://www.w3.org/2001/06/grammar"      
xml:lang="en-US">      

   <rule id="main">         
      <one-of>            
         <item> hello </item>            
         <item> world </item>         
      </one-of>      
   </rule> 
  
</grammar>

The document header ends and the body of the grammar document begins with the first rule element. Grammar files used as input for Phrase Generator can also contain optional elements and attributes in the document header; see SRGS Grammar XML Reference (Microsoft.Speech). For more information about elements and attributes in SRGS grammars, see Speech Recognition Grammar Specification Version 1.0.

Note

Depending on system and memory constraints, the tool may be unable to process grammars that contain in excess of 100,000 phrases.

Output File Format

PhraseGenerator.exe outputs a list of phrases into an XML document. Phrases are listed in the order processed, though their order may not correspond to the order in which rules are listed in the input file, or in externally referenced grammar documents. The output utterance file contains the following XML elements:

| Element | Description |
| --- | --- |
| Scenario | The header element for the utterance document, which takes the attribute xml:space. The value for xml:space is always preserve, which preserves white space in the output phrases. |
| Seed | Contains an integer value that is used to generate phrases when the SampleByWeight method is used. If specified by the user as an option value, it appears in the output file. If not specified by the user, a random value will be assigned by the tool in the output. |
| Utterance | The parent element for each phrase. |
| TranscriptText | The semantic content of the phrase, for example "Hello world". |
| Weight | For any method of generation, a floating point value from 0 to 1 that corresponds to the phrase weight (combining any and all rule weights) of the phrase generated. Notation always follows the W3C xsd:decimal type, specifying “.” as the decimal separator regardless of locale. |
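Because the output is plain XML with a small, fixed set of elements, it can be read with any XML parser. The following is a minimal sketch in Python, assuming the element names described above; the function name read_utterances is hypothetical, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def read_utterances(utt_bytes):
    """Return (seed, [(text, weight), ...]) from the bytes of a .utt document."""
    scenario = ET.fromstring(utt_bytes)
    # Seed is present only in some output files, so treat it as optional.
    seed_element = scenario.find("Seed")
    seed = int(seed_element.text) if seed_element is not None else None
    phrases = []
    for utterance in scenario.findall("Utterance"):
        text = utterance.findtext("TranscriptText", default="")
        # Weight always uses "." as the decimal separator, so Decimal parses it directly.
        weight = Decimal(utterance.findtext("Weight"))
        phrases.append((text, weight))
    return seed, phrases


sample = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<Scenario xml:space="preserve">'
    b"<Utterance><TranscriptText>Hello world</TranscriptText><Weight>0.5</Weight></Utterance>"
    b"</Scenario>"
)
seed, phrases = read_utterances(sample)
print(seed, phrases)
```

Decimal is used here rather than float so that the weight text is kept exactly as written; a float would work equally well for most purposes.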

Example 1

This example shows the output that Phrase Generator produces from a grammar file (PearsGrapes.grxml) that contains a reference to a rule in another grammar file (ApplesOranges.grxml). Phrase Generator outputs phrases from rules in both grammar files. The contents of the output utterance file are shown below the contents of the input grammar files.

Below are the contents of the input grammar file PearsGrapes.grxml. This grammar file contains the root rule "main", which references the rule "Fruit" within the grammar and the rule "OtherFruit" in the separate grammar ApplesOranges.grxml.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd">

<grammar version="1.0"
   xmlns="http://www.w3.org/2001/06/grammar"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="http://www.w3.org/2001/06/grammar 
   http://www.w3.org/TR/speech-grammar/grammar.xsd"
   xml:lang="en-US" mode="voice" root="main">

   <!-- rule declarations -->
   <rule id="main" scope="public">
      <one-of>
         <item><ruleref uri="#Fruit"/></item>
         <item><ruleref uri="ApplesOranges.grxml#OtherFruit"/></item>
      </one-of>
      <item> are my favorite fruit </item>
   </rule>

   <rule id="Fruit" scope="public">
      <example> Pears </example>
      <example> Grapes </example>
      <one-of>
         <item> Pears </item>
         <item> Grapes </item>
      </one-of>
   </rule> 
</grammar>

Below are the contents of the grammar file ApplesOranges.grxml. Note that this grammar file does not have a root rule and that its rule "OtherFruit" is referenced explicitly from PearsGrapes.grxml.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN" "http://www.w3.org/TR/speech-grammar/grammar.dtd">

<grammar version="1.0"
   xmlns="http://www.w3.org/2001/06/grammar"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="http://www.w3.org/2001/06/grammar 
   http://www.w3.org/TR/speech-grammar/grammar.xsd"
   xml:lang="en-US" mode="voice">

   <rule id="OtherFruit" scope="public">
      <example> Apples </example>
      <example> Oranges </example>
      <one-of>
         <item> Apples </item>
         <item> Oranges </item>
      </one-of>
   </rule>
</grammar>

Now that the input grammars are created, we can run Phrase Generator. It generates the list of phrases for all the rules referenced by the root rule of the file specified with the /In option, and writes them to a new file named MyFavoriteFruits.utt.

The following command-line entry creates the list of all phrases that can be generated from the grammars PearsGrapes.grxml and ApplesOranges.grxml, and writes them to the new file MyFavoriteFruits.utt:

PhraseGenerator /In PearsGrapes.grxml /Out MyFavoriteFruits.utt

Below are the contents written by Phrase Generator into the file MyFavoriteFruits.utt, using the grammars shown above as input.

<?xml version="1.0" encoding="utf-8"?>
<Scenario xml:space="preserve">
  <Utterance>
    <TranscriptText>Oranges are my favorite fruit </TranscriptText>
    <Weight>0.25</Weight>
  </Utterance>
  <Utterance>
    <TranscriptText>Apples are my favorite fruit </TranscriptText>
    <Weight>0.25</Weight>
  </Utterance>
  <Utterance>
    <TranscriptText>Grapes are my favorite fruit </TranscriptText>
    <Weight>0.25</Weight>
  </Utterance>
  <Utterance>
    <TranscriptText>Pears are my favorite fruit </TranscriptText>
    <Weight>0.25</Weight>
  </Utterance>
</Scenario>
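The weight of 0.25 for each phrase is consistent with one plausible reading of how weights combine: alternatives that carry no explicit weight share probability equally, and the weights along the path through the rules are multiplied. The following Python sketch works through the arithmetic under that assumption; it is not Phrase Generator's actual algorithm, and the function name expand_main is hypothetical.

```python
# Alternatives copied from the two example grammars.
FRUIT = ["Pears", "Grapes"]            # rule "Fruit" in PearsGrapes.grxml
OTHER_FRUIT = ["Apples", "Oranges"]    # rule "OtherFruit" in ApplesOranges.grxml


def expand_main():
    """Enumerate the phrases of rule "main" with weights, assuming uniform alternatives."""
    referenced_rules = [FRUIT, OTHER_FRUIT]   # the two <ruleref> items under <one-of>
    results = []
    for rule in referenced_rules:
        # Assumption: unweighted items in a <one-of> are equally likely.
        rule_weight = 1 / len(referenced_rules)
        for fruit in rule:
            # Assumption: the phrase weight is the product of the choices on its path.
            weight = rule_weight * (1 / len(rule))
            results.append((fruit + " are my favorite fruit", weight))
    return results


for phrase, weight in expand_main():
    print(phrase, weight)
```

Under these assumptions each phrase gets 0.5 × 0.5 = 0.25, and the four weights sum to 1, which matches the output file above.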

Example 2 - SampleByWeight

This example shows how to use Phrase Generator with the SampleByWeight method. We will use the following command and a hypothetical confirmation grammar (not shown):

PhraseGenerator.exe -In YesNo.grxml -Out MyUtterances.utt -MaxPhrases 3 -Method SampleByWeight

The command line does not specify a /Seed option. Therefore, Phrase Generator supplies a random value, which appears in the Seed element of the output utterance file, as shown below:

<?xml version="1.0" encoding="utf-8"?>
<Scenario xml:space="preserve">
  <Seed>1713698674</Seed>
  <Utterance>
    <TranscriptText>yeah</TranscriptText>
    <Weight>0.032255451644366766</Weight>
  </Utterance>
  <Utterance>
    <TranscriptText>no</TranscriptText>
    <Weight>0.32255452389424816</Weight>
  </Utterance>
  <Utterance>
    <TranscriptText>yes</TranscriptText>
    <Weight>0.64511906136866892</Weight>
  </Utterance>
</Scenario>

The /Seed value produced by Phrase Generator represents a specific set of results from a specific input file. Re-running the command with the /Seed option set to the value reported in the output, 1713698674, as shown below, will produce exactly the same set of phrases as above.

PhraseGenerator.exe -In YesNo.grxml -Out MyUtterances.utt -MaxPhrases 3 -Method SampleByWeight -Seed 1713698674

You can use the /Seed value to determine whether or not a grammar and the weight values specified in its Item elements have been changed. Unless the input file has changed, the same command with the same /Seed value will generate the same set of phrases and phrase weights.
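The reproducibility described here is a general property of seeded pseudo-random sampling. The following Python sketch illustrates the idea only; it does not reproduce Phrase Generator's sampling algorithm, and the function name sample_by_weight and the weights used are hypothetical.

```python
import random


def sample_by_weight(phrases, weights, max_phrases, seed):
    """Draw phrases at random in proportion to their weights, using a fixed seed."""
    # A private generator keeps the result independent of the global random state.
    rng = random.Random(seed)
    return rng.choices(phrases, weights=weights, k=max_phrases)


phrases = ["yes", "no", "yeah"]
weights = [0.65, 0.32, 0.03]   # illustrative values, not taken from a real grammar

first = sample_by_weight(phrases, weights, 3, seed=1713698674)
second = sample_by_weight(phrases, weights, 3, seed=1713698674)
print(first == second)  # prints True: same input and same seed give the same draw
```

If the phrases or weights change, the same seed can yield a different draw, which is why a changed result under a fixed seed signals a changed grammar.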