Using PEML with prompts

When I first joined the Speech Server group I was a bit daunted by all of the different markup languages that we have.  For instance, in Speech Server 2004 we have the following:

SALT - Allows you to create speech web applications.  More info on this can be found at the SALT Forum's web site.

SML - Semantic Markup Language.  This is the markup used to return the results of a grammar recognition.

Grammar XML - Used to describe a grammar.

SSML - Speech Synthesis Markup Language.  Used to control how speech output is rendered.

PEML - Prompt Engine Markup Language.  Used to give hints to the Prompt Engine on how to handle a prompt.

These last two (SSML and PEML) are perhaps the least known but can accomplish some interesting things for you.  In today's post, I will cover PEML.  In a later post I will cover SSML.  If anyone would like me to cover Grammar XML, SML, or SALT in future posts please let me know.

Using the Speech SDK, anyone can use the Recording Editing and Design Studio to create a prompt database.  I detailed creation of prompts and some features of this in previous posts.  In order to use these prompts in a running speech application, you must compile the prompt database to a .prompts file.  You do this by selecting 'build' from the menu in Visual Studio as you would with any other VS project.

If you are curious what the .prompts file looks like, open it with a structured storage viewer; several are available on the web.  For those unfamiliar with structured storage files, they are basically 'directories in a file'.  Within the file you will find 'directories' called storages and 'files' called streams.  When a .prompts file is created, the audio for each extraction is placed in its own stream, so a single wave file of a transcription is broken up into smaller wave files, one per extraction.  The .prompts file holds other information as well, though the truth is it's not very interesting.

When a prompt is played within a speech web application, markup called PEML is passed to the prompt engine.  Much of this is done for you in the SDK when you select the prompt database corresponding to an application.  However, there are cases when you may want to write your own PEML.

All PEML is enclosed in the peml:prompt_output tag.  This tag has no attributes and simply functions as the root element.  Within it, PEML offers six tags, five of which I will discuss today and one I will discuss in more depth in a later post.

The most common PEML element is the peml:database element.  For example:

 <peml:prompt_output>
    <peml:database fname="https://www.mycorp.com/database.prompts" idset="corp" />
 </peml:prompt_output>

 

This element tells the prompt engine where to look when searching for extractions.  You can specify multiple databases with multiple peml:database tags.  The idset attribute simply specifies that all IDs in the database begin with the specified text.
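
As a rough sketch of the multiple-database case, the markup might look like the following.  The second file name and its idset value are made up for illustration, and, as in the example above, the declaration of the peml namespace prefix is left out.

```xml
<peml:prompt_output>
    <!-- Database from the example above; its IDs begin with "corp" -->
    <peml:database fname="https://www.mycorp.com/database.prompts" idset="corp" />
    <!-- Hypothetical second database (assumed name); its IDs begin with "orders" -->
    <peml:database fname="https://www.mycorp.com/orders.prompts" idset="orders" />
</peml:prompt_output>
```

Each database gets its own peml:database tag, with its own fname and idset.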

 

Say you wish to play a prompt where part of the prompt exists in an extraction but another part quite probably doesn’t exist.  For example, say your application retrieves the titles of books in a recent order and will play the following prompt.

           

            You ordered the book “Igloos”.

 

You do have an extraction for “You ordered the book” but you do not have an extraction for “Igloos”.  To make sure that the entire sentence is not pronounced using TTS, use:

 

            You ordered the book <peml:div /> Igloos.

 

This will break up the prompt into two extractions.  Of course, if you have an extraction for “Igloos” it will still play.
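
Put together with the root element and a database reference, the whole prompt might look something like this.  This is only a sketch: it assumes the prompt text can sit directly inside peml:prompt_output next to the peml:database tag, and it reuses the database URL from the earlier example.

```xml
<peml:prompt_output>
    <!-- Where the prompt engine should look for extractions -->
    <peml:database fname="https://www.mycorp.com/database.prompts" idset="corp" />
    <!-- peml:div splits the text so each half is matched separately -->
    You ordered the book <peml:div /> Igloos.
</peml:prompt_output>
```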

 

Another capability of PEML is to play a prompt with the specified ID.  For example, say you wish to play the prompt with the ID 7.

 

            <peml:id id="7" />

Why would you want to do this?  One example may be a multilingual application.  You could have a database for each language, with equivalent phrases having the same ID.  You can then use PEML to dynamically change the prompt database and play the appropriate prompts by ID.
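
A sketch of the multilingual idea: the application emits one of the two documents below depending on the caller's language.  The two file names are hypothetical, and the idset attribute is left off only to keep the sketch short; how the idset prefix relates to the id value is not something I cover in this post, so treat the IDs here as placeholders.

```xml
<!-- Variant emitted for English callers (hypothetical file name) -->
<peml:prompt_output>
    <peml:database fname="https://www.mycorp.com/english.prompts" />
    <peml:id id="7" />
</peml:prompt_output>

<!-- Variant emitted for Spanish callers: same ID, different database -->
<peml:prompt_output>
    <peml:database fname="https://www.mycorp.com/spanish.prompts" />
    <peml:id id="7" />
</peml:prompt_output>
```

Only the peml:database line differs between the two, so the code that generates the prompt needs to switch a single value.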

 

Another use of PEML is to force something to play in TTS.  This can help performance of an application if you are sure a phrase will not be prerecorded.

 

            <peml:tts>Yippety doo dah</peml:tts>

 

Finally, suppose you have an application where the speaker of the prompts changes depending on the caller.  For instance, after verification, if the customer is male, recordings made by Cindy Crawford are played.  If the customer is female, recordings made by George Clooney are played.  The text of the prompts is exactly the same and both sets of recordings exist in the same prompt database.

You can accomplish this by assigning a tag named “cindy” to all of Cindy’s extractions and a tag named “george” to all of George’s extractions.  Then, to play a prompt using Cindy’s recordings:

 

            <peml:withtag tag="cindy">Would you like to purchase extra minutes for your phone card?</peml:withtag>
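
For a female customer, the same prompt would be sent with the other tag.  A sketch of the full markup, again assuming the database URL from the earlier example and that the tagged text can sit directly inside the root element:

```xml
<peml:prompt_output>
    <!-- Both speakers' recordings live in this one database -->
    <peml:database fname="https://www.mycorp.com/database.prompts" idset="corp" />
    <!-- Same text as before; only the tag value changes -->
    <peml:withtag tag="george">Would you like to purchase extra minutes for your phone card?</peml:withtag>
</peml:prompt_output>
```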

 

The last element of PEML is the peml:rule tag.  This tag can accomplish some interesting things and will be dealt with in more depth in a later post.
