Introducing the Synthesizer

In general, a telephony app interacts with the caller in two ways.

1) You will speak to the user, giving them information or instructions

2) You will ask the user for more information

Of course, there are other methods of interaction, such as transferring the user, but for now I will focus on the basics. Of these two ways to interact, the second is more complicated and will be dealt with in a series of later posts. Today we will discuss speaking to the user.

The object responsible for speaking to the user is the ISynthesizer object. This object is conveniently placed as a member of the ITelephonySession object, which, as you may remember, is a member of the IApplicationHost object passed to IHostedSpeechApplication.Start. This makes it very easy to access, so once the call is accepted or opened, you can begin speaking to the user.

The main method for speaking to the user is the SpeakAsync method. It has several overloads, some of which are not entirely straightforward.

In general, the most common overload is

Host.TelephonySession.Synthesizer.SpeakAsync("text to play", SynthesisTextFormat);

This method takes the text to play, followed by a SynthesisTextFormat enumeration that specifies how the text should be interpreted. There are several possible values for this.

SynthesisTextFormat.PlainText - Use this if you know that the entire prompt should be played in TTS. For those who are not familiar with it, TTS stands for "Text To Speech" and means the engine will try to pronounce the text itself. In most telephony applications, you will want to prerecord many of the prompts so they are easier to understand and more "friendly" to the caller.

SynthesisTextFormat.Ssml - Use this to pass SSML markup, which is handed on to the prompt engine. This allows you to use prompts that have been prerecorded in the Recording and Editing Studio and built into .prompts files. A simple example of such markup would be:

<peml:prompt_output xmlns:peml="https://schemas.microsoft.com/Speech/2003/03/PromptEngine">

<peml:database fname="databaselocation"/>

       hello world

</peml:prompt_output>

Here, we specify which database (.prompts file) to use with the peml:database tag, followed by the string to play. Note that the spacing here is for readability only and is not necessary in code. If the string matches an extraction in the prompt database, the corresponding wave file will be played. If it does not, the prompt will be played as TTS. One pain point when working with this is that the string must match the extraction exactly, including spacing and punctuation, which can make it difficult to get a recorded prompt to play.

SynthesisTextFormat.WaveAudio - This indicates that we have passed in the path to a wave file.  Note that the type of the wave file (ulaw or alaw) must match the audio type used by the synthesizer.
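The three formats above can be sketched roughly as follows. This is only an illustration under some loud assumptions: the Speech Server assemblies are not referenced, so SynthesisTextFormat and ISynthesizer below are local stand-ins that copy only the members named in this post, and RecordingSynthesizer, PromptMarkup.Build, and the file names are hypothetical. The helper also assumes the markup is parsed as ordinary XML, so special characters in the text are escaped.

```csharp
using System;
using System.Collections.Generic;
using System.Security;

// Local stand-ins so the sketch compiles without the Speech Server assemblies.
// They mirror only the members named in this post; the real types may differ.
public enum SynthesisTextFormat { PlainText, Ssml, WaveAudio }

public interface ISynthesizer
{
    void SpeakAsync(string text, SynthesisTextFormat format);
}

// Hypothetical test double: records each request instead of producing audio.
public sealed class RecordingSynthesizer : ISynthesizer
{
    public readonly List<string> Requests = new List<string>();

    public void SpeakAsync(string text, SynthesisTextFormat format)
    {
        Requests.Add(format + ": " + text);
    }
}

public static class PromptMarkup
{
    // Namespace URI copied as printed in the example above.
    private const string PemlNamespace =
        "https://schemas.microsoft.com/Speech/2003/03/PromptEngine";

    // Hypothetical helper: wraps the text in the prompt engine markup shown
    // above, escaping XML special characters in both arguments.
    public static string Build(string databasePath, string text)
    {
        return "<peml:prompt_output xmlns:peml=\"" + PemlNamespace + "\">"
             + "<peml:database fname=\"" + SecurityElement.Escape(databasePath) + "\"/>"
             + SecurityElement.Escape(text)
             + "</peml:prompt_output>";
    }
}

public static class Program
{
    public static void Main()
    {
        // In a hosted application this would be Host.TelephonySession.Synthesizer.
        RecordingSynthesizer synthesizer = new RecordingSynthesizer();

        // Plain text: the engine pronounces the string itself (TTS).
        synthesizer.SpeakAsync("Welcome to the demo line.", SynthesisTextFormat.PlainText);

        // Markup: the text must match a recorded extraction exactly to avoid TTS.
        string markup = PromptMarkup.Build("databaselocation", "hello world");
        synthesizer.SpeakAsync(markup, SynthesisTextFormat.Ssml);

        // Wave file: the path is a placeholder; the encoding must match the synthesizer.
        synthesizer.SpeakAsync("greeting.wav", SynthesisTextFormat.WaveAudio);

        foreach (string request in synthesizer.Requests)
        {
            Console.WriteLine(request);
        }
    }
}
```

Building the markup in one place keeps the quoting and escaping out of every call site, which matters here because a single stray character is enough to make the prompt fall back to TTS.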

So we have now covered how to play a basic prompt.  There is of course a lot more that you can do with prompts, which we will cover shortly.

Comments

  • Anonymous
    May 05, 2006
    This sounds interesting but why would I want to write at this low level when I have the other methods available? What do I get for doing this?
  • Anonymous
    May 05, 2006
    This is part of the reason why I have been silent for so long.  There is of course an easier way to do a lot of this, but it is not public yet.