Introducing the Core API
The Core API is what made me so excited about the launch of Speech Server several months ago. Personally, I found it difficult to use either SALT or our speech controls to write an app. It just did not seem natural to me - when I look at graphical elements on a page I think Windows Forms, not a dialog flow.
SALT is great for multimodal apps, which are no longer supported in Speech Server, but I still had a hard time thinking of dialog flow as a markup language. The problem is that markup languages have a very hard time with branching and looping. Languages like HTML are geared towards displaying something on the screen, not towards being used as a programming language. VXML is a bit simpler to understand, but I still find using it awkward.
The natural solution is to write applications in code. The best choice for this is .NET code, so you can leverage the tremendous capabilities of the framework. The Core API in Speech Server 2007 allows you to do exactly that.
In typical fashion, I will introduce the Core API using a Hello World app. In future blogs, I will discuss in more detail the features of the Core API but for now let's get our hands dirty with some code.
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Threading;
using System.Diagnostics;
using System.Text.RegularExpressions;
using System.Collections.Specialized;
using Microsoft.SpeechServer.Recognition;
using Microsoft.SpeechServer.Synthesis;
using Microsoft.SpeechServer.Recognition.SrgsGrammar;
using Microsoft.SpeechServer.Dialog;
using System.Globalization;
using Microsoft.SpeechServer;

namespace HelloWorld
{
    /// <summary>
    /// Hello World app
    /// </summary>
    public class Class1 : IHostedSpeechApplication
    {
        private IApplicationHost Host;

        /// <summary>
        /// Set the lifetime of the application to match that of a call
        /// </summary>
        public bool IsReusable
        {
            get { return false; }
        }

        /// <summary>
        /// Startup method for app
        /// </summary>
        /// <param name="host">The host interface</param>
        void IHostedSpeechApplication.Start(IApplicationHost host)
        {
            Host = host;
            Host.TelephonySession.AcceptCompleted += new EventHandler<AsyncCompletedEventArgs>(TelephonySession_AcceptCompleted);
            Host.TelephonySession.AcceptAsync();
        }

        /// <summary>
        /// Called when the call has been accepted
        /// </summary>
        /// <param name="sender">The sender of the events</param>
        /// <param name="e">Information about the event</param>
        void TelephonySession_AcceptCompleted(object sender, AsyncCompletedEventArgs e)
        {
            if (e.Error != null)
            {
                Host.TelephonySession.LoggingManager.LogApplicationError("An unexpected error occurred in TelephonySession_AcceptCompleted:" + e.Error.ToString());
                Host.TelephonySession.Close();
            }
            else
            {
                Host.TelephonySession.Synthesizer.SpeakCompleted += new EventHandler<SpeakCompletedEventArgs>(Synthesizer_SpeakCompleted);
                Host.TelephonySession.Synthesizer.SpeakAsync("Hello World", SynthesisTextFormat.PlainText);
            }
        }

        /// <summary>
        /// Called if an exception occurs in your code
        /// </summary>
        /// <param name="exn">The exception that occurred</param>
        public void OnUnhandledException(Exception exn)
        {
            Host.TelephonySession.LoggingManager.LogApplicationError(555, "An unhandled exception occurred during test case VariableCheck error: " + exn.ToString());
        }

        /// <summary>
        /// Called to indicate that the server is shutting down
        /// </summary>
        /// <param name="immediate">Indicates this is a final warning and the app should stop immediately</param>
        public void Stop(bool immediate)
        {
        }

        /// <summary>
        /// Called when the prompt has finished playing
        /// </summary>
        /// <param name="sender">The object that sent the event</param>
        /// <param name="e">Information about the event</param>
        private void Synthesizer_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
        {
            if (e.Error != null)
            {
                Host.TelephonySession.LoggingManager.LogApplicationError(100, "Error in synthesizer prompt: " + e.Error.ToString());
            }
            Host.TelephonySession.Close();
        }
    }
}
Wow! Some of you are probably thinking that is quite long for a Hello World app! A lot of this is code required for any Core API app and some of this you do not need to concern yourself with much right now.
The key thing to remember about the Core API is that it is an asynchronous API. Therefore you will often hook up to events and call asynchronous methods. This can make the API tricky at times, and I will cover some of the finer points in later blogs. So let's get started and look at the parts of the app.
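As a quick aside, the hook-an-event-then-call shape can be sketched without Speech Server at all. The snippet below is only an illustration of the pattern, not Core API code: FakeSession, AsyncPatternDemo and their members are hypothetical names made up for this sketch, and the only real framework type used is AsyncCompletedEventArgs from System.ComponentModel. A real runtime would raise the completion event some time later; the stand-in raises it inline to keep the example short.

```csharp
using System;
using System.ComponentModel;

// Hypothetical stand-in for a session object; NOT a Speech Server type.
public class FakeSession
{
    // Raised when the pretend "accept" operation finishes.
    public event EventHandler<AsyncCompletedEventArgs> AcceptCompleted;

    // Starts the operation. A real runtime would return immediately and
    // raise the event later; this sketch raises it inline to stay simple.
    public void AcceptAsync(Exception simulatedError)
    {
        EventHandler<AsyncCompletedEventArgs> handler = AcceptCompleted;
        if (handler != null)
        {
            handler(this, new AsyncCompletedEventArgs(simulatedError, false, null));
        }
    }
}

public class AsyncPatternDemo
{
    // Returns "accepted" or "failed: <message>" depending on what the handler saw.
    public static string Run(Exception simulatedError)
    {
        string outcome = "handler never ran";
        FakeSession session = new FakeSession();

        // Step 1: hook the completion event BEFORE starting the operation.
        session.AcceptCompleted += delegate(object sender, AsyncCompletedEventArgs e)
        {
            outcome = (e.Error == null) ? "accepted" : "failed: " + e.Error.Message;
        };

        // Step 2: start the operation; the result arrives in the handler.
        session.AcceptAsync(simulatedError);
        return outcome;
    }
}
```

The point of the ordering is that the handler must be attached before the asynchronous call is made, otherwise the completion could be missed. The Hello World app follows the same two steps in Start.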
First, two pieces you do not need to think about much right now - Stop and IsReusable. For now just trust that they need to be in the app but most of the time they do not need to be changed.
The most important method in this app is IHostedSpeechApplication.Start. All speech applications in Microsoft Speech Server 2007 implement the IHostedSpeechApplication interface. This interface contains the aforementioned Stop method and IsReusable property, as well as the OnUnhandledException method (discussed shortly), but its most important method is Start. The runtime will call this method when your application is invoked, passing in an IApplicationHost object. You should always save the IApplicationHost object because you will need it to communicate with the runtime.
In general, you should do one of two things in your implementation of Start.
1) Create an outbound connection
2) Accept an incoming call
Which you do, of course, depends on whether this is an inbound (someone calls the app) or an outbound (the app calls someone) application. I will cover outbound calls in a future blog. In this case, it is an inbound application, so we need to accept the incoming call. We do this using an asynchronous call, AcceptAsync.
I will now digress slightly and talk about the two ways errors are handled in a speech application.
1) If your own code causes an unhandled error, or an unhandled error is generated by the runtime, OnUnhandledException is called.
2) If an error occurs related to the asynchronous request you just placed, it is contained in the Error object of the EventArgs passed to the event handler.
For the first case, we have implemented the OnUnhandledException method and made sure to log the error. When debugging, you should also place a breakpoint here, or your app may appear to stop for no apparent reason.
For the second case, we check the e.Error object in each event handler. This should be done religiously if you would like your app to be reliable. If you do not check the e.Error object, the code that follows could make an assumption about the state of the runtime that is not valid. In this case, I just log the error and close the session, but in some cases you may wish to communicate the problem gracefully to the caller if possible.
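Since the same check appears in every handler, one option is to pull it into a small helper. The following is a minimal sketch of that idea, not part of the Core API: CompletionCheck and Succeeded are hypothetical names, the log callback is a placeholder for whatever logging call you use, and it assumes the event args you receive derive from System.ComponentModel.AsyncCompletedEventArgs, which I have not verified for every Core API event.

```csharp
using System;
using System.ComponentModel;

// Hypothetical helper; not a Speech Server type.
public static class CompletionCheck
{
    // Returns true when the async operation completed without an error.
    // On failure, reports a message through the supplied callback and
    // returns false so the caller can stop and clean up.
    public static bool Succeeded(AsyncCompletedEventArgs e, string operation, Action<string> log)
    {
        if (e.Error == null)
        {
            return true;
        }
        log("Error in " + operation + ": " + e.Error.Message);
        return false;
    }
}
```

A handler would then begin with something like "if (!CompletionCheck.Succeeded(e, "Accept", log)) { close the session; return; }", which keeps the rule of checking e.Error first hard to forget.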
In the AcceptCompleted event handler I now play the Hello World prompt. This is accomplished through the Synthesizer object of the TelephonySession object. The ITelephonySession interface is used to communicate with an individual telephony session, and an already-created telephony session object is available from the IApplicationHost object passed to Start. The ISynthesizer interface is used for outputting speech to the user. I will discuss this interface in more detail later, but for now know that the SpeakAsync method can be used to speak some simple text using TTS (Text To Speech).
Finally, in the SpeakCompleted event handler I check for an error again and close the telephony session. This has the effect of hanging up on the user.
Comments
- Anonymous
April 06, 2006
Thanks Joe! I think this is the first public view I have seen of the core API. Thanks for sharing.
I am interested in seeing how people feel about your statement, "...multimodal apps, which are no longer supported in Speech Server..."
- Anonymous
April 06, 2006
Yes, I thought the same thing. What do you mean Multi-Modal apps are no longer supported?
- Anonymous
April 07, 2006
You will not be able to create a multimodal app using the Core API. The controls used for multimodal apps for SALT applications (such as Prompt) have been removed. Unfortunately I cannot comment much on the business decision for doing this, but suffice to say that multimodal apps are a very small portion of the marketplace. Much more common are telephony IVR systems.