
Speech Application Design Considerations


This topic describes design considerations for both voice-only and multimodal applications.

Voice-only Mode Design Considerations

Develop voice-only applications as .aspx pages, keeping in mind that the voice-only user never sees the visible interface that Internet Explorer presents. A voice-only user speaks and presses keys on a telephone while connected to a speech-enabled .aspx page running on Telephony Application Services. The .aspx page and the controls it contains provide an object model for the voice-only application, but there is no graphical user interface (GUI).

Speech Debugging Console provides special features for debugging voice-only applications, including the ability to send voice or text to an application, and to send dual tone multi-frequency (DTMF) data.

Appropriate Time-outs

Several time-out settings govern how long the application waits for user input, including the two listed below. Set each one to a value appropriate for the task:

  • InitialTimeout, which specifies the maximum allowable time, in milliseconds, between the start of recognition and the detection of speech.
  • InterDigitTimeout, which specifies the maximum allowable time, in milliseconds, between keypresses in a dual tone multi-frequency (DTMF) application.
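The following Python sketch models how these two settings bound DTMF input collection. It replays timestamped keypresses and reports why collection ended. It is an illustration, not the platform's implementation: the function collect_digits, the max_digits limit, and the reason strings are hypothetical, and applying an initial time-out to the first keypress (rather than to the start of speech) is an assumption made by analogy with InitialTimeout.

```python
from typing import List, Tuple


def collect_digits(
    keypresses: List[Tuple[int, str]],
    initial_timeout_ms: int,
    inter_digit_timeout_ms: int,
    max_digits: int,
) -> Tuple[str, str]:
    """Replay (time_ms, digit) keypresses, measured from the start of
    collection, and return (digits_collected, reason_collection_ended).

    Hypothetical model: the first key must arrive within initial_timeout_ms,
    and each later key within inter_digit_timeout_ms of the previous key.
    """
    digits = ""
    last_time_ms = 0  # collection starts at time 0
    for time_ms, digit in sorted(keypresses):
        # Before the first key, the initial time-out applies; afterwards,
        # each gap between keys is limited by the inter-digit time-out.
        limit = initial_timeout_ms if not digits else inter_digit_timeout_ms
        if time_ms - last_time_ms > limit:
            break
        digits += digit
        last_time_ms = time_ms
        if len(digits) == max_digits:
            return digits, "complete"
    if not digits:
        return digits, "initial_timeout"
    # Too few digits: a gap exceeded the limit, or the caller stopped pressing
    # keys and the inter-digit time-out would eventually expire.
    return digits, "inter_digit_timeout"


if __name__ == "__main__":
    # The third key arrives 3,500 ms after the second, so collection stops at "12".
    print(collect_digits([(1000, "1"), (1500, "2"), (5000, "3")], 5000, 3000, 4))
```

With these numbers, a caller who pauses 3.5 seconds between digits is cut off after two digits; raising inter_digit_timeout_ms to 4000 would accept the third key.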

Disguising System Delays

Remember the delays involved in server postbacks, and picture a user on a cell phone on a bus or subway, waiting for the system to respond. Consider playing informational prompts while the application accesses data, so that the caller is not left in silence. A multimodal application, which also presents a visual interface, need not follow this practice as strictly.
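As a rough model of this technique, the Python sketch below starts a slow data lookup and plays a holding prompt while the lookup runs, then reports the result. Playback and the back-end call are simulated with asyncio.sleep; the function names and the balance value are placeholders, not part of any speech platform API.

```python
import asyncio
from typing import List


async def play_prompt(text: str, log: List[str]) -> None:
    """Stand-in for audio playback: record the prompt, then simulate its duration."""
    log.append(f"play: {text}")
    await asyncio.sleep(0.01)


async def fetch_account_balance(log: List[str]) -> str:
    """Stand-in for a slow back-end lookup (hypothetical)."""
    await asyncio.sleep(0.05)
    log.append("data ready")
    return "$42.10"  # placeholder value


async def report_balance(log: List[str]) -> None:
    # Schedule the lookup first so that it runs while the holding prompt plays.
    lookup = asyncio.create_task(fetch_account_balance(log))
    await play_prompt("One moment while I look up your account.", log)
    balance = await lookup  # waits only for whatever delay remains
    await play_prompt(f"Your balance is {balance}.", log)


if __name__ == "__main__":
    events: List[str] = []
    asyncio.run(report_balance(events))
    print("\n".join(events))
```

Because the lookup begins before the holding prompt, the caller hears something during most of the delay instead of silence.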

Grammar Design

Grammar design is a major factor, particularly when dealing with responses to open-ended questions such as "What can I do for you today?" It is important to strike the appropriate balance between dialogue flexibility and grammar complexity.

Prompts

In a voice-only application, audible prompts are the only communication that the user receives from the application, so prompts carry a heavy burden. Users must clearly understand what information the application is requesting and know where they are in the dialogue flow.

Consider the following factors when creating effective prompts.

  • Prompt recording quality: Prompts are often built dynamically from extractions (segments of recorded speech) rather than played end-to-end from a single .wav file. Create extractions that concatenate smoothly and sound natural.
    Deliver prompts using either custom voice recordings or text-to-speech (TTS). TTS is not the preferred choice, but where prompt content cannot be recorded ahead of time (for example, when an application reads e-mail text) it is the only option. Use Speech Prompt Editor to edit, manage, validate, and optionally record prompts. Validate prompt projects to ensure that speech output does not drop out of recorded speech into TTS when an extraction is missing from the prompt database.
  • Necessary information: Prompts should contain only the information necessary to complete the task. Keep them concise.
  • Reflect human interaction: A speech application should not say anything that a person would not say to another person.
  • Ambiguous language: Conversations between two people typically contain ambiguous language, which each person resolves through the visual cues the other provides as context. Because a voice-only application offers no visual cues, it has no room for ambiguity. Avoid ambiguous questions and statements.
  • Familiar terminology: Use terms and phrases familiar to users. If you must use jargon, use jargon that users will understand.
  • Information sequence: Determine the most important information a prompt must convey and present it early in the prompt, followed by any supporting information. Users can then decide right away whether the prompt is relevant to them and, if it is not, choose not to listen to the rest of it.
  • User's knowledge level: Before designing a voice-only application, estimate the average user's level of knowledge at each stage of the application. During design, remember that users learn more about the application each time they use it.
  • Application purpose: Based on the user's estimated knowledge level, use informational prompts at the beginning of the application or at specific points within it to explain its purpose and how it works. The application should explain its role before asking users questions. These prompts can be a simple welcome message, such as "Welcome to Jimmy's Pizza Ordering Service," or a more formal introduction to a procedure, such as "I will now ask you what you want to order, and then I will tell you how much your order will cost."
  • Application structure: Understand the structure of the application before designing prompts. Otherwise, prompts may need to be rewritten when the structure changes.
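Prompt validation, described under Prompt recording quality, can be thought of as asking whether each prompt can be assembled entirely from recorded extractions. The Python sketch below models that check with a greedy longest-match over words and marks any uncovered words as text that would fall back to TTS. This is an assumed, simplified model; it is not the matching algorithm that Speech Prompt Editor or the prompt engine actually uses, and plan_playback and missing_extractions are hypothetical names.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase words only; punctuation is ignored for matching.
    return re.findall(r"[a-z0-9']+", text.lower())


def plan_playback(prompt: str, extractions: Set[str]) -> List[Tuple[str, str]]:
    """Split a prompt into ("recorded", text) and ("tts", text) segments using
    a greedy longest match against the recorded extractions (hypothetical model)."""
    words = tokenize(prompt)
    recorded = {tuple(tokenize(e)) for e in extractions}
    longest = max((len(r) for r in recorded), default=0)
    segments: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so fewer, longer recordings are joined.
        for n in range(min(longest, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in recorded:
                segments.append(("recorded", " ".join(words[i:i + n])))
                i += n
                break
        else:
            # No recording covers this word, so it would be synthesized with TTS.
            if segments and segments[-1][0] == "tts":
                segments[-1] = ("tts", segments[-1][1] + " " + words[i])
            else:
                segments.append(("tts", words[i]))
            i += 1
    return segments


def missing_extractions(prompt: str, extractions: Set[str]) -> List[str]:
    """Return the text a validation pass would flag as falling back to TTS."""
    return [text for source, text in plan_playback(prompt, extractions) if source == "tts"]
```

For example, with the extractions "your order total is", "twelve", and "dollars" recorded, the prompt "Your order total is thirteen dollars." yields one TTS segment, "thirteen", which is the kind of gap that validation should flag.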

Dialogue Flow

Because a voice-only application interacts with users entirely without visual cues, an effective dialogue flow is the key to a successful interaction between the application and its users. The dialogue must be intuitive and natural enough to simulate two people conversing. It must also give users enough context and supporting information to understand what to do at any point in the application. Additional factors are listed below.

  • Dialogue design: Minimize the number of dialogue turns required to complete a transaction. Use the Short Time-out Confirmation (STC) and Implicit Confirmation (IC) strategies to reduce the number of turns.
  • Good escape strategies: Plan a good escape strategy so that users do not get stuck in loops. Picture a user trying over and over again, with mounting irritation, to get a particular result from the application. It is better to break the connection and let the user start again.
  • Dialogue balance: Strike a careful balance between letting users say whatever they want and constraining them to a limited set of choices. In general, limit user choices. The two contrasting dialogue styles are system initiative and mixed initiative. In the system-initiative style, the application asks questions and accepts only specific answers. In the mixed-initiative style, the application asks questions, but users can provide input beyond the answer to the specific question asked. In the following example, the user supplies the travel day along with the requested city:

      Application: "Which city do you want to fly to?"
      User: "Portland, Oregon, on Monday."
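The mixed-initiative exchange above can be sketched in Python as slot filling: every value the caller mentions is accepted, and the next prompt asks only for what is still missing while echoing what was heard, which is one common form of implicit confirmation. The city and day lists, the FlightRequest type, and the follow-up question about departure time are hypothetical; a real application would take these values from its recognition grammar rather than from string matching.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies; a real application would get these values
# from its recognition grammar.
CITIES = ["Portland, Oregon", "Portland, Maine", "Seattle", "Boston"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


@dataclass
class FlightRequest:
    city: Optional[str] = None
    day: Optional[str] = None


def fill_slots(utterance: str, request: FlightRequest) -> FlightRequest:
    """Mixed initiative: accept every slot the caller mentions, not only the one asked for."""
    text = utterance.lower()
    for city in CITIES:
        if city.lower() in text:
            request.city = city
            break
    for day in DAYS:
        if re.search(rf"\b{day.lower()}\b", text):
            request.day = day
            break
    return request


def next_prompt(request: FlightRequest) -> str:
    # Ask only for what is still missing, and echo what was heard so the
    # caller can correct it (implicit confirmation) without an extra turn.
    if request.city is None:
        return "Which city do you want to fly to?"
    if request.day is None:
        return f"Flying to {request.city}. What day do you want to leave?"
    return f"Flying to {request.city} on {request.day}. What time do you want to leave?"
```

Because the caller's answer fills both slots, the application skips the separate question about the day, saving one dialogue turn.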

Global Commands, Silence and Mumble

Provide options that enable users to ask for help, exit the application, or issue other commands that are not part of a specific dialogue. The most common global commands are Help and Repeat. When handling a Repeat command, the application should repeat only the relevant information. For example, if the Repeat command follows a mumble prompt, the application should not repeat the mumble prompt; it should repeat the question the user has not yet answered. Other common commands include Cancel and Start Over. In any application, include this set of commands among the commands that can occur at any QA control. Use Speech Command controls to add this functionality to an application.
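One way to make Repeat replay only relevant information is to remember the last substantive prompt and skip error prompts such as mumble and silence messages. The Python class below is a minimal sketch of that idea; PromptHistory and its relevant flag are hypothetical names, not SASDK APIs.

```python
from typing import Optional


class PromptHistory:
    """Tracks the last prompt worth repeating (hypothetical helper).

    Error prompts such as "I'm sorry, I didn't understand your answer" are
    played but not remembered, so a Repeat command after a mumble replays
    the question rather than the error message.
    """

    def __init__(self) -> None:
        self.last_relevant: Optional[str] = None

    def play(self, text: str, relevant: bool = True) -> str:
        if relevant:
            self.last_relevant = text
        return text  # a real application would send this to audio output

    def repeat(self) -> str:
        if self.last_relevant is None:
            return "There is nothing to repeat yet."  # placeholder wording
        return self.play(self.last_relevant)
```

A mumble prompt is played with relevant=False, so a subsequent repeat() returns the original question.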

Any question from an application to a user can result in silence or mumbling, a request to repeat the question, a request for help, or some other request. Respond with helpful messages that indicate why they are being played, as in the following exchanges:

User: (mumbles)
Application: "I'm sorry, I didn't understand your answer."

User: (silence)
Application: "I'm sorry, I didn't hear your answer."

User: "Help"
Application: "Here is some help ..."
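The exchanges above, combined with the escape-strategy advice under Dialogue Flow, can be sketched in Python as a function that chooses a reason-specific message and stops retrying after a fixed number of failures. The NoResult values, the three-attempt limit, and the wording of the escape message are assumptions for illustration; requests for help never trigger the escape path.

```python
from enum import Enum


class NoResult(Enum):
    MUMBLE = "mumble"    # speech detected but not recognized
    SILENCE = "silence"  # no speech before the initial time-out
    HELP = "help"        # the caller asked for help


# Reason-specific messages taken from the exchanges above.
MESSAGES = {
    NoResult.MUMBLE: "I'm sorry, I didn't understand your answer.",
    NoResult.SILENCE: "I'm sorry, I didn't hear your answer.",
    NoResult.HELP: "Here is some help ...",
}

MAX_ATTEMPTS = 3  # hypothetical limit before the escape strategy applies


def respond(reason: NoResult, attempt: int, question: str) -> str:
    """Build the reply for a failed turn; attempt counts failures so far (1-based)."""
    if reason is not NoResult.HELP and attempt >= MAX_ATTEMPTS:
        # Escape strategy: end the loop rather than irritate the caller further.
        return "I'm having trouble understanding. Please call back and try again."
    # Explain why the message is playing, then ask the question again.
    return f"{MESSAGES[reason]} {question}"
```

Keeping the attempt count outside the function lets the caller decide whether the count resets when the dialogue moves to a new question.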

Orientation Messages

Pay particular attention to transition points in the application and ensure that the user is receiving appropriate orientation messages at those points. For example:

User: "Main menu, please."

Application: "Returning to the Main Menu"
Application: "What information would you like?"

Multimodal Mode Design Considerations

In a multimodal application, the GUI guides the user through the application. Users can enter data by speaking to the application, typing data into text boxes, or selecting items from drop-down lists or other controls. When a user speaks to an application, recognized text typically binds to elements on a Web page, such as text boxes or drop-down lists, or to JScript variables. Use Listen controls to enable binding of recognized text to Web page elements, and to specify event handlers.
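In a real multimodal page, Listen controls specify how recognized values are bound to page elements. The Python sketch below models only the data flow of that binding: named values from a recognition result are copied into the elements mapped to them, and everything else on the page is left alone. The function bind_result and the element ids txtCity and ddlDay are hypothetical.

```python
from typing import Dict


def bind_result(
    semantic_result: Dict[str, str],
    bindings: Dict[str, str],
    page: Dict[str, str],
) -> Dict[str, str]:
    """Copy recognized values into page elements (hypothetical model).

    bindings maps a semantic item name (for example "city") to the id of the
    page element that should receive it (for example "txtCity"). Items with
    no binding are ignored, and elements not mentioned keep their value.
    """
    updated = dict(page)  # leave the caller's page state untouched
    for item, element_id in bindings.items():
        if item in semantic_result:
            updated[element_id] = semantic_result[item]
    return updated
```

Because unbound items are ignored, the user can still type into or select from any element that speech did not fill.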

Multimodal applications typically do not contain prompts, because user interaction is driven by visual events. However, inline prompts can read back recognized text, along with supporting information, so that users can confirm that the application has the correct data before moving on. Use Prompt controls to specify inline prompts and their event handlers.

Because Internet Explorer is the platform for multimodal applications, it is easy to assume that a GUI speaking to the user is the intended form of interaction. The intent is the reverse: the user speaks to the GUI.

For Further Information

The following topics describe general speech application design issues.

  • For more information on types of speech applications, see Types of Systems.
  • For more information on the design process, see Design Process.