

Call Flow


In a speech system, information is presented serially, placing requirements on the user's memory. Different types of dialogue interactions—directed or natural—provide different ways to facilitate the user's interaction with the system. The choice of a dialogue interaction method determines the design of the call flow for the application.

The Linear Nature of a Voice User Interface (VUI)

The sequential nature of information presentation in a speech or VUI system is fundamentally different from the parallel presentation of information in a GUI. With speech systems, the user must absorb information serially, one element at a time, rather than scanning multiple data fields and outputs as in a GUI. This process places additional demands on the user's memory, and on the user's ability to learn the interface quickly in order to accomplish the task.

Designers should consider this process when creating menus or lists, and when offering options to the user. Overtaxing the user's memory with too many information items will lead to a breakdown of the interaction.

Human Memory Constraints

On average, a person's short-term memory holds between five and nine items (seven, plus or minus two) for about 20 seconds. To increase that amount, information needs to be "chunked." Information is easiest to remember in three to four chunks, with three to four items per chunk. A standard phone number with area code is a good example of chunked information: 415-555-4007. A person can also associate a meaning with information, or otherwise semantically code it, to aid memorization. For example, the mnemonic ROY G. BIV, which sounds like a person's name, is an easy way to remember the colors of the spectrum: red, orange, yellow, green, blue, indigo, and violet.
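The chunking idea can be illustrated with a minimal Python sketch. The function names chunk_items and chunk_phone_number are hypothetical and not part of any speech platform. The sketch only shows how a long sequence might be grouped into short chunks before it is read out in a prompt.

```python
def chunk_items(items, chunk_size=3):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def chunk_phone_number(digits):
    """Format a 10-digit string as area code, prefix, and line number."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected exactly 10 digits")
    # Three chunks of three, three, and four digits, as in 415-555-4007.
    return "-".join([digits[:3], digits[3:6], digits[6:]])
```

A prompt generator could then pause briefly between chunks rather than reading ten digits in one run.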

The part of the human brain that transiently holds chunks of information and solves problems also supports speaking and listening. Speaking consumes precious cognitive resources. It is difficult to solve problems while speaking at the same time. Designers need to understand the limitations on a user's ability to respond to complex questions.

Human memory has another constraint in the context of speech systems: People find it more difficult to remember a message spoken by a speech synthesizer than one spoken by a human voice. The additional cognitive capacity needed to process synthesized speech appears to interrupt the transfer of information from short-term to long-term memory.

Design Implications

Speech systems make demands on users. Users may be required to listen, retain, judge, decide, and respond, often in less than optimal surroundings. Designers should not exceed users' memory limitations, and should expect users to have trouble remembering lengthy, complicated messages. Instead, designers should explore efficient ways of chunking and semantically coding information. See Menus and Lists.

Directed Dialogue

Directed dialogue designs generally present users with specific lists of available commands. They typically prompt users with questions designed to move them, turn by turn, through a dialogue. Generally, systems designed for directed dialogue do not accommodate Natural Language (NL) input and are often limited to simple turn-taking. In turn-taking, the application asks for information and the user responds with the requested information.

A note about script examples: Caller commands spoken by the system are in capital letters—this is a useful convention for representing grammars. When capital letters appear in a script, they also cue the voice actor to add subtle inflections that emphasize the commands.

SYSTEM: Say SCHEDULE AN APPOINTMENT or LOOK UP MY APPOINTMENTS.
CALLER: Schedule an appointment
SYSTEM: Please pick a date for this appointment.
CALLER: Monday
SYSTEM: What time would you like to start?
CALLER: Ten A.M.
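
The turn-taking in this script can be sketched as a fixed sequence of prompts, each with its own small grammar. The Python below is an illustration only, under stated assumptions: STEPS, its grammars, and run_directed_dialogue are hypothetical names, and a plain list of strings stands in for the speech recognizer's output.

```python
# Hypothetical prompts and grammars modeled on the sample script.
STEPS = [
    ("command", "Say SCHEDULE AN APPOINTMENT or LOOK UP MY APPOINTMENTS.",
     {"schedule an appointment", "look up my appointments"}),
    ("date", "Please pick a date for this appointment.",
     {"monday", "tuesday", "wednesday", "thursday", "friday"}),
    # None means any non-empty reply is accepted (assumption for brevity).
    ("time", "What time would you like to start?", None),
]


def run_directed_dialogue(caller_replies):
    """Walk the caller through STEPS one turn at a time.

    caller_replies is a list of recognized utterances standing in for a
    recognizer. Returns (collected answers, transcript).
    """
    replies = iter(caller_replies)
    answers, transcript = {}, []
    for slot, prompt, grammar in STEPS:
        while True:
            transcript.append(("SYSTEM", prompt))
            reply = next(replies, None)
            if reply is None:
                return answers, transcript  # no more caller input
            transcript.append(("CALLER", reply))
            normalized = reply.strip().lower()
            if normalized and (grammar is None or normalized in grammar):
                answers[slot] = normalized
                break
            # Out-of-grammar input: the loop repeats the same prompt.
    return answers, transcript
```

The system keeps the initiative throughout: each turn accepts exactly one piece of information, and anything outside the current grammar leads to a reprompt.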

In contrast, natural dialogue systems are more likely to prompt users with open-ended questions that encourage complex responses, an interaction style called mixed initiative. These systems can handle a greater degree of variation within a caller's utterances.

SYSTEM: ...and if you'd like to put something in your calendar, just tell me about the appointment. (For example, you can say: "Schedule an appointment for Friday from nine until ten thirty." Or "Book a meeting for next Tuesday beginning at four in the afternoon.") What can I do for you?
CALLER: Set up an appointment for August 2nd at one forty five.
SYSTEM: All right. How much time do you want to block out?
CALLER: An hour.
SYSTEM: What shall we call this?
CALLER: Let's call it a team meeting.
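
One common way to model this kind of exchange is slot filling: a single caller turn may supply several pieces of information, and the system asks only for what is still missing. The Python below is a rough sketch under stated assumptions. The slot names, FOLLOW_UP_PROMPTS, next_prompt, and merge_turn are hypothetical, and the natural-language parsing step is assumed to have already turned each utterance into a dictionary.

```python
# Hypothetical slots and follow-up prompts modeled on the sample script.
FOLLOW_UP_PROMPTS = {
    "date": "What day is the appointment?",
    "start_time": "What time would you like to start?",
    "duration": "How much time do you want to block out?",
    "title": "What shall we call this?",
}


def next_prompt(filled_slots):
    """Return (slot, prompt) for the first missing slot, or None when complete."""
    for slot, prompt in FOLLOW_UP_PROMPTS.items():
        if slot not in filled_slots:
            return slot, prompt
    return None


def merge_turn(filled_slots, parsed_utterance):
    """Merge the slots from one caller turn into a copy of the current frame."""
    merged = dict(filled_slots)
    # Ignore anything the parser returned that is not a known slot.
    merged.update({k: v for k, v in parsed_utterance.items()
                   if k in FOLLOW_UP_PROMPTS})
    return merged
```

In the script above, the caller's first utterance fills the date and start time at once, so the next question is about duration.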

Choice of Interaction

The nature of the task and type of user will largely determine the designer's approach.

Systems that cater to new users with few repeat callers might perform best with simple turn-taking. Other systems may require a design for repeat callers. If a task involves information that is suitable for mixed-initiative utterances, such as "Buy 100 shares of Microsoft at market," then a more natural dialogue style may be advisable.

If the designer chooses natural dialogue, it is best to stay with this approach and fall back to directed dialogue only after repeatedly encouraging natural responses. First-level mumble and time-out prompts should reinforce the natural interaction style with further example phrases. Only when the caller reaches second-level error handling should the system revert to explicit direction of the conversation, that is, directed dialogue.
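
This escalation policy can be sketched as a small prompt selector. The Python below is a minimal illustration, assuming the application tracks a per-state count of consecutive errors. The function name error_prompt, its parameters, and the prompt wording are hypothetical.

```python
def error_prompt(error_count, example_phrases, directed_prompt):
    """Pick a reprompt after a mumble (no match) or a time-out.

    error_count is the number of consecutive errors at this dialogue
    state, starting at 1.
    """
    if error_count < 1:
        raise ValueError("error_count starts at 1")
    if error_count == 1:
        # First level: stay in the natural style and offer example phrases.
        examples = " or ".join(f'"{phrase}"' for phrase in example_phrases)
        return f"Sorry, I didn't catch that. You can say things like {examples}."
    # Second level and beyond: revert to explicit, directed prompting.
    return directed_prompt
```

For example, a first error might replay sample phrases such as "Schedule an appointment for Friday," while a second error switches to "Say SCHEDULE AN APPOINTMENT or LOOK UP MY APPOINTMENTS."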

See Also

Dialogue Organization | Designing Dialogue Flow