Speech Synthesis

Article
01/20/2015

The System.Speech.Synthesis namespace contains classes that allow you to initialize and configure a speech synthesis engine, create prompts, generate speech, respond to events, and modify voice characteristics. Speech synthesis is often referred to as text-to-speech or TTS.

Create TTS Content (Prompts)

The content that a TTS engine speaks is called a prompt. Creating a prompt can be as simple as typing a string. See Speak the Contents of a String.
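As a minimal sketch of the simplest case, the following console program speaks a literal string. It assumes the project references the System.Speech assembly and that the machine has a default audio device and at least one installed voice; the class name and the spoken text are illustrative only.

```csharp
using System.Speech.Synthesis;

class SpeakStringExample
{
    static void Main()
    {
        // Dispose the synthesizer when finished so its resources are released.
        using (var synth = new SpeechSynthesizer())
        {
            // Route generated speech to the default audio device (assumed to exist).
            synth.SetOutputToDefaultAudioDevice();

            // The string itself is the prompt. Speak returns after the prompt has been spoken.
            synth.Speak("Hello from the speech synthesizer.");
        }
    }
}
```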

For greater control over speech output, you can create prompts programmatically using the methods of the PromptBuilder class to assemble content for prompts from text, Speech Synthesis Markup Language (SSML), files containing text or SSML markup, and prerecorded audio files. PromptBuilder also allows you to select a speaking voice and to control attributes of the voice such as rate and volume. See Construct and Speak a Simple Prompt and Construct a Complex Prompt for more information and examples.
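The sketch below shows one way a prompt might be assembled with PromptBuilder: plain text, a pause, and a passage spoken with a different rate and volume. The wording of the prompt and the class name are hypothetical, and the same assumptions apply as for any System.Speech program (a reference to the System.Speech assembly and a working audio device).

```csharp
using System.Speech.Synthesis;

class PromptBuilderExample
{
    static void Main()
    {
        var builder = new PromptBuilder();

        // Plain text followed by a medium-length pause.
        builder.AppendText("Your order has shipped.");
        builder.AppendBreak(PromptBreak.Medium);

        // Text between StartStyle and EndStyle is spoken with the given style.
        builder.StartStyle(new PromptStyle { Rate = PromptRate.Slow, Volume = PromptVolume.Loud });
        builder.AppendText("Thank you for your purchase.");
        builder.EndStyle();

        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // Speak accepts a PromptBuilder as well as a string.
            synth.Speak(builder);
        }
    }
}
```

PromptBuilder also exposes AppendSsml and AppendAudio for pulling in SSML files and prerecorded audio; they are left out here because they depend on files that this sketch cannot assume exist.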

You can also author content using SSML-compliant XML, which provides a full range of content authoring features and also allows you to select and control speaking voices. See Speech Synthesis Markup Language Reference for a guide to SSML markup.

You can append SSML files to a PromptBuilder object for playback by the SpeechSynthesizer. See Use SSML to Control Synthesized Speech.

Initialize and Manage the Speech Synthesizer

The SpeechSynthesizer class provides access to the functionality of a TTS engine in Windows Vista, Windows 7, and Windows Server 2008. Using the SpeechSynthesizer class, you can select a speaking voice, specify the output for generated speech, create handlers for events that the speech synthesizer generates, and start, pause, and resume speech generation. See Initialize and Manage the Speech Synthesizer.

A voice enables the speech synthesizer to generate speech in a particular language, and may have other attributes such as gender or age. A SpeechSynthesizer instance can load any voice that is installed on the system and use it to generate speech. Windows Vista, Windows 7, and Windows Server 2008 include one or more installed TTS voices.
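To see which voices are available on a given machine, a program can enumerate them and then ask for one by attribute. The following is a sketch under the assumption that at least one voice is installed; the requested gender and age are arbitrary choices for illustration, and a system without a matching voice may simply keep its current one.

```csharp
using System;
using System.Speech.Synthesis;

class VoiceSelectionExample
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // List the name and main attributes of every installed voice.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}, {2}, {3})",
                    info.Name, info.Culture, info.Gender, info.Age);
            }

            // Request a voice by attributes rather than by name (illustrative hints).
            synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);

            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("This is the selected voice.");
        }
    }
}
```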

Generate Speech

Using methods on the SpeechSynthesizer class, you can generate speech as either a synchronous or an asynchronous operation from text, SSML markup, files containing text or SSML markup, and prerecorded audio files.
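The difference between the two modes can be sketched as follows. Speak blocks the calling thread until the prompt finishes, while SpeakAsync queues the prompt and returns a Prompt object immediately. The spoken text is illustrative, and the key press at the end is just one simple way to keep a console program alive while asynchronous speech plays.

```csharp
using System;
using System.Speech.Synthesis;

class SyncAndAsyncExample
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // Synchronous: execution continues only after the prompt has been spoken.
            synth.Speak("This call blocks until the speech finishes.");

            // Asynchronous: the call returns right away with a handle to the queued prompt.
            Prompt pending = synth.SpeakAsync("This call returns right away.");
            Console.WriteLine("Prompt queued. Completed so far: {0}", pending.IsCompleted);

            // Keep the process running so the asynchronous prompt can play.
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }
}
```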

Respond to Events

When generating synthesized speech, the SpeechSynthesizer raises events that inform a speech application about the beginning and end of the speaking of a prompt, the progress of a speak operation, and details about specific features encountered in a prompt. EventArgs classes provide notification and information about events raised and allow you to write handlers that respond to events as they occur. See Use Speech Synthesis Events for more information and examples.
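A minimal sketch of event handling might look like the following, which subscribes to the SpeakStarted, SpeakProgress, and SpeakCompleted events before starting an asynchronous speak operation. The handler bodies and the spoken text are illustrative; a real application would likely do more than write to the console.

```csharp
using System;
using System.Speech.Synthesis;

class SynthesisEventsExample
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // Raised when the synthesizer begins speaking a prompt.
            synth.SpeakStarted += (sender, e) =>
                Console.WriteLine("Speak started.");

            // Raised as speaking progresses; the event data carries the text
            // being spoken and the current position in the audio output.
            synth.SpeakProgress += (sender, e) =>
                Console.WriteLine("Speaking \"{0}\" at {1}", e.Text, e.AudioPosition);

            // Raised when the prompt has finished (or was cancelled).
            synth.SpeakCompleted += (sender, e) =>
                Console.WriteLine("Speak completed. Cancelled: {0}", e.Cancelled);

            synth.SpeakAsync("Events report the progress of a speak operation.");

            // Keep the process running so the handlers have a chance to fire.
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }
}
```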

Control Voice Characteristics

To control the characteristics of speech output, you can select a voice with specific attributes such as language or gender, modify properties of the SpeechSynthesizer such as rate and volume, or add instructions, either in prompt content or in separate lexicon files, that guide the pronunciation of specified words or phrases. See Control Voice Attributes for more information and examples.
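As a brief sketch of adjusting synthesizer properties, the example below changes the Rate and Volume properties between two speak operations. Rate accepts values from -10 to 10 and Volume from 0 to 100; the specific values chosen here are arbitrary.

```csharp
using System.Speech.Synthesis;

class RateAndVolumeExample
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // Speak once with the synthesizer's current settings.
            synth.Speak("This is the starting rate and volume.");

            // Slow the speaking rate (range -10 to 10) and lower the volume (range 0 to 100).
            synth.Rate = -3;
            synth.Volume = 60;
            synth.Speak("This is slower and quieter.");
        }
    }
}
```

These properties apply to everything the synthesizer speaks afterward. To vary rate or volume within a single prompt, a PromptStyle applied through PromptBuilder, or the equivalent SSML markup, is the more targeted option.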
