Choose an Azure AI speech recognition and generation technology

Article
03/20/2025

Azure AI services help workload designers and developers create intelligent, cutting-edge, market-ready, and responsible applications with prebuilt and customizable APIs and models.

This article covers Azure AI services that offer speech recognition and generation capabilities, such as speech-to-text and text-to-speech conversion, audio translation, and speaker recognition, as well as reading support for people with learning differences.

Note

To gather insights on terms or phrases or get detailed contextual analysis of spoken or written language, see Choose an Azure AI targeted language processing technology.

Services

The following Azure AI services can provide speech recognition and generation capabilities for your workload.

Azure AI Speech provides speech-to-text, text-to-speech, speech translation, and speaker recognition capabilities.
- Use Speech service when you need to transcribe or translate spoken audio, or identify speakers in a conversation. You can also use the service for natural-sounding speech generation as a lower-cost alternative to the higher-quality OpenAI models.
- Don't use Speech service for chat, content summarization, moderation, or guiding users through scripts. Use other models for those tasks instead.
Immersive Reader is a tool that implements proven techniques to improve reading comprehension for emerging readers, language learners, and people with learning differences.
- Use Immersive Reader to provide an improved readability experience tailored for language learners or people with learning differences.
- Don't use Immersive Reader for traditional text-to-speech use cases.
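
The guidance above can be read as a simple lookup from a workload need to a service. The following is a minimal sketch of that lookup, not an official selection tool. The `recommend_service` function and the need labels are hypothetical names chosen for illustration and aren't part of any Azure SDK.

```python
# Hypothetical helper that encodes the selection guidance from this article.
# The need labels are illustrative only; they are not Azure API values.

SPEECH_NEEDS = {
    "transcription",
    "speech translation",
    "speaker identification",
    "speech generation",
}
IMMERSIVE_READER_NEEDS = {
    "reading comprehension support",
}
# Needs that the article says Speech service isn't suited for.
OTHER_MODEL_NEEDS = {
    "chat",
    "content summarization",
    "moderation",
    "script guidance",
}


def recommend_service(need: str) -> str:
    """Return the service this article suggests for a workload need."""
    key = need.strip().lower()
    if key in SPEECH_NEEDS:
        return "Azure AI Speech"
    if key in IMMERSIVE_READER_NEEDS:
        return "Immersive Reader"
    if key in OTHER_MODEL_NEEDS:
        return "Another model or service"
    return "Not covered by this article"


if __name__ == "__main__":
    for need in ("Transcription", "reading comprehension support", "chat"):
        print(f"{need}: {recommend_service(need)}")
```

A lookup like this is only a starting point. Real workloads often combine needs, so treat the result as a prompt to read the relevant section rather than a final decision.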

Azure AI Speech

Azure AI Speech provides speech-to-text and text-to-speech capabilities with a Speech resource. You can transcribe speech to text with high accuracy, produce natural-sounding text-to-speech voices, translate spoken audio, and use speaker recognition during conversations. You can also create custom voices, add specific words to your base vocabulary, or build your own models. Run Speech anywhere, in the cloud or at the edge in containers.

Speech is available for many languages and regions.

Capabilities

The following table lists the capabilities available in Azure AI Speech.

Capability	Description
Batch transcription	Transcribe a large amount of audio data in storage. Both the Speech to text REST API and Speech CLI support batch transcription.
Intent recognition	An intent is something the user wants to do: book a flight, check the weather, or make a call. With intent recognition, your applications, tools, and devices can determine what the user wants to initiate or do, based on the options that you define. You define user intents in the intent recognizer or in a conversational language understanding (CLU) model.
Pronunciation assessment	Evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio.
Speaker recognition	Speaker recognition can help determine who is speaking in an audio clip. The service can verify and identify speakers by their unique voice characteristics, by using voice biometry.
Speech-to-text	Converts audio streams to text in real time or in batch.
Text-to-speech	Enables your applications, tools, or devices to convert text into human-like synthesized speech.
Speech translation	Provides multi-language speech-to-speech and speech-to-text translation of audio streams.
Video translation	Translate and generate videos in multiple languages automatically.
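
As an illustration of the text-to-speech row, synthesis input is commonly expressed in Speech Synthesis Markup Language (SSML). The following minimal sketch builds an SSML string with only the Python standard library and doesn't call the service. The `build_ssml` helper is a hypothetical name, and the default voice `en-US-JennyNeural` is an example value that you should replace with a voice available to your Speech resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document.

    The voice and language defaults are example values (assumptions).
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects &, <, and > so that user text can't break the markup.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Your order of fish & chips is ready."))
```

Escaping the input text matters because characters such as `&` and `<` would otherwise produce invalid markup.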

Use cases

The following table describes some of the ways that you can use Azure AI Speech.

Use case	Capability to use	Description
Audio content creation	Text-to-speech	You can use neural voices to make interactions with chatbots and voice assistants more natural and engaging, convert digital texts such as e-books into audiobooks, and enhance in-car navigation systems.
Call center transcription	Speech-to-text	Transcribe calls in real-time or process a batch of calls, redact personally identifying information, and extract insights such as sentiment to help with your call center use case.
Captioning	Speech-to-text	Synchronize captions with your input audio, apply profanity filters, get partial results, apply customizations, and identify spoken languages for multilingual scenarios.
Language learning	Pronunciation assessment, speech-to-text, and text-to-speech	Provide pronunciation assessment feedback to language learners, support real-time transcription for remote learning conversations, and read teaching materials aloud with neural voices.
Voice assistants	Text-to-speech	Create natural, humanlike conversational interfaces for your applications and experiences. The voice assistant feature provides fast and reliable interaction between a device and an assistant implementation.
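
For the captioning use case, synchronizing captions with audio comes down to turning each recognized phrase's timing into caption timestamps. The following minimal sketch assumes that each phrase arrives with an offset and a duration measured in ticks of 100 nanoseconds, and it formats the phrases as SubRip (SRT) text. The function names and the tuple layout are hypothetical, and the sample phrases are made up for illustration.

```python
TICKS_PER_MILLISECOND = 10_000  # assumes 1 tick = 100 nanoseconds


def ticks_to_srt_time(ticks: int) -> str:
    """Format a tick count as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = ticks // TICKS_PER_MILLISECOND
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(phrases: list[tuple[int, int, str]]) -> str:
    """Build SRT text from (offset_ticks, duration_ticks, text) tuples."""
    blocks = []
    for index, (offset, duration, text) in enumerate(phrases, start=1):
        start = ticks_to_srt_time(offset)
        end = ticks_to_srt_time(offset + duration)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        (0, 15_000_000, "Welcome to the call."),
        (20_000_000, 10_000_000, "How can I help?"),
    ]
    print(to_srt(sample))
```

Integer arithmetic keeps the timestamps exact, which avoids drift that floating-point rounding can introduce over long recordings.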

Immersive Reader

Immersive Reader, part of Azure AI services, is an inclusively designed tool that implements proven techniques to improve reading comprehension for new readers, language learners, and people with learning differences such as dyslexia. With the Immersive Reader client library, you can use the same technology used in Microsoft Word and Microsoft OneNote to provide a great experience to your workload's users.

Capabilities

Your workload can use the following capabilities to help users reach their reading comprehension goals:

- Isolate content to improve readability
- Display pictures for common words and terms
- Help readers understand parts of speech and grammar by highlighting verbs, nouns, pronouns, and more
- Read content aloud, such as user-selected text in your workload's UI
- Translate content into many languages in real time, which helps to improve comprehension for readers learning a new language
- Break words into syllables to improve readability or to sound out new words
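
To show how user-selected text might reach the reader, the following minimal sketch prepares a content payload on the server side for a web page to hand to the Immersive Reader client library. It assumes a content shape with a `title` and a list of `chunks`, each with `content`, `lang`, and `mimeType` fields. Verify those field names against the client library reference before relying on them. The `build_reader_content` helper is a hypothetical name.

```python
import json


def build_reader_content(title: str, passages: list[tuple[str, str]]) -> dict:
    """Build a content payload from (text, language_tag) pairs.

    The title/chunks shape is an assumption about what the client library accepts.
    """
    return {
        "title": title,
        "chunks": [
            {"content": text, "lang": lang, "mimeType": "text/plain"}
            for text, lang in passages
            if text.strip()  # skip empty selections
        ],
    }


if __name__ == "__main__":
    content = build_reader_content(
        "Geography",
        [("The study of Earth's surface is called geography.", "en")],
    )
    # Serialize the payload so that a web page can pass it to the client library.
    print(json.dumps(content, indent=2))
```

Tagging each chunk with its language lets a single document mix passages in different languages.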
