What is Speech Studio?

09/24/2024

Speech Studio is a set of UI-based tools for building and integrating features from Azure AI Speech service in your applications. You create projects in Speech Studio by using a no-code approach, and then reference those assets in your applications by using the Speech SDK, the Speech CLI, or the REST APIs.

Tip

You can also try speech to text and text to speech in the Azure AI Foundry portal without signing up or writing any code.

Speech Studio scenarios

Explore, try out, and view sample code for some common use cases.

Captioning: Choose a sample video clip to see real-time or offline processed captioning results. Learn how to synchronize captions with your input audio, apply profanity filters, get partial results, apply customizations, and identify spoken languages for multilingual scenarios. For more information, see the captioning quickstart.
Call Center: View a demonstration of how to use the Language and Speech services to analyze call center conversations. Transcribe calls in real time or process a batch of calls, redact personally identifying information, and extract insights such as sentiment to help with your call center use case. For more information, see the call center quickstart.

For a demonstration of these scenarios in Speech Studio, view this introductory video.

Speech Studio features

In Speech Studio, the following Speech service features are available as project types:

Real-time speech to text: Quickly test speech to text by dragging audio files into the tool, without writing any code. Speech Studio has a demo tool for seeing how speech to text works on your audio samples. To explore the full functionality, see What is speech to text.
Batch speech to text: Quickly test batch transcription capabilities to transcribe a large amount of audio in storage and receive results asynchronously. To learn more about batch speech to text, see Batch speech to text overview.
Custom speech: Create speech recognition models that are tailored to specific vocabulary sets and styles of speaking. In contrast to the base speech recognition model, Custom speech models become part of your unique competitive advantage because they're not publicly accessible. To get started with uploading sample audio to create a custom speech model, see Upload training and testing datasets.
Pronunciation assessment: Evaluate speech pronunciation and give speakers feedback on the accuracy and fluency of spoken audio. Speech Studio provides a sandbox for testing this feature quickly, without code. To use the feature with the Speech SDK in your applications, see the Pronunciation assessment article.
Speech Translation: Quickly test and translate speech into other languages of your choice with low latency. To explore the full functionality, see What is speech translation.
Voice Gallery: Build apps and services that speak naturally. Choose from a broad portfolio of languages, voices, and variants. Bring your scenarios to life with highly expressive and human-like neural voices.
Custom voice: Create custom, one-of-a-kind voices for text to speech. You supply audio files and create matching transcriptions in Speech Studio, and then use the custom voices in your applications. To create and use custom voices via endpoints, see Create and use your voice model.
Audio Content Creation: A no-code approach for text to speech synthesis. You can use the output audio as-is, or as a starting point for further customization. You can build highly natural audio content for various scenarios, such as audiobooks, news broadcasts, video narrations, and chat bots. For more information, see the Audio Content Creation documentation.
Custom Keyword: A custom keyword is a word or short phrase that you can use to voice-activate a product. You create a custom keyword in Speech Studio, and then generate a binary file to use with the Speech SDK in your applications.
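
As one illustration of referencing a Speech Studio asset from application code, a voice chosen in the Voice Gallery (or a custom voice) is usually selected by name inside an SSML document that is then sent for synthesis. The following is a minimal sketch, not an official sample: the helper `build_ssml` is hypothetical, the voice name is a placeholder, and the code only builds the SSML string with the Python standard library. It does not call the Speech service.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Return an SSML document asking for `text` to be spoken by `voice_name`.

    Hypothetical helper for illustration; it is not part of the Speech SDK.
    """
    # quoteattr() adds the surrounding quotes and escapes the attribute value;
    # escape() protects &, <, and > in the text to be spoken.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # "en-US-JennyNeural" is only a placeholder; substitute the voice you
    # selected in Speech Studio.
    print(build_ssml("Welcome to Speech Studio.", "en-US-JennyNeural"))
```

The resulting string could then be passed to an SSML-capable synthesis call in the Speech SDK or to the text to speech REST API, together with your own resource key and region.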

Next steps

Explore Speech Studio
