Azure speech to text appears very slow

Sai Vishnu Soudri 60

Hi team,

We have observed that the Azure speech-to-text is very slow. I am using continuousRecognitionAsync and I observe that Azure takes a total of close to 6s for just 3s audio.

The parameters that I've set are:

EndSilenceTimeoutMs = 750

InitialSilenceTimeoutMs = 180000

The language set is en-US

I am using the standard tier and this latency feels extremely huge. Is this expected or can we make improvements to the parameters/model to get acceptable latency?

A few observations that appear as concerns:

Speech start detected takes close to 1.2s
From speech start detected, the model takes 3 more seconds to give the last recognizing response
From the last recognising response, it takes almost 2 seconds to give the recognized response.

Is continuousRecognitionAsync expected to take this much time? Please suggest some best practices for getting faster responses from Azure in real-time streaming.

Regards,

Sai Vishnu Soudri

santoshkc 12,035 Reputation points Microsoft Vendor

2025-01-30T14:56:50.2766667+00:00

Hi @Sai Vishnu Soudri,

Thank you for reaching out to Microsoft Q&A forum!

Your observed latency is expected due to processing, modelling, and network transmission. To optimize, deploy the speech resource closer to users or use Speech Containers for on-premise processing. Embedded Speech can also reduce cloud dependency.

For real-time transcription, use streaming input (Push/PullAudioInputStream) and set the default language to avoid extra processing. Asynchronous methods (start_continuous_recognition_async()) prevent blocking. Azure’s Fast Transcription (in preview) offers quicker results for large files.

For file-based transcription, split long audio into smaller chunks and process in parallel. Compressing audio before sending also reduces network delays. In speech synthesis, async methods and text streaming improve response times.

Optimizing these factors—network, streaming, async processing—can significantly reduce latency while maintaining accuracy. For more info: reduce latency.

I hope you understand. Thank you.
kothapally Snigdha 1,185 Reputation points Microsoft Vendor

2025-01-31T14:51:01.34+00:00

Hi @Sai Vishnu Soudri,

Following up to see if the above response was helpful.
santoshkc 12,035 Reputation points Microsoft Vendor

2025-02-03T08:25:47.9533333+00:00

Hi @Sai Vishnu Soudri,

We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet. In case if you have any resolution, please do share that same with the community as it can be helpful to others.

Share via

Azure speech to text appears very slow

Your answer