Azure speech to text appears very slow

Sai Vishnu Soudri 60 Reputation points
2025-01-30T06:13:09.4933333+00:00

Hi team,

We have observed that Azure speech-to-text is very slow. I am using continuousRecognitionAsync, and Azure takes close to 6 s in total to process just 3 s of audio.

The parameters that I've set are:

EndSilenceTimeoutMs = 750

InitialSilenceTimeoutMs = 180000

The language is set to en-US.

I am using the Standard tier, and this latency seems very high. Is this expected, or can we adjust the parameters or the model to get acceptable latency?

A few observations that concern me:

  1. The speech start detected event takes close to 1.2 s to arrive.
  2. After speech start is detected, the model takes 3 more seconds to return the last recognizing response.
  3. After the last recognizing response, it takes almost 2 more seconds to return the recognized response.
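
For reference, the breakdown above could be measured with something like the following minimal sketch. It is plain Python and does not call the Speech SDK: `LatencyTracker`, its event names, and the replayed timestamps are all hypothetical, chosen only to mirror the three observations. It assumes time 0 is the moment audio streaming begins, and the intermediate recognizing timestamp (2.5 s) is made up for illustration. In a real application, `mark` would be called from the recognizer's event callbacks.

```python
import time


class LatencyTracker:
    """Records when each named event is first and last seen, relative to creation time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._t0 = clock()   # assumed to coincide with the start of audio streaming
        self.first = {}      # event name -> first time seen (seconds)
        self.last = {}       # event name -> most recent time seen (seconds)

    def mark(self, event):
        """Call this from the callback for the given event."""
        t = self._clock() - self._t0
        self.first.setdefault(event, t)
        self.last[event] = t

    def breakdown(self):
        """Split the total latency into the three intervals listed above."""
        start = self.first["speech_start_detected"]
        last_partial = self.last["recognizing"]
        final = self.first["recognized"]
        return {
            "until_speech_start": round(start, 3),
            "speech_start_to_last_recognizing": round(last_partial - start, 3),
            "last_recognizing_to_recognized": round(final - last_partial, 3),
            "total": round(final, 3),
        }


# Replay the approximate timings from this post with a fake clock.
ticks = iter([0.0, 1.2, 2.5, 4.2, 6.2])
tracker = LatencyTracker(clock=lambda: next(ticks))
tracker.mark("speech_start_detected")
tracker.mark("recognizing")   # hypothetical intermediate partial result
tracker.mark("recognizing")   # last partial result
tracker.mark("recognized")    # final result
result = tracker.breakdown()
print(result)
```

Logging the intervals this way makes it easier to see whether the delay is mostly in speech start detection, in the partial results, or in the wait for the final result.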

Is continuousRecognitionAsync expected to take this long? Please suggest best practices for getting faster responses from Azure in real-time streaming.

Regards,

Sai Vishnu Soudri

Azure AI Speech
