Continuous speech recognition cutting off after 30 seconds

JohnSmith123 0 Reputation points
2025-01-15T14:30:53.85+00:00

Hello,

We're encountering an inconsistent issue with the Azure Speech SDK where continuous recognition sessions are prematurely terminating around the 30-second mark. This issue is significantly impacting our application's ability to transcribe long-form audio in real-time.

Problem:

When using the Azure Speech SDK to perform continuous speech recognition, we're observing that the recognition session often cuts off around 30 seconds, even though the audio input continues. This behavior is inconsistent, with the cut-off time sometimes varying slightly above or below 30 seconds.

Technical Details:

  • SDK: Azure Speech SDK
  • API: SpeechRecognizer with continuous recognition (startContinuousRecognitionAsync)
  • Audio Input:
    • We're primarily using .fromDefaultMicrophoneInput() for real-time microphone input.
    • To isolate the issue, we've also tested with audio files longer than 30 seconds using .fromWavFileInput(). The issue persists even with pre-recorded audio files.
  • Expected Behavior: We expect startContinuousRecognitionAsync() to continue transcribing until explicitly stopped using stopContinuousRecognitionAsync() or until the end of the audio stream is reached.
  • Observed Behavior: The recognition session terminates prematurely, often around 30 seconds, without any errors or exceptions being raised. A canceled event does appear to be triggered, though we are not certain of this.
  • Contrast with Fast Transcription API: When using the Fast Transcription API endpoint, we successfully receive complete transcriptions of the same audio files. This suggests the problem lies within the SDK's handling of continuous recognition.
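
Because no exception is thrown, the event callbacks are the main place to see why a session ends. The following is a minimal TypeScript sketch, not an official sample. It assumes the JavaScript flavor of the Speech SDK, where the recognizer exposes `recognized`, `canceled`, `sessionStarted`, and `sessionStopped` callback properties and the canceled event carries `reason`, `errorCode`, and `errorDetails`. `attachDiagnostics`, `RecognizerEvents`, and the argument interfaces are hypothetical names declared locally so the sketch runs without the SDK package. It logs each event with the elapsed time so that a cut-off near 30 seconds can be matched to a specific event.

```typescript
// Hypothetical structural subset of the recognizer (assumption, not the SDK's own types).
interface CanceledArgs { reason: number; errorCode: number; errorDetails: string; }
interface RecognizedArgs { result: { text?: string }; }
interface SessionArgs { sessionId: string; }

interface RecognizerEvents {
  recognized?: (sender: unknown, e: RecognizedArgs) => void;
  canceled?: (sender: unknown, e: CanceledArgs) => void;
  sessionStarted?: (sender: unknown, e: SessionArgs) => void;
  sessionStopped?: (sender: unknown, e: SessionArgs) => void;
}

// Wires a timestamped logger onto every event that can explain an early stop.
function attachDiagnostics(
  recognizer: RecognizerEvents,
  log: (line: string) => void,
  now: () => number = Date.now,
): void {
  const startedAt = now();
  const elapsed = (): string => ((now() - startedAt) / 1000).toFixed(1) + "s";

  recognizer.sessionStarted = (_s, e) =>
    log(`[${elapsed()}] sessionStarted id=${e.sessionId}`);
  recognizer.recognized = (_s, e) =>
    log(`[${elapsed()}] recognized text=${e.result.text ?? ""}`);
  // The raw numeric values are logged as-is; map them to the SDK enums when reading the log.
  recognizer.canceled = (_s, e) =>
    log(`[${elapsed()}] canceled reason=${e.reason} errorCode=${e.errorCode} details=${e.errorDetails}`);
  recognizer.sessionStopped = (_s, e) =>
    log(`[${elapsed()}] sessionStopped id=${e.sessionId}`);
}
```

With a real recognizer, the call would be made once before startContinuousRecognitionAsync(), for example `attachDiagnostics(recognizer, console.log)`. The `errorDetails` string on the canceled event is usually the most informative field to include in a support request.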

Steps Taken:

  • Microphone Input vs. File Input: As mentioned, we tested both fromDefaultMicrophoneInput() and fromWavFileInput() to rule out microphone-specific issues. The problem persists in both scenarios.
  • Varying Audio Length: We experimented with different audio file durations, all longer than 30 seconds, to ensure the input stream wasn't ending prematurely.
  • Checked for Errors/Exceptions: The SDK does not throw any explicit errors or exceptions when the cut-off occurs.
  • Compared with Fast Transcription: The success of the Fast Transcription API confirms that our audio format and data are valid.
  • Checked the GitHub repository for issues/threads with similar problems: We were able to see this thread on the GitHub repo where a user had a similar issue. Unfortunately, the issue was resolved privately on a call.
  • Tried an Azure sample application using our API key: We used this continuous integration example to test the expected behavior, but the same thing happened, and we got a new transcription after the ~30s mark (also inconsistently).
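
The last observation (a new transcription starting after the ~30s mark) may be worth separating from a true termination: continuous recognition delivers one final result per recognized phrase, so a long recording arrives as several `recognized` events rather than one. A minimal TypeScript sketch of collecting those pieces follows. `TranscriptAccumulator` and `PhraseEvent` are hypothetical names, and the sketch assumes the JavaScript SDK's result shape, where `text` is the phrase and `offset` is expressed in 100-nanosecond ticks. Whether segmentation is what actually happens in this case is not confirmed.

```typescript
// Hypothetical subset of the recognized-event payload (assumption, declared locally).
interface PhraseEvent { result: { text?: string; offset: number }; }

interface Segment { offsetSeconds: number; text: string; }

// Collects every final phrase so the full transcript can be rebuilt at the end.
class TranscriptAccumulator {
  private readonly collected: Segment[] = [];

  // Intended use: recognizer.recognized = (_s, e) => acc.onRecognized(e);
  onRecognized(e: PhraseEvent): void {
    const text = (e.result.text ?? "").trim();
    if (text.length === 0) return; // skip empty results such as silence
    // Convert 100-ns ticks to seconds.
    this.collected.push({ offsetSeconds: e.result.offset / 1e7, text });
  }

  segments(): Segment[] {
    return this.collected.slice();
  }

  fullText(): string {
    return this.collected.map((s) => s.text).join(" ");
  }
}
```

If the segment offsets keep increasing past 30 seconds, the session is still running and only the phrase boundary fell near that mark. If no further segments arrive, the session really did stop.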

Questions:

  1. Are there any known limitations or configurations within the Azure Speech SDK that might cause this inconsistent cut-off behavior with startContinuousRecognitionAsync()?
  2. Are there any diagnostic logs or debugging techniques we can employ within the SDK to gain deeper insights into the cause of the premature termination?
  3. Are there any workarounds or alternative approaches within the SDK that we could use to achieve reliable, long-form continuous recognition?
  4. Could this be the result of our API credentials not having the required privileges to transcribe more than 30 seconds of audio?
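
For question 3, one workaround under consideration is restarting recognition whenever the session stops without an explicit stop request. The TypeScript sketch below only illustrates that idea. `keepAlive` and `ContinuousRecognizer` are hypothetical names, the interface is a local stand-in for the JavaScript SDK recognizer (assumed to offer startContinuousRecognitionAsync and stopContinuousRecognitionAsync with optional success and error callbacks, plus a `sessionStopped` callback property), and whether the SDK tolerates a restart from inside that callback is an unverified assumption. It is meant for microphone input; with file input, reaching the end of the stream is a legitimate stop.

```typescript
// Hypothetical stand-in for the SDK recognizer (assumption, declared locally).
interface ContinuousRecognizer {
  sessionStopped?: (sender: unknown, e: unknown) => void;
  startContinuousRecognitionAsync(cb?: () => void, err?: (e: string) => void): void;
  stopContinuousRecognitionAsync(cb?: () => void, err?: (e: string) => void): void;
}

interface KeepAliveSession {
  start(): void;
  stop(): void;
  restarts(): number;
}

// Restarts recognition after an unrequested stop, up to maxRestarts times.
function keepAlive(
  recognizer: ContinuousRecognizer,
  maxRestarts: number,
  log: (line: string) => void,
): KeepAliveSession {
  let stopRequested = false;
  let restartCount = 0;

  recognizer.sessionStopped = () => {
    if (stopRequested) return; // the caller asked for this stop
    if (restartCount >= maxRestarts) {
      log("restart limit reached, not restarting");
      return;
    }
    restartCount += 1;
    log(`session stopped unexpectedly, restart ${restartCount}`);
    recognizer.startContinuousRecognitionAsync(undefined, (e) => log(`restart failed: ${e}`));
  };

  return {
    start(): void {
      stopRequested = false;
      recognizer.startContinuousRecognitionAsync(undefined, (e) => log(`start failed: ${e}`));
    },
    stop(): void {
      stopRequested = true;
      recognizer.stopContinuousRecognitionAsync(undefined, (e) => log(`stop failed: ${e}`));
    },
    restarts: () => restartCount,
  };
}
```

The restart cap is a deliberate choice: if the service rejects the session for a reason such as quota or authentication, an unbounded loop would hide the real error instead of surfacing it.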
Azure AI Speech
An Azure service that integrates speech processing into apps and services.

2 answers

  1. JohnSmith123 0 Reputation points
    2025-01-20T09:23:33.5466667+00:00

    Hello santoshkc, thank you for your message!

    1. We're not seeing any errors on our end, which makes it difficult to determine when a restart is needed.
    2. We explored batch transcription, but it didn't meet our speed requirements unfortunately.
    3. We're using standard synthetic TTS audio as input to the transcription API. There's no custom speech or unusual audio format involved.
    4. Regarding the subscription, our Azure portal shows "Azure Plan" as the plan type and "Standard" as the Speech Services pricing tier. Since the "fast transcription" endpoint is working correctly, we're doubtful that the subscription is the root cause. Unfortunately, fast transcription supports only a few languages and does not fit our use case.

    Thank you in advance!


  2. JohnSmith123 0 Reputation points
    2025-01-20T09:29:58.7033333+00:00

    Thank you for your message!

    1. We're not seeing any errors on our end, which makes it difficult to determine when a restart is needed.
    2. We explored batch transcription, but it didn't meet our speed requirements.
    3. We're using standard synthetic TTS audio as input to the transcription API. There's no custom speech or unusual audio format involved.
    4. Regarding the subscription, our Azure portal shows "Azure Plan" as the plan type and "Standard" as the Speech Services pricing tier. Since the "fast transcription" endpoint is working correctly, we're doubtful that the subscription is the root cause.
