Stream Audio Issue with Speech

Diomedes Kastanis (Microsoft Employee) · 2025-02-05

We’re using a Python FastAPI server to stream audio from the browser over a WebSocket and pass it to Azure Speech. Our goal is to automatically recognize the input language, translate it to English, and stream both the translated text and audio back to the browser. The challenge is feeding the audio to Azure Speech through a push audio input stream: when we use use_default_microphone=True, everything works perfectly, but streaming the audio input instead of using the default microphone does not work. Thanks. Here's the code:

import azure.cognitiveservices.speech as speechsdk
from fastapi import WebSocket

# router, AZURE_SPEECH_SUBS_KEY and AZURE_SPEECH_REGION are defined elsewhere in the project.

@router.websocket("/translate/speech")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.send_text("Connected to the translator service")

    # Create a PushAudioInputStream to act as a bucket for incoming audio data.
    audio_format = speechsdk.audio.AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)
    audio_stream = speechsdk.audio.PushAudioInputStream(stream_format=audio_format)
    audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)

    # Create a speech translation config with the subscription key and service region.
    translation_config = speechsdk.translation.SpeechTranslationConfig(subscription=AZURE_SPEECH_SUBS_KEY, region=AZURE_SPEECH_REGION)

    # Replace with the languages of your choice, from the list at https://aka.ms/speech/sttt-languages
    from_language = "en-US"
    to_language = "es"
    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)
    translation_config.voice_name = "en-US-JennyNeural"  # Optional: set the voice used for the translated audio.

    # Create the TranslationRecognizer with the push-stream audio configuration.
    recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config, audio_config=audio_config)

    # Configure speech synthesis (for translated speech output).
    speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_SUBS_KEY, region=AZURE_SPEECH_REGION)
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
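
The snippet stops right after the recognizer and synthesizer are created. For completeness, here is a simplified sketch of how the rest of the handler could push the WebSocket audio into the recognizer and stream results back. It uses the recognizer's synthesizing event (which is what the voice_name setting above drives) rather than the separate SpeechSynthesizer, assumes the browser sends raw 16 kHz, 16-bit, mono PCM matching the AudioStreamFormat above, and assumes asyncio and WebSocketDisconnect are imported at module level; the callback names are illustrative.

    loop = asyncio.get_running_loop()

    def on_recognized(evt):
        # The SDK fires this on its own worker thread, so hop back onto the
        # asyncio event loop before touching the WebSocket.
        if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
            translated_text = evt.result.translations.get(to_language, "")
            asyncio.run_coroutine_threadsafe(websocket.send_text(translated_text), loop)

    def on_synthesizing(evt):
        # evt.result.audio carries the synthesized translation audio; an empty
        # buffer signals that synthesis of the current utterance is complete.
        if len(evt.result.audio) > 0:
            asyncio.run_coroutine_threadsafe(websocket.send_bytes(evt.result.audio), loop)

    recognizer.recognized.connect(on_recognized)
    recognizer.synthesizing.connect(on_synthesizing)
    recognizer.start_continuous_recognition()

    try:
        while True:
            # Raw PCM chunks from the browser go straight into the push stream.
            chunk = await websocket.receive_bytes()
            audio_stream.write(chunk)
    except WebSocketDisconnect:
        pass
    finally:
        audio_stream.close()
        recognizer.stop_continuous_recognition()

If the browser is actually sending compressed audio (for example WebM/Opus from MediaRecorder), it would either need to be decoded to PCM on the server before being written to the push stream, or the stream would need to be created with a compressed AudioStreamFormat, which additionally requires GStreamer on the server.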
