Stream Audio Issue with Speech
We're using a Python FastAPI server to receive audio streamed from the browser over a WebSocket and pass it to Azure Speech. Our goal is to detect the input language automatically, translate it to English, and stream both the translated text and the audio back to the browser. The problem seems to be in feeding the audio to Azure Speech through an audio input stream. With use_default_microphone=True, everything works perfectly; the issue only appears when we push the streamed audio in instead of using the default microphone. Thanks in advance. Here's the code:

@router.websocket("/translate/speech")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.send_text("Connected to the translator service")

    # Create a PushAudioInputStream to act as a bucket for incoming audio data.
    audio_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=16000, bits_per_sample=16, channels=1
    )
    audio_stream = speechsdk.audio.PushAudioInputStream(stream_format=audio_format)
    audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)

    # Create a speech translation config with specified subscription key and service region.
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=AZURE_SPEECH_SUBS_KEY, region=AZURE_SPEECH_REGION
    )

    # Replace with the languages of your choice, from list found here: https://aka.ms/speech/sttt-languages
    from_language = "en-US"
    to_language = "es"
    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)
    translation_config.voice_name = "en-US-JennyNeural"  # Optional: Set the voice name of the output translation.

    # Create the TranslationRecognizer with the audio configuration.
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, audio_config=audio_config
    )

    # Configure speech synthesis (for translated speech output)
    speech_config = speechsdk.SpeechConfig(
        subscription=AZURE_SPEECH_SUBS_KEY, region=AZURE_SPEECH_REGION
    )
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
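
One thing that may be worth ruling out, offered as an assumption and not a confirmed diagnosis: the stream format declared above is raw 16 kHz, 16-bit, mono PCM, whereas browsers often send either a compressed container (for example WebM or Ogg from MediaRecorder) or 32-bit float samples from the Web Audio API. The sketch below uses only the standard library; the helper names sniff_container and float32_to_pcm16 are hypothetical and not part of the Azure SDK or FastAPI. It shows one way to check what kind of bytes arrive and, if they are float32 samples, to convert them to 16-bit PCM.

```python
import struct
from typing import Optional

# Magic bytes that open container formats a browser might send.
# (Illustrative list, not exhaustive.)
CONTAINER_SIGNATURES = {
    b"\x1a\x45\xdf\xa3": "webm/matroska",  # EBML header
    b"OggS": "ogg",
    b"RIFF": "wav",
}


def sniff_container(chunk: bytes) -> Optional[str]:
    """Return a container name if the chunk starts with a known magic number.

    Returns None when no signature matches, which is what raw PCM
    would normally produce.
    """
    for magic, name in CONTAINER_SIGNATURES.items():
        if chunk.startswith(magic):
            return name
    return None


def float32_to_pcm16(chunk: bytes) -> bytes:
    """Convert little-endian float32 samples in [-1.0, 1.0] to int16 PCM.

    Out-of-range samples are clipped; trailing bytes that do not form a
    whole float32 sample are dropped.
    """
    count = len(chunk) // 4
    samples = struct.unpack(f"<{count}f", chunk[: count * 4])
    scaled = (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    return struct.pack(f"<{count}h", *scaled)
```

In the endpoint, each chunk received with await websocket.receive_bytes() could be passed through sniff_container first. If a container is detected, the bytes are not raw PCM and would not match the declared AudioStreamFormat; if the browser sends float32 samples, the output of float32_to_pcm16 is what would be handed to audio_stream.write(...). Note that this conversion does not resample, so the browser's sample rate would still need to match the 16000 Hz declared in the stream format.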