Why does Azure Speech-to-Text detect French accurately in a standalone Python script but perform poorly in a real-time video call integration?
I'm working on a real-time translation project using Azure Speech Services. When I run my translation code in a standalone Python script, it accurately recognizes and translates French and English speech. However, when the same Speech-to-Text functionality is integrated into a video call (using WebSocket connections), the recognition of French is significantly less accurate.
Here’s a summary of my setup:
- Python script: I use Azure Cognitive Services for real-time speech recognition, and language detection works very well, especially for French.
- Video call integration: In a Node.js application with the same Azure Speech Services language configuration, I capture and process audio from live video calls over a WebSocket, but French detection is consistently inaccurate.
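One thing I'm not 100% sure about is the audio format on the WebSocket path. Azure's push streams default to 16 kHz, 16-bit, mono PCM, while call audio is typically 48 kHz stereo, so the Node.js side has to convert before pushing. The stdlib-only sketch below (naive decimation, no low-pass filter; a real pipeline would use a proper resampler) just illustrates the transformation I believe needs to happen:

```python
import struct

def to_16k_mono(pcm_bytes: bytes, src_rate: int = 48000, channels: int = 2) -> bytes:
    """Downmix interleaved 16-bit PCM to mono and decimate to 16 kHz.

    Naive illustration only: averages channels, then keeps every Nth sample
    (N = src_rate // 16000). No anti-aliasing filter is applied.
    """
    samples = struct.unpack("<%dh" % (len(pcm_bytes) // 2), pcm_bytes)
    # Average each interleaved frame down to one mono sample.
    mono = [sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)]
    # Keep every Nth sample to reach 16 kHz (48000 / 16000 = 3).
    step = src_rate // 16000
    out = mono[::step]
    return struct.pack("<%dh" % len(out), *out)
```

If the Node.js side is pushing 48 kHz or stereo frames into a stream the SDK interprets as 16 kHz mono, recognition quality would degrade exactly the way I'm seeing, so I've been checking this path too.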
I've ensured that the audio quality is similar in both cases and that the language configurations match. Unless there is an underlying issue elsewhere, the model recognizes English (not perfectly, but the synthesized speech comes through), while it barely processes French at all. I have also tried Italian and Spanish with similarly poor results. Could this be a language-code issue, since I'm chaining speech-to-text translation with text-to-speech?
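To make the language-code question concrete: as I understand it, Azure expects different code shapes at each stage — full BCP-47 locales (e.g. "fr-FR") for recognition and auto-detection, usually a bare language code (e.g. "fr") for translation targets, and a locale-qualified voice name for synthesis. This sketch shows the mapping I think each stage needs (the voice names are illustrative examples, not necessarily what my app uses):

```python
# Candidate locales passed to auto language detection (full BCP-47 locales).
RECOGNITION_LOCALES = ["en-US", "fr-FR", "it-IT", "es-ES"]

# Illustrative neural voice names for synthesis (locale-qualified).
VOICES = {
    "en-US": "en-US-JennyNeural",
    "fr-FR": "fr-FR-DeniseNeural",
}

def translation_target(detected_locale: str) -> str:
    """Translation targets are generally bare language codes: 'fr-FR' -> 'fr'."""
    return detected_locale.split("-")[0]

def synthesis_voice(detected_locale: str) -> str:
    """Synthesis wants a full voice name tied to the locale."""
    return VOICES[detected_locale]
```

If my Node.js integration is passing "fr-FR" where "fr" is expected (or the reverse) at any of these hand-off points, that could explain why one pipeline works and the other doesn't — is that a plausible cause?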