How to use the Whisper model to transcribe audio in real time using the Speech SDK?

Nas 0 Reputation points
2024-03-30T07:34:45.81+00:00

How do I use the Whisper model to transcribe microphone input in real time using the microsoft-cognitiveservices-speech-sdk npm package? I currently have real-time transcription working with my region set to northcentralus, but it uses the default cognitive speech-to-text model. I want to know how to use Whisper for real-time transcription instead; I wasn't able to find documentation for this.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

3 answers

  1. Amira Bedhiafi 27,441 Reputation points
    2024-03-30T14:22:52.9266667+00:00

    According to the documentation:

    You use the Azure OpenAI Whisper model for speech to text.

    The file size limit for the Azure OpenAI Whisper model is 25 MB. If you need to transcribe a file larger than 25 MB, you can use the Azure AI Speech batch transcription API.

    For the real-time option, Whisper does not natively support streaming audio input for real-time transcription, so you'll need to manage this by breaking the audio into chunks and processing them sequentially. This approach introduces a slight delay but can approximate real-time processing.

    • Chunking Audio: Divide the continuous audio stream into manageable chunks. The size of these chunks can affect the latency and accuracy of the transcription, so you might need to experiment to find the best balance.
    • Transcribing with Whisper: For each audio chunk, use Whisper to transcribe the audio to text. This involves loading the Whisper model and passing the audio data to it for transcription.

    You can use Python for this task, leveraging libraries such as pyaudio for audio capture and the transformers library from Hugging Face for running Whisper.
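
    As a rough illustration of that chunk-and-transcribe approach, here is a minimal Python sketch using pyaudio and the Hugging Face transformers pipeline. The checkpoint (openai/whisper-base), the 16 kHz sample rate, and the five-second chunk length are illustrative assumptions, not requirements, and this is not Azure-specific code.

    ```python
    import numpy as np
    import pyaudio
    from transformers import pipeline

    # Assumption: any Whisper checkpoint works here; "openai/whisper-base" is just an example.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

    RATE = 16000              # Whisper models expect 16 kHz mono audio
    CHUNK_SECONDS = 5         # chunk length trades latency against accuracy
    FRAMES_PER_BUFFER = 1024

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=FRAMES_PER_BUFFER)

    try:
        while True:
            # Capture roughly CHUNK_SECONDS of microphone audio
            frames = [stream.read(FRAMES_PER_BUFFER)
                      for _ in range(int(RATE / FRAMES_PER_BUFFER * CHUNK_SECONDS))]
            audio = np.frombuffer(b"".join(frames), dtype=np.int16).astype(np.float32) / 32768.0
            # Transcribe the chunk and print the text as it becomes available
            result = asr({"raw": audio, "sampling_rate": RATE})
            print(result["text"])
    except KeyboardInterrupt:
        stream.stop_stream()
        stream.close()
        pa.terminate()
    ```

    Shorter chunks lower the latency but give Whisper less context per call, so treat the chunk length above as a starting point rather than a recommendation.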

    1 person found this answer helpful.

  2. navba-MSFT 26,805 Reputation points Microsoft Employee
    2024-04-01T02:52:35.65+00:00

    @Nas Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    The real-time speech to text feature is not available with the Whisper model in Azure AI Speech. See here.


    So if you wish to create and run an application that recognizes and transcribes speech to text in real time (without the Whisper model) using ReactJS, please follow this article.

    For more information, see the React sample and its implementation of speech to text from a microphone on GitHub. The sample shows how to integrate the Azure Speech service into a React application, including design patterns for authentication token exchange and management, and for capturing audio from a microphone or file for speech-to-text conversion.
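
    The linked article and GitHub sample use React with the JavaScript Speech SDK. Purely as a sketch of the same continuous-recognition flow (and to stay consistent with the Python example in the previous answer), the equivalent with the Python Speech SDK (azure-cognitiveservices-speech) looks roughly like this; the key, region, and 30-second run time are placeholder assumptions, and note it uses the standard Azure speech-to-text model, not Whisper. The React sample demonstrates the recommended token-exchange pattern instead of embedding a key.

    ```python
    import time

    import azure.cognitiveservices.speech as speechsdk

    # Placeholders: substitute your own key and region (the question uses northcentralus).
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="northcentralus")
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # Print each finalized phrase as continuous recognition produces it
    recognizer.recognized.connect(lambda evt: print(evt.result.text))

    recognizer.start_continuous_recognition()
    try:
        time.sleep(30)  # keep the process alive while recognition runs
    finally:
        recognizer.stop_continuous_recognition()
    ```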

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

    1 person found this answer helpful.

  3. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


    Comments have been turned off.
