Azure real-time speech to text not working through Python, but curl is working

Ulhas Hulyal, Nilesh 0

We have enabled the Azure Speech to Text service with a private endpoint. When we use the curl command below, we are able to get output:

curl -i --location 'https://xxxxxxxxxxx?language=en-US' --header 'Accept: application/json' --header 'Ocp-Apim-Subscription-Key: xxxxxxxxx' --header 'Content-Type: audio/wav' --data-binary '@/app/temp/longer_audio.wav'

But when we use the attached script, the event is getting canceled. Can anyone please help here?

kothapally Snigdha 1,100 Reputation points Microsoft Vendor

2025-01-27T11:40:54.9466667+00:00

Hi Ulhas Hulyal, Nilesh

Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.

Could you please share the error trace for more details?
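One way to capture that trace is to log the cancellation details from the canceled callback. The sketch below is only illustrative: it assumes the event exposes a `cancellation_details` object with `reason`, `code`, and `error_details` attributes (as the Speech SDK's `CancellationDetails` class does), and `describe_cancellation` is a hypothetical helper name, not part of the SDK.

```python
def describe_cancellation(evt) -> str:
    """Build a one-line summary of a canceled event for logging.

    Assumes `evt.cancellation_details` carries `reason`, `code`, and
    `error_details`; any attribute that is missing is reported as 'n/a'
    instead of raising.
    """
    details = getattr(evt, "cancellation_details", None)
    if details is None:
        return "Canceled: no cancellation details available"
    reason = getattr(details, "reason", "n/a")
    code = getattr(details, "code", "n/a")
    error_details = getattr(details, "error_details", "n/a")
    return "Canceled: reason={} code={} error_details={}".format(
        reason, code, error_details
    )


# Hypothetical wiring inside your script (names taken from the usual
# ConversationTranscriber setup; adjust to your own variables):
# conversation_transcriber.canceled.connect(
#     lambda evt: print(describe_cancellation(evt)))
```

The `error_details` text is usually the most useful part to share, since connection and proxy failures tend to show up there.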

Thanks.

Ulhas Hulyal, Nilesh 0

We are using the script below, and the event is getting canceled. The same script works for another Speech service enabled with a private endpoint.

import os
import time
import azure.cognitiveservices.speech as speechsdk
from datetime import datetime
def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs, file_handle):
    file_handle.write('Canceled event\n')

def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs, file_handle):
    file_handle.write('SessionStopped event\n')

def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs, file_handle):
    file_handle.write('TRANSCRIBED:\n')
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        file_handle.write('\tText={}\n'.format(evt.result.text))
        file_handle.write('\tSpeaker ID={}\n'.format(evt.result.speaker_id))
        start_time_ticks = evt.result.offset
        duration_ticks = evt.result.duration
        stop_time_ticks = start_time_ticks + duration_ticks
        start_time_seconds = start_time_ticks / 10000000
        duration_seconds = duration_ticks / 10000000
        stop_time_seconds = stop_time_ticks / 10000000
        
        # Write the start and stop times
        file_handle.write('\tStart Time (s)={:.2f}\n'.format(start_time_seconds))
        file_handle.write('\tStop Time (s)={:.2f}\n'.format(stop_time_seconds))
        
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        file_handle.write('\tNOMATCH: Speech could not be TRANSCRIBED: {}\n'.format(evt.result.no_match_details))

def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs, file_handle):
    file_handle.write('SessionStarted event\n')

def recognize_from_file(path):
    try:
        # The subscription key and private endpoint URL are hard-coded below (redacted here).
        print("started********")
        speech_config = speechsdk.SpeechConfig(subscription="xxxxxx", endpoint="wss://xxxxxxxx.cognitiveservices.azure.com/stt/speech/recognition/conversation/cognitiveservices/v1")

        
        speech_config.speech_recognition_language="en-US"
        
        audio_config = speechsdk.audio.AudioConfig(filename=path)
        conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
    
        transcribing_stop = False
    
        def stop_cb(evt: speechsdk.SessionEventArgs):
            #"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
            #file_handle.write('CLOSING on {}\n'.format(evt))
            nonlocal transcribing_stop
            transcribing_stop = True
    
        # Open the file for writing
        videoname = os.path.basename(path)
        file = videoname[:-4]
        with open(file+"_transcription_output.txt", "w") as file_handle:
            starttime=datetime.now()
            file_handle.write('VideoName={}\n'.format(videoname))
            file_handle.write('StartTime={}\n'.format(starttime))
            # Connect callbacks to the events fired by the conversation transcriber
            conversation_transcriber.transcribed.connect(lambda evt: conversation_transcriber_transcribed_cb(evt, file_handle))
            conversation_transcriber.session_started.connect(lambda evt: conversation_transcriber_session_started_cb(evt, file_handle))
            conversation_transcriber.session_stopped.connect(lambda evt: conversation_transcriber_session_stopped_cb(evt, file_handle))
            conversation_transcriber.canceled.connect(lambda evt: conversation_transcriber_recognition_canceled_cb(evt, file_handle))
            # stop transcribing on either session stopped or canceled events
            conversation_transcriber.session_stopped.connect(stop_cb)
            conversation_transcriber.canceled.connect(stop_cb)
    
            conversation_transcriber.start_transcribing_async()
    
            # Waits for completion.
            while not transcribing_stop:
                time.sleep(.5)
    
            conversation_transcriber.stop_transcribing_async()
            print("completed********")
            endtime=datetime.now()-starttime
            file_handle.write('EndTime={}\n'.format(endtime))
    except Exception as e:
        print("Exception occurred",e)
# Main

try:
    path=r"/app/temp/longer_audio.wav"
    recognize_from_file(path)
   
except Exception as err:
    print("Encountered exception. {}".format(err))

kothapally Snigdha 1,100 Reputation points Microsoft Vendor

2025-01-27T12:47:53.56+00:00

Hi Ulhas Hulyal, Nilesh

Thank you for providing the complete code details. It seems the issue may not be with the script. Please check for any corruption in the private endpoints and verify DNS resolution by running 'nslookup <fqdnofspeech>', which should resolve to the private IP as specified in the DNS record. If needed, you may want to recreate the endpoints.
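The same DNS check can be run from Python on the machine where the script fails. This is a minimal sketch using only the standard library; the function names are hypothetical, and the hostname in the usage comment is a placeholder for your own Speech resource FQDN.

```python
import ipaddress
import socket


def resolve_addresses(hostname: str, port: int = 443) -> list:
    """Return the sorted, de-duplicated IP addresses the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses) -> bool:
    """True only if at least one address is given and every one is private."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


# Usage (placeholder hostname, replace with your resource's FQDN):
# addrs = resolve_addresses("<fqdnofspeech>")
# print(addrs, "private" if all_private(addrs) else "NOT private")
```

If the result is a public address, the client is not resolving through the private DNS zone, which would point to DNS configuration rather than the script.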

Thanks.
Ulhas Hulyal, Nilesh 0 Reputation points

2025-01-27T12:56:19.2033333+00:00

Okay, we will check this. But we wanted to understand: why is curl working with the same private endpoint?

Ulhas Hulyal, Nilesh 0 Reputation points

2025-01-27T12:58:59.4733333+00:00

@kothapally Snigdha Thank you for your reply; we will check the points you mentioned. Just one query: if there were any corruption in the private endpoints, then curl should also fail, correct? But curl is working with the same private endpoint.

kothapally Snigdha 1,100 Reputation points Microsoft Vendor

2025-01-28T09:15:20.61+00:00

Hi Ulhas Hulyal, Nilesh

Following up to see if the above response was helpful.
Ulhas Hulyal, Nilesh 0 Reputation points

2025-01-28T10:33:17.0033333+00:00

Hi @kothapally Snigdha, the script is working fine for us in another environment, but in the one environment where it is failing, it seems a proxy is blocking the wss private endpoint.

We tried to use speech_config.set_proxy() to resolve this issue, but it didn't work.

Accepted answer

kothapally Snigdha 1,100 Microsoft Vendor

Hi Ulhas Hulyal, Nilesh

We were able to reproduce the issue and resolve it in our environment. Can you please check once with the following script:

import time
import azure.cognitiveservices.speech as speechsdk
 
def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
    print('Canceled event')
 
def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
    print('SessionStopped event')
 
def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    print('\nTRANSCRIBED:')
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('\tText={}'.format(evt.result.text))
        print('\tSpeaker ID={}\n'.format(evt.result.speaker_id))
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))
 
def conversation_transcriber_transcribing_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    print('TRANSCRIBING:')
    print('\tText={}'.format(evt.result.text))
    print('\tSpeaker ID={}'.format(evt.result.speaker_id))
 
def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
    print('SessionStarted event')
 
def recognize_from_file():
    # Replace <KEY> and <REGION> with your Speech resource key and region.
    speech_config = speechsdk.SpeechConfig(subscription="<KEY>", region="<REGION>")
    speech_config.speech_recognition_language="en-US"
    speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceResponse_DiarizeIntermediateResults, value='true')
 
    audio_config = speechsdk.audio.AudioConfig(filename=r"FILE_PATH")
    conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
 
    transcribing_stop = False
 
    def stop_cb(evt: speechsdk.SessionEventArgs):
        """Callback that signals to stop continuous recognition upon receiving an event `evt`."""
        print('CLOSING on {}'.format(evt))
        nonlocal transcribing_stop
        transcribing_stop = True
 
    # Connect callbacks to the events fired by the conversation transcriber
    conversation_transcriber.transcribed.connect(conversation_transcriber_transcribed_cb)
    conversation_transcriber.transcribing.connect(conversation_transcriber_transcribing_cb)
    conversation_transcriber.session_started.connect(conversation_transcriber_session_started_cb)
    conversation_transcriber.session_stopped.connect(conversation_transcriber_session_stopped_cb)
    conversation_transcriber.canceled.connect(conversation_transcriber_recognition_canceled_cb)
    # stop transcribing on either session stopped or canceled events
    conversation_transcriber.session_stopped.connect(stop_cb)
    conversation_transcriber.canceled.connect(stop_cb)
 
    conversation_transcriber.start_transcribing_async()
 
    # Waits for completion.
    while not transcribing_stop:
        time.sleep(.5)
 
    conversation_transcriber.stop_transcribing_async()
 
# Main
 
try:
    recognize_from_file()
except Exception as err:
    print("Encountered exception. {}".format(err))

Output: [User's image]

I hope this helps you. Thank you.

Ulhas Hulyal, Nilesh 0 Reputation points

2025-01-28T10:54:28.8966667+00:00

Hi @kothapally Snigdha, the script is working fine for us in another environment, but in the one environment where it is failing, it seems a proxy is blocking the wss private endpoint.

We tried to use speech_config.set_proxy() to resolve this issue, but it didn't work.

Pavankumar Purilla 2,930 Reputation points Microsoft Vendor

2025-01-29T02:47:46.4866667+00:00

Hi Ulhas Hulyal, Nilesh,
It seems that the issue you are facing might be due to a proxy blocking the wss private endpoint. You mentioned that you tried using speech_config.set_proxy() to resolve the issue, but it didn't work. In that case, you might want to check if the proxy settings are correct and if the proxy is allowing connections to the private endpoint. You can also try disabling the proxy temporarily to see if that resolves the issue.
Ulhas Hulyal, Nilesh 0 Reputation points

2025-01-29T08:10:45.47+00:00

@Pavankumar Purilla

Thank you for your reply and inputs.
speech_config.set_proxy() needed the proxy URL parameter without the http:// prefix. After that change it started working. We followed the document below for the same:

https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechconfig?view=azure-python#azure-cognitiveservices-speech-speechconfig-set-proxy
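A minimal sketch of that fix, assuming the proxy is configured somewhere as a full URL: the hypothetical helper `split_proxy` strips the scheme and returns the bare hostname and port, which are the first two arguments `set_proxy` takes. The proxy address and the default port of 8080 are placeholders, not values from this thread.

```python
from urllib.parse import urlsplit


def split_proxy(proxy: str, default_port: int = 8080):
    """Split 'http://host:port' or 'host:port' into (hostname, port).

    The scheme (http:// or https://) is dropped, because set_proxy expects
    a bare hostname. default_port is an assumption used only when the
    input carries no port.
    """
    # Prefix '//' so urlsplit treats a scheme-less value as a network location.
    candidate = proxy if "://" in proxy else "//" + proxy
    parts = urlsplit(candidate)
    if not parts.hostname:
        raise ValueError("No hostname found in proxy value: {!r}".format(proxy))
    return parts.hostname, parts.port or default_port


# Usage with the Speech SDK (placeholder proxy address); configure the proxy
# on speech_config before constructing the ConversationTranscriber:
# host, port = split_proxy("http://proxy.example.com:8080")
# speech_config.set_proxy(host, port)
```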
kothapally Snigdha 1,100 Reputation points Microsoft Vendor

2025-01-29T09:54:33.21+00:00

Hi Ulhas Hulyal, Nilesh,

Thank you for your comment. Following up to see if the above answer was helpful. If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". And if you have any further queries, do let us know.
