Incorporating Speech to Text via a Skillset

Question

All,

I have a set of audio files containing speech that I would like to convert to text as they are extracted via Azure AI Search.

The built-in cognitive skills work for OCR and language detection, but I don't know how to convert speech to text.

Thanks,

Gopi

Accepted Answer

Hi Gopi,

Thanks for the question. To convert speech to text while extracting content via Azure AI Search, you'll need to integrate the Azure Speech service (Speech to Text) with your skillset. Azure AI Search does not include a built-in Speech-to-Text skill, so you either transcribe the audio with Azure Speech Services separately or create a Custom Skill in your Azure AI Search pipeline.

In outline: decide whether to use a Custom Skill in Azure AI Search or to process the audio before indexing; set up Azure Speech Services and test transcription; then integrate with Azure AI Search using Custom Skills, Logic Apps, or Azure Functions. The options below describe each approach.

Use an Azure AI Search Custom Skill with the Speech-to-Text API

Azure AI Search allows Custom Skills to process data before indexing. You can create an Azure Function that calls Azure Speech Services and returns transcribed text.
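
For orientation, a custom skill is registered in the skillset as a Web API skill that points at your function's URL. Below is a minimal sketch of such a definition, written as a Python dictionary so it can be serialized to JSON for the skillset. The function URL, the input name blobUrl, the source path /document/metadata_storage_path, and the output name transcript are assumptions for illustration, not values from your environment.

```python
import json

# Hypothetical endpoint; replace with the URL of your own Azure Function.
FUNCTION_URL = "https://<your-function-app>.azurewebsites.net/api/transcribe"

speech_skill = {
    "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
    "name": "speech-to-text",
    "description": "Calls an Azure Function that transcribes audio blobs",
    "uri": FUNCTION_URL,
    "httpMethod": "POST",
    "timeout": "PT3M",   # transcription can be slow, so raise the default timeout
    "batchSize": 1,      # one audio file per call keeps each request short
    "context": "/document",
    "inputs": [
        # Assumed input: the blob's storage path, passed to the function as "blobUrl".
        {"name": "blobUrl", "source": "/document/metadata_storage_path"}
    ],
    "outputs": [
        # The function's "text" output is exposed to the pipeline as "transcript".
        {"name": "text", "targetName": "transcript"}
    ],
}

print(json.dumps(speech_skill, indent=2))
```

Very long recordings may still exceed the skill timeout; in that case, transcribing before indexing is likely the safer route.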

Learn how to create a custom skill in Azure AI Search → Custom Skills in Azure AI Search
Learn how to integrate Azure AI Search with Blob Storage → Azure AI Search Indexing

Steps:

Store your audio files in Azure Blob Storage.
Configure an Azure AI Search Indexer to read these files.
Create an Azure Function to call the Speech-to-Text API and return the transcribed text.
Use the function as a Custom Skill in your Azure AI Search pipeline.
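
The function in the steps above has to follow the custom skill contract: the request carries a "values" array of records, and the response must return one entry per "recordId" with "data", "errors", and "warnings". Here is a minimal sketch of that request handling, kept independent of the Azure Functions and Speech SDKs so it runs anywhere. The input name blobUrl and the transcribe callable are assumptions; in a real function, transcribe would download the blob and call the Speech service.

```python
from typing import Any, Callable, Dict


def run_speech_skill(body: Dict[str, Any],
                     transcribe: Callable[[str], str]) -> Dict[str, Any]:
    """Apply `transcribe` to every record of a custom skill request."""
    results = []
    for record in body.get("values", []):
        out = {
            "recordId": record["recordId"],
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            # "blobUrl" is an assumed input name; match it to your skill definition.
            blob_url = record["data"]["blobUrl"]
            out["data"]["text"] = transcribe(blob_url)
        except Exception as exc:
            # Report the failure for this record only; other records still succeed.
            out["errors"].append({"message": str(exc)})
        results.append(out)
    return {"values": results}


def fake_transcribe(blob_url: str) -> str:
    """Stand-in transcriber so the sketch runs without Azure credentials."""
    return f"transcript of {blob_url}"


request = {
    "values": [
        {"recordId": "1", "data": {"blobUrl": "https://example.com/a.wav"}}
    ]
}
response = run_speech_skill(request, fake_transcribe)
print(response)
```

In an Azure Function, you would parse the HTTP request body as JSON, pass it to run_speech_skill together with a real transcriber, and return the result as the JSON response.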

Use Azure Speech Services Directly Before Indexing

If your audio files are not indexed yet, you can:

Use Azure Speech SDK or REST API to transcribe the audio files.

Store the transcribed text in Azure Blob Storage or Cosmos DB.

Use Azure AI Search to index the transcribed text.
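
If you push the transcripts straight into an index rather than storing them for an indexer to pick up, the documents are sent as a batch of actions. Here is a minimal sketch of building that batch, assuming an index with the fields id (the key), fileName, and content; these field names are hypothetical and must match your own index schema. The file name is Base64-encoded for the key because document keys only allow a restricted character set.

```python
import base64
import json
from typing import Dict


def make_key(blob_name: str) -> str:
    """Encode a file name into a URL-safe string usable as a document key."""
    return base64.urlsafe_b64encode(blob_name.encode("utf-8")).decode("ascii")


def build_index_batch(transcripts: Dict[str, str]) -> dict:
    """Build a documents batch from {file name: transcript} pairs."""
    return {
        "value": [
            {
                # mergeOrUpload updates an existing document or creates a new one.
                "@search.action": "mergeOrUpload",
                "id": make_key(name),
                "fileName": name,
                "content": text,
            }
            for name, text in transcripts.items()
        ]
    }


batch = build_index_batch({"call1.wav": "Hello, thanks for calling."})
print(json.dumps(batch, indent=2))
```

The resulting JSON is the body you would POST to the index's document indexing endpoint, or pass to the equivalent upload call in the Azure AI Search SDK.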

Learn how to use Azure Speech-to-Text API → Azure Speech Service
How to transcribe speech-to-text using Azure SDK → Speech-to-Text with Python

Sample Python Code Using Azure Speech SDK:

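import azure.cognitiveservices.speech as speechsdk

# Replace with your Speech resource key and region.
speech_key = "Your_Speech_API_Key"
service_region = "Your_Region"

# Configure the service connection and point the recognizer at a WAV file.
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_config = speechsdk.audio.AudioConfig(filename="your_audio_file.wav")

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# recognize_once() returns after the first utterance, so it suits short clips only.
result = speech_recognizer.recognize_once()

# Check the outcome before using result.text.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Transcription:", result.text)
else:
    print("No transcription:", result.reason)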
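
Note that recognize_once() in the sample above returns after a single utterance, so longer recordings come back truncated. For full files, the Speech SDK offers continuous recognition. Here is a minimal sketch, assuming the same key, region, and WAV file placeholders as above; the helper names transcribe_continuously and make_recognizer and the timeout value are illustrative, not part of the SDK.

```python
import threading


def transcribe_continuously(recognizer, timeout_s: float = 600.0) -> str:
    """Collect every recognized phrase until the session stops or is canceled."""
    phrases = []
    done = threading.Event()

    def on_recognized(evt):
        # Skip events that carry no text (for example, stretches of silence).
        if evt.result.text:
            phrases.append(evt.result.text)

    def on_stop(evt):
        # Fired at end of file (session stopped) or on error/cancellation.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    done.wait(timeout=timeout_s)  # assumed upper bound on processing time
    recognizer.stop_continuous_recognition()
    return " ".join(phrases)


def make_recognizer(speech_key: str, service_region: str, wav_path: str):
    """Build a file-based recognizer; the SDK is imported only when called."""
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    return speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
```

Usage would look like transcribe_continuously(make_recognizer(speech_key, service_region, "your_audio_file.wav")). For large volumes of long recordings, the batch transcription REST API may be a better fit than the SDK.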

Automate Speech-to-Text Conversion Using Logic Apps

You can automate the process using Azure Logic Apps:

Trigger: When a new audio file is uploaded to Azure Blob Storage.

Action: Use Azure Speech Services to transcribe the audio.

Output: Store the transcribed text in Azure Blob Storage, Table Storage, or Cosmos DB for Azure AI Search to index.

Learn how to automate workflows with Logic Apps → Azure Logic Apps Overview

Please try these steps and let me know whether they resolve the issue. I hope this answer helps! Comment below if you need any further assistance. Happy to help!

Regards,

Chakravarthi Rangarajan Bhargavi

Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.
