Incorporating Speech to Text via a Skillset

grajee 371 Reputation points
2025-02-17T23:16:34.5566667+00:00

All,

I have a set of audio files containing speech that I would like to convert to text as they are extracted by Azure AI Search.

The built-in cognitive skills work for OCR and language detection, but I don't know how to convert speech to text.

Thanks,

Gopi

Azure AI Search

Accepted answer
  1. Chakaravarthi Rangarajan Bhargavi 1,030 Reputation points MVP
    2025-02-18T05:42:45.3233333+00:00

    Hi Gopi,

    Thanks for the question. Azure AI Search does not include a built-in Speech-to-Text skill, so to convert speech to text during extraction you need to bring in Azure Speech Services (Speech to Text) yourself: either call it from a Custom Skill in your Azure AI Search skillset, or transcribe the audio separately before indexing.

    At a high level: decide whether to transcribe inside Azure AI Search with a Custom Skill or to process the audio before indexing, set up Azure Speech Services and test transcription, and then integrate with Azure AI Search using a Custom Skill, Logic Apps, or Azure Functions. The options are described below.

    Option 1: Use an Azure AI Search Custom Skill with the Speech-to-Text API

    Azure AI Search lets Custom Skills enrich each document during indexing. You can create an Azure Function that calls Azure Speech Services and returns the transcribed text to the skillset.
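A minimal sketch of the request handling inside such an Azure Function follows. It assumes the custom Web API skill contract (a `values` array of records, each with a `recordId` and `data`, answered with one entry per `recordId`) and delegates the actual recognition to a `transcribe` callable, so you can plug in the Speech SDK or a REST call. The function name `handle_custom_skill_request` and the field names `audioUrl` and `transcript` are hypothetical and must match whatever you declare in your skillset.

```python
import json
from typing import Callable


def handle_custom_skill_request(body: dict, transcribe: Callable[[str], str]) -> dict:
    """Answer an Azure AI Search custom Web API skill request (sketch).

    The request carries {"values": [{"recordId": ..., "data": {...}}]} and the
    response returns one entry per recordId. The input name "audioUrl" and the
    output name "transcript" are hypothetical and must match the skillset.
    """
    results = []
    for record in body.get("values", []):
        record_id = record.get("recordId")
        try:
            audio_url = record["data"]["audioUrl"]  # hypothetical input name
            text = transcribe(audio_url)            # e.g. a Speech SDK or REST call
            results.append({"recordId": record_id, "data": {"transcript": text},
                            "errors": [], "warnings": []})
        except Exception as exc:
            # Report the failure for this record only, so other records still succeed
            results.append({"recordId": record_id, "data": {},
                            "errors": [{"message": f"Transcription failed: {exc!r}"}],
                            "warnings": []})
    return {"values": results}


if __name__ == "__main__":
    request = {"values": [
        {"recordId": "1", "data": {"audioUrl": "https://example.invalid/audio/a.wav"}},
        {"recordId": "2", "data": {}},
    ]}
    fake_transcribe = lambda url: f"(transcript of {url})"  # stand-in for a real Speech call
    print(json.dumps(handle_custom_skill_request(request, fake_transcribe), indent=2))
```

In a Python Azure Functions HTTP trigger you would typically parse the body with `req.get_json()`, pass it to this handler, and return the result as JSON. Keeping the Speech call behind `transcribe` makes the contract handling testable without Azure resources.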

    Steps:

    1. Store your audio files in Azure Blob Storage.
    2. Configure an Azure AI Search Indexer to read these files.
    3. Create an Azure Function to call the Speech-to-Text API and return the transcribed text.
    4. Use the function as a Custom Skill in your Azure AI Search pipeline.
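For step 4, here is a sketch of the skill entry and the indexer output field mapping, written as Python dictionaries you could serialize into the skillset and indexer REST request bodies. The function URL, skill name, the input name `audioUrl`, the output name `transcript`, and the index field `transcript` are hypothetical. Using `/document/metadata_storage_path` as the input assumes the blob indexer and assumes your function can authenticate to the blob (for example with a managed identity or a SAS token); the timeout value is also an assumption to tune.

```python
import json


def build_speech_skill(function_url: str) -> dict:
    """Custom Web API skill entry for a skillset's "skills" array (sketch)."""
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "name": "speech-to-text",      # hypothetical skill name
        "description": "Transcribes audio blobs via an Azure Function",
        "uri": function_url,           # hypothetical Azure Function endpoint
        "httpMethod": "POST",
        "timeout": "PT230S",           # transcription can be slow; tune as needed
        "batchSize": 1,                # send one audio file per request
        "context": "/document",
        "inputs": [
            # Blob path from the blob indexer; the function must be able to read it
            {"name": "audioUrl", "source": "/document/metadata_storage_path"}
        ],
        "outputs": [
            # Written to the enrichment tree as /document/transcript
            {"name": "transcript", "targetName": "transcript"}
        ],
    }


# Indexer outputFieldMappings: copy the enriched value into a searchable index field
OUTPUT_FIELD_MAPPINGS = [
    {"sourceFieldName": "/document/transcript", "targetFieldName": "transcript"}
]

if __name__ == "__main__":
    skill = build_speech_skill("https://<your-function-app>.azurewebsites.net/api/transcribe")
    print(json.dumps({"skills": [skill]}, indent=2))
    print(json.dumps({"outputFieldMappings": OUTPUT_FIELD_MAPPINGS}, indent=2))
```

Because the blob indexer cannot extract text from audio on its own, you may also need indexer settings such as `"dataToExtract": "storageMetadata"` so that audio files are not rejected as unsupported content; verify this against the blob indexer documentation for your API version.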

    Option 2: Use Azure Speech Services Directly Before Indexing

    If your audio files are not indexed yet, you can:

    Use the Azure Speech SDK or REST API to transcribe the audio files.

    Store the transcribed text in Azure Blob Storage or Cosmos DB.

    Use Azure AI Search to index the transcribed text.
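The storage step above can be sketched with the `azure-storage-blob` package: each transcript is written as a `.txt` blob next to a naming convention that ties it back to its audio file, and a regular blob indexer over that container then indexes the text. The `transcripts/` prefix, the naming scheme, and the connection string and container are hypothetical choices for illustration.

```python
from pathlib import PurePosixPath


def transcript_blob_name(audio_blob_name: str, prefix: str = "transcripts/") -> str:
    """Derive a transcript blob name, e.g. 'calls/a.wav' -> 'transcripts/calls/a.txt'."""
    return prefix + str(PurePosixPath(audio_blob_name).with_suffix(".txt"))


def upload_transcript(conn_str: str, container: str, audio_blob_name: str, transcript: str) -> str:
    """Store one transcript as a .txt blob so a blob indexer can pick it up."""
    # Imported lazily so transcript_blob_name() works without the SDK installed
    from azure.storage.blob import BlobServiceClient, ContentSettings

    service = BlobServiceClient.from_connection_string(conn_str)
    name = transcript_blob_name(audio_blob_name)
    blob = service.get_blob_client(container=container, blob=name)
    blob.upload_blob(
        transcript.encode("utf-8"),
        overwrite=True,  # re-running the transcription replaces the old text
        content_settings=ContentSettings(content_type="text/plain; charset=utf-8"),
    )
    return name


if __name__ == "__main__":
    print(transcript_blob_name("calls/2025/meeting-01.wav"))
```

Mirroring the audio path in the transcript name keeps the link between a search hit and its source recording without needing a separate lookup table.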

    Sample Python Code Using Azure Speech SDK:

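    import threading
    import azure.cognitiveservices.speech as speechsdk
    
    speech_key = "Your_Speech_API_Key"
    service_region = "Your_Region"
    
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename="your_audio_file.wav")  # WAV (PCM) input
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    
    # recognize_once() returns only the first utterance; continuous recognition reads the whole file.
    segments = []
    done = threading.Event()
    speech_recognizer.recognized.connect(
        lambda evt: segments.append(evt.result.text) if evt.result.text else None)
    speech_recognizer.session_stopped.connect(lambda evt: done.set())  # end of file reached
    speech_recognizer.canceled.connect(lambda evt: done.set())  # error or cancellation
    
    speech_recognizer.start_continuous_recognition()
    done.wait()
    speech_recognizer.stop_continuous_recognition()
    
    print("Transcription:", " ".join(segments))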
    
    Option 3: Automate Speech-to-Text Conversion Using Logic Apps

    You can automate the process using Azure Logic Apps:

    Trigger: When a new audio file is uploaded to Azure Blob Storage.

    Action: Use Azure Speech Services to transcribe the audio.

    Output: Store the transcribed text in Azure Blob Storage, Azure Table Storage, or Azure Cosmos DB, where Azure AI Search can index it.
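One way to implement the "transcribe" action is an HTTP action that creates a batch transcription job. The sketch below only builds that request in Python so its shape is visible; it assumes the Speech to text REST API v3.1 `transcriptions` endpoint (newer API versions exist, so check the current one), and the region, key, display name, and SAS-signed audio URLs are placeholders. The job runs asynchronously: after creating it you poll its status and download the result files, which is not shown here.

```python
import json


def build_batch_transcription_request(region: str, speech_key: str, audio_urls: list[str],
                                      locale: str = "en-US",
                                      display_name: str = "audio-batch") -> tuple[str, dict, dict]:
    """Return (url, headers, body) for creating a batch transcription job (assumed v3.1 API)."""
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
    headers = {"Ocp-Apim-Subscription-Key": speech_key, "Content-Type": "application/json"}
    body = {
        "contentUrls": list(audio_urls),  # URLs the service can read, e.g. blob URLs with SAS tokens
        "locale": locale,                 # spoken language of the audio
        "displayName": display_name,      # label shown when listing jobs
    }
    return url, headers, body


if __name__ == "__main__":
    url, headers, body = build_batch_transcription_request(
        "westus", "<your-speech-key>", ["https://<account>.blob.core.windows.net/audio/a.wav?<sas>"])
    print("POST", url)
    print(json.dumps(body, indent=2))
```

In a Logic App, the same method, URL, headers, and JSON body would go into the HTTP action; from Python you could send them with `requests.post(url, headers=headers, json=body)`.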

    Please try these steps and let me know whether they solve the problem. If you need any further assistance, comment below. Happy to help!

    Regards,

    Chakravarthi Rangarajan Bhargavi

