How can I use speech recognition to capture a caller's speech input during a telephone call in Microsoft Azure?

Admin Saad
2024-11-23T22:50:33.08+00:00

Things We Have Done:

  1. Created an Azure Communication Services (ACS) resource and acquired an active phone number.
  2. Set up an Event Grid event subscription that delivers incoming-call events for the purchased phone number to our callback (webhook) endpoint.
  3. Deployed an Azure AI Speech resource to obtain an endpoint and API key for text-to-speech and speech-to-text.
  4. Developed Python and C# code that integrates these services to answer and interact with calls to the phone number.
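For reference, the webhook behind step 2 has to handle two kinds of Event Grid events: the one-time subscription validation handshake, and the `IncomingCall` event whose `incomingCallContext` the app then uses to answer the call. Below is a minimal, framework-free Python sketch of that logic, assuming the Event Grid schema (a JSON array of events with `eventType` and `data` fields). The function name and return shape are hypothetical, and the HTTP framework and the actual answer-call request are left out.

```python
import json

# Event type names as we understand them for the Event Grid schema.
VALIDATION_EVENT = "Microsoft.EventGrid.SubscriptionValidationEvent"
INCOMING_CALL_EVENT = "Microsoft.Communication.IncomingCall"


def handle_event_grid_body(body: str) -> dict:
    """Process one Event Grid webhook POST body (a JSON array of events).

    Returns either {"validationResponse": code}, which the endpoint should send
    back to complete the subscription handshake, or {"incomingCallContexts": [...]}
    with the contexts the app would pass on to answer each call.
    """
    contexts = []
    for event in json.loads(body):
        event_type = event.get("eventType")
        data = event.get("data", {})
        if event_type == VALIDATION_EVENT:
            # Echo the code back so Event Grid activates the subscription.
            return {"validationResponse": data["validationCode"]}
        if event_type == INCOMING_CALL_EVENT:
            contexts.append(data["incomingCallContext"])
    return {"incomingCallContexts": contexts}


if __name__ == "__main__":
    sample = json.dumps([{"eventType": INCOMING_CALL_EVENT,
                          "data": {"incomingCallContext": "<context>"}}])
    print(handle_event_grid_body(sample))  # {'incomingCallContexts': ['<context>']}
```

Returning early on the validation event keeps the handshake path trivial; every other batch is scanned for incoming calls only.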

Achievements with Python Code:

  • Successfully connected calls.
  • Enabled basic text-to-speech for trial questions.

Challenges with Python Code:

  • The text-to-speech audio files generated for the questions do not play on the call; we are still debugging why.

Help Needed for Python Code:

  • Understanding whether the media playback issue is caused by a limitation of the Azure Communication Services SDK for Python and, if so, what workarounds exist.
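One thing we still need to rule out for the Python playback issue is the audio file itself. Our reading of the ACS documentation, which still needs to be verified, is that file-based prompts must be reachable over a public HTTPS URL and encoded as WAV, mono, 16-bit PCM at 16 kHz. A text-to-speech file saved with other settings (for example 24 kHz) might therefore not play. Below is a minimal stdlib sketch that checks a generated file against those assumed settings. `check_prompt_wav` and `make_wav` are hypothetical helpers, not part of any Azure SDK.

```python
import io
import wave

# Assumed ACS prompt format (verify against current ACS docs): mono, 16-bit PCM, 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_FRAME_RATE = 16000


def check_prompt_wav(fileobj) -> list:
    """Return a list of problems found in a WAV prompt; an empty list means it looks OK."""
    problems = []
    try:
        # The wave module only reads uncompressed PCM, so compressed or
        # non-WAV files surface here as wave.Error (or EOFError if truncated).
        with wave.open(fileobj, "rb") as wav:
            if wav.getnchannels() != EXPECTED_CHANNELS:
                problems.append(f"channels={wav.getnchannels()}, expected mono")
            if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
                problems.append(f"sample width={wav.getsampwidth() * 8}-bit, expected 16-bit")
            if wav.getframerate() != EXPECTED_FRAME_RATE:
                problems.append(f"sample rate={wav.getframerate()} Hz, expected 16000 Hz")
    except (wave.Error, EOFError) as exc:
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems


def make_wav(frame_rate: int, channels: int = 1) -> io.BytesIO:
    """Build 0.1 s of 16-bit silence in memory (helper for trying the checker)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(frame_rate)
        wav.writeframes(b"\x00\x00" * (channels * frame_rate // 10))
    buf.seek(0)
    return buf
```

In practice the TTS output file would be opened with `open(path, "rb")` and passed to `check_prompt_wav` before its URL is handed to the play request.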

Achievements with C# Code:

  • Successfully connected calls.
  • Enabled the bot to ask questions during the call (questions/prompts are read from an Excel file and played on the call).

Challenges with C# Code:

  • Although the user's responses appear to be monitored, we cannot capture what the user says during the call (user speech input).
  • Although we set InitialSilenceTimeout to 10 seconds, if I say nothing after the bot finishes reading a prompt, it moves on to the next question within 1-2 seconds instead of re-prompting the current one. Even if I speak within those 1-2 seconds, the bot does not seem to receive my speech input and moves on regardless.
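One possible cause of the 1-2 second skip that we want to rule out is the handler advancing on the wrong callback. For example, it might advance on `PlayCompleted`, or on any recognize event including `RecognizeFailed`, instead of only on a successful `RecognizeCompleted`. Below is a minimal Python sketch of the intended flow: advance only when speech was recognized, and re-prompt the same question after a failure such as a silence timeout, up to a limit. The class, fields, and action names are hypothetical. Only the event type strings come from the ACS Call Automation callbacks as we understand them.

```python
from dataclasses import dataclass, field

# ACS Call Automation callback event types (as we understand them).
RECOGNIZE_COMPLETED = "Microsoft.Communication.RecognizeCompleted"
RECOGNIZE_FAILED = "Microsoft.Communication.RecognizeFailed"
PLAY_COMPLETED = "Microsoft.Communication.PlayCompleted"


@dataclass
class Questionnaire:
    """Hypothetical call flow: advance on a real answer, re-prompt on failure."""
    questions: list
    max_reprompts: int = 2   # extra attempts per question before skipping it
    index: int = 0           # position of the current question
    attempts: int = 0        # failed recognitions for the current question
    answers: dict = field(default_factory=dict)

    def current(self):
        """Return the current question, or None when all have been asked."""
        return self.questions[self.index] if self.index < len(self.questions) else None

    def _advance(self) -> str:
        self.index += 1
        self.attempts = 0
        return "ask" if self.current() is not None else "hangup"

    def on_event(self, event_type: str, speech: str = "") -> str:
        """Map one callback to the next action: 'wait', 'reprompt', 'ask' or 'hangup'."""
        if event_type == RECOGNIZE_COMPLETED and speech.strip():
            # Only a non-empty recognized answer moves the flow forward.
            self.answers[self.current()] = speech.strip()
            return self._advance()
        if event_type in (RECOGNIZE_COMPLETED, RECOGNIZE_FAILED):
            # Empty result or failure (e.g. silence timeout): repeat the question.
            self.attempts += 1
            if self.attempts <= self.max_reprompts:
                return "reprompt"
            return self._advance()  # give up on this question after max_reprompts
        # PlayCompleted and anything else: recognition may still be running, keep waiting.
        return "wait"
```

With this shape, a prompt finishing (`PlayCompleted`) never moves the flow on by itself. A silent caller hears the same question again up to `max_reprompts` times before the bot skips it.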

Help Needed for C# Code:

  • Confirming whether Azure AI Speech is the correct service for speech-to-text in this scenario.
  • Guidance on capturing the caller's spoken responses as text in real time during a call.
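To make the question about capturing caller speech concrete, here is where we expect the recognized text to show up. As we understand Call Automation, recognition is started on the call itself (e.g. `StartRecognizingAsync` with `CallMediaRecognizeSpeechOptions` in C#), using the Azure AI services endpoint supplied through `CallIntelligenceOptions.CognitiveServicesEndpoint` when the call is answered. The recognized text then arrives in a `RecognizeCompleted` callback rather than through the Speech SDK in our app. Below is a minimal sketch of reading that text from a callback body (a JSON array of CloudEvents). The `recognitionType`, `speechResult.speech`, and `resultInformation` field names are assumptions that should be checked against a logged payload, and `extract_recognition_results` is hypothetical.

```python
import json

# ACS Call Automation callback event types (as we understand them).
RECOGNIZE_COMPLETED = "Microsoft.Communication.RecognizeCompleted"
RECOGNIZE_FAILED = "Microsoft.Communication.RecognizeFailed"


def extract_recognition_results(body: str) -> list:
    """Return (callConnectionId, outcome, detail) tuples from one callback POST body.

    outcome is "speech" with the recognized text as detail, or "failed" with the
    result subcode and message; all other event types are ignored.
    """
    results = []
    for event in json.loads(body):  # callbacks arrive as a JSON array of CloudEvents
        data = event.get("data", {})
        call_id = data.get("callConnectionId")
        recognition_type = str(data.get("recognitionType", "")).lower()
        if event.get("type") == RECOGNIZE_COMPLETED and recognition_type == "speech":
            text = data.get("speechResult", {}).get("speech", "")
            results.append((call_id, "speech", text))
        elif event.get("type") == RECOGNIZE_FAILED:
            info = data.get("resultInformation", {})
            results.append((call_id, "failed", f"{info.get('subCode')}: {info.get('message')}"))
    return results
```

Logging the raw body next to this output is the quickest way to confirm whether speech results arrive at all, or whether only failures (e.g. silence timeouts) do.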

Additional Context:

  • The Python approach is relatively new (about two days of effort); we switched to it after hitting roadblocks with the C# implementation despite 5-6 days of debugging.
  • The solution is for telephony: speech input must come from the caller's mobile phone over the call, not from a laptop or computer microphone.
  • For testing, we follow the setup instructions in the GitHub reference link, including setting up Azure Dev Tunnels and running the app. With these steps completed and the Azure services configured, calls to the ACS phone number go through successfully.
  • GitHub reference link for C#: https://github.com/Azure-Samples/communication-services-dotnet-quickstarts/tree/main/callautomation-openai-sample-csharp
  • Python Version: 3.12.6

C# approach code:

using Azure;
using Azure.AI.OpenAI;
using Azure.Communication;
using Azure.Communication.CallAutomation;
using Azure.Messaging;
using Azure.Messaging.EventGrid;
using Azure.Messaging.EventGrid.SystemEvents;
using Microsoft.AspNetCore.Mvc;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;
using Microsoft.CognitiveServices.Speech;

var builder = WebApplication.CreateBuilder(args);

int currentQuestionId = 1; // Start with the first question

var excelFilePath = builder.Configuration.GetValue
