Azure Bot that joins Microsoft Teams Call and transfers speech to text using Azure Speech Service

Gjorgji Mitrevski 0 Reputation points
2025-02-11T15:01:00.4866667+00:00

I configured an Azure Bot to join Microsoft Teams calls and provided a calling endpoint. I implemented the calling endpoint in .NET, and the speech service starts converting speech to text when the call is answered. The issue is that this currently works only with my local microphone: the audio does not come from the Teams call, because I have not set any audio configuration for the speech service. How can I make the audio go from the Teams call to my server, instead of from my microphone to the server? I went through the documentation, but it didn't help much.

Azure AI Bot Service
An Azure service that provides an integrated environment for bot development.
Azure AI Speech
An Azure service that integrates speech processing into apps and services.
.NET
Microsoft Technologies based on the .NET software framework.
Microsoft Teams
A Microsoft customizable chat-based workspace.

2 answers

  2. Chakaravarthi Rangarajan Bhargavi 1,030 Reputation points MVP
    2025-02-20T02:44:26.8033333+00:00

    Hi Gjorgji Mitrevski,

    Welcome to Microsoft Q&A Forum! Thanks for your question.

    It looks like your Azure Bot is successfully joining Microsoft Teams calls, but the audio isn't being routed correctly to Azure Speech Service for transcription. Currently, your setup is capturing audio from your microphone instead of the Teams call itself. To resolve this, you need to configure the bot to extract real-time media from Teams and process it through Azure Speech Service.

    Steps to Achieve This

    Step 1: Use Microsoft Graph API for Teams Call Integration

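    Example: Using Microsoft Graph API to Create an Online Meeting

    // Creates the meeting only; joining it as a bot is a separate
    // POST /communications/calls request. Types are from Microsoft.Graph (v4).
    var meeting = new OnlineMeeting
    {
        Subject = "Bot Meeting",
        Participants = new MeetingParticipants
        {
            Attendees = new List<MeetingParticipantInfo>
            {
                new MeetingParticipantInfo
                {
                    Identity = new IdentitySet { User = new Identity { Id = "<USER_ID>" } }
                }
            }
        }
    };

    var response = await graphClient.Me.OnlineMeetings.Request()
                    .AddAsync(meeting);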
    

    Step 2: Enable Real-Time Media Streaming

    Example: Handling Audio Streams in a Bot

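    // Handler for IAudioSocket.AudioMediaReceived (Microsoft.Skype.Bots.Media).
    private void OnAudioMediaReceived(object sender, AudioMediaReceivedEventArgs e)
    {
        // The frame lives in unmanaged memory, so copy it before forwarding.
        var audioBuffer = new byte[e.Buffer.Length];
        Marshal.Copy(e.Buffer.Data, audioBuffer, 0, audioBuffer.Length);
        e.Buffer.Dispose(); // hand the frame back to the media platform

        // _pushStream: a Speech SDK PushAudioInputStream (assumed field).
        _pushStream.Write(audioBuffer);
    }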
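
    The audio event fires very often during a call, so it can help to keep the handler short and forward frames from a separate consumer. Below is a minimal sketch of that idea using a bounded channel. AudioFramePump is a hypothetical helper of my own and not part of any SDK; the capacity and the drop-oldest policy are assumptions you may want to change.

    ```csharp
    using System;
    using System.Threading.Channels;
    using System.Threading.Tasks;

    // Hypothetical helper (not part of any SDK): buffers call audio frames so the
    // media callback returns immediately and a single consumer forwards them.
    public sealed class AudioFramePump
    {
        private readonly Channel<byte[]> _frames;

        public AudioFramePump(int capacity = 500) // about 10 s if frames are 20 ms each
        {
            _frames = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
            {
                FullMode = BoundedChannelFullMode.DropOldest, // prefer fresh audio
                SingleReader = true
            });
        }

        // Call from the audio-received handler; never blocks.
        public bool Enqueue(byte[] frame) => _frames.Writer.TryWrite(frame);

        // Call when the call ends so RunAsync can finish.
        public void Complete() => _frames.Writer.TryComplete();

        // Forwards frames in arrival order to the sink, e.g. a speech input stream.
        public async Task RunAsync(Action<byte[]> sink)
        {
            await foreach (var frame in _frames.Reader.ReadAllAsync())
            {
                sink(frame);
            }
        }
    }
    ```

    One possible usage: start RunAsync once when the call is established, passing a delegate that writes each frame to the Speech SDK input stream, call Enqueue from the audio handler, and call Complete when the call ends.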
    

    Step 3: Stream Audio to Azure Speech Service

    • Route the extracted media stream to Azure Speech Service for transcription.
    • Reference: Azure Speech-to-Text

    Example: Sending Audio Stream to Speech-to-Text API

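    // A push stream accepts live audio; a WAV file or the default microphone does not.
    // Assumed call audio format: 16 kHz, 16-bit, mono PCM.
    var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
    using var pushStream = AudioInputStream.CreatePushStream(format);
    using var audioConfig = AudioConfig.FromStreamInput(pushStream);
    using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
    
    recognizer.Recognizing += (s, e) =>
    {
        Console.WriteLine($"Recognizing: {e.Result.Text}");
    };
    
    await recognizer.StartContinuousRecognitionAsync();
    // Call pushStream.Write(bytes) for every audio frame received from the call.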
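
    For a longer-running call, it may help to wrap the recognizer so that final results, cancellations and shutdown are handled in one place. The sketch below assumes the Microsoft.CognitiveServices.Speech NuGet package and that the call audio is 16 kHz, 16-bit, mono PCM. CallTranscriber and its method names are hypothetical; only the Speech SDK types are real.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;

    // Hypothetical wrapper class; only the Speech SDK types are real.
    public sealed class CallTranscriber : IDisposable
    {
        private readonly PushAudioInputStream _pushStream;
        private readonly AudioConfig _audioConfig;
        private readonly SpeechRecognizer _recognizer;

        public CallTranscriber(string speechKey, string speechRegion)
        {
            var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);

            // Assumed call audio format: 16 kHz, 16-bit, mono PCM.
            _pushStream = AudioInputStream.CreatePushStream(
                AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1));
            _audioConfig = AudioConfig.FromStreamInput(_pushStream);
            _recognizer = new SpeechRecognizer(speechConfig, _audioConfig);

            // Final result for each recognized utterance.
            _recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                    Console.WriteLine($"Recognized: {e.Result.Text}");
            };

            // Surfaces errors such as a wrong key or region.
            _recognizer.Canceled += (s, e) =>
                Console.WriteLine($"Canceled: {e.Reason} {e.ErrorDetails}");
        }

        public Task StartAsync() => _recognizer.StartContinuousRecognitionAsync();

        // Feed one frame of call audio (raw PCM bytes) to the recognizer.
        public void PushAudio(byte[] pcmFrame) => _pushStream.Write(pcmFrame);

        public async Task StopAsync()
        {
            _pushStream.Close(); // signals end of audio
            await _recognizer.StopContinuousRecognitionAsync();
        }

        public void Dispose()
        {
            _recognizer.Dispose();
            _audioConfig.Dispose();
            _pushStream.Dispose();
        }
    }
    ```

    A possible flow: create one CallTranscriber per call, await StartAsync() when the call is established, call PushAudio from the audio handler, and await StopAsync() when the call ends.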
    

    Step 4: Use Direct Line Speech for Enhanced Integration

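    • If you also need a voice channel outside Teams, consider Direct Line Speech, which connects a client application to the bot through Azure Speech Service. It is optional and is not needed to transcribe Teams call audio.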
    • Reference: Direct Line Speech Integration

    Example: Enabling Direct Line Speech for a Bot

    {
        "type": "directlinespeech",
        "serviceEndpoint": "https://directline.botframework.com/v3/directline"
    }
    

    As next steps, to get your bot fully functional: deploy it using Azure Bot Service (see "Deploy a Teams Bot"), grant the Graph API application permissions the bot needs for calls and meetings, such as Calls.JoinGroupCall.All and Calls.AccessMedia.All (see "Graph API Permissions"), and tune audio processing for real-time transcription with the Azure Speech streaming API (see "Azure Speech Streaming API").

    Please try out these steps and check if they provide a solution. Hope this answer helps! Please comment below if you need any assistance. Happy to help!

    Regards,

    Chakravarthi Rangarajan Bhargavi

    - Please kindly accept the answer and vote 'Yes' if you find it helpful to support the community. Thanks a lot!
