Azure Bot that joins Microsoft Teams Call and transfers speech to text using Azure Speech Service

Question

I configured an Azure Bot to join Microsoft Teams calls and provided a calling endpoint. I have a .NET implementation for that endpoint, and Azure Speech Service starts converting speech to text when the call is answered. The problem is that it currently transcribes my microphone: the audio from the Teams call never reaches the speech service, because I have no audio configuration for it. How can I make the audio go from the Teams call to my server, instead of from my microphone? I went through the documentation, but it didn't help much.

Answer

Hi Gjorgji Mitrevski,

Welcome to Microsoft Q&A Forum! Thanks for your question.

It looks like your Azure Bot is successfully joining Microsoft Teams calls, but the audio isn't being routed correctly to Azure Speech Service for transcription. Currently, your setup is capturing audio from your microphone instead of the Teams call itself. To resolve this, you need to configure the bot to extract real-time media from Teams and process it through Azure Speech Service.

Steps to Achieve This

Step 1: Use Microsoft Graph API for Teams Call Integration

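Join the call through the Microsoft Graph cloud communications API and request application-hosted media, so that the raw call audio is delivered to your bot rather than being handled by service-hosted media.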
Reference: Calls and Meetings Bots Overview

Example: Using Microsoft Graph API to Join a Call

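// Sketch (Graph SDK v4 style): ask Graph to have the bot join an existing meeting.
// Every ID, the callback URL, and the media blob below are placeholders.
var call = new Call
{
    CallbackUri = "https://<your-bot-host>/api/calling",
    TenantId = "<tenant-id>",
    RequestedModalities = new List<Modality> { Modality.Audio },
    // App-hosted media is what lets the bot receive raw audio;
    // the blob comes from the real-time media library's configuration.
    MediaConfig = new AppHostedMediaConfig { Blob = "<media-session-configuration>" },
    ChatInfo = new ChatInfo { ThreadId = "<meeting-thread-id>", MessageId = "0" },
    MeetingInfo = new OrganizerMeetingInfo
    {
        Organizer = new IdentitySet { User = new Identity { Id = "<organizer-object-id>" } }
    }
};

var response = await graphClient.Communications.Calls.Request()
                .AddAsync(call);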

Step 2: Enable Real-Time Media Streaming

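Configure the bot for application-hosted media with the Microsoft.Graph.Communications.Calls.Media library (built on Microsoft.Skype.Bots.Media), which delivers the call audio to your code frame by frame. A bot of this kind is written in .NET and must run on a Windows Server host.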
Reference: Real-Time Media Concepts

Example: Handling Audio Streams in a Bot

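// Handler for IAudioSocket.AudioMediaReceived (Microsoft.Skype.Bots.Media).
// Marshal comes from System.Runtime.InteropServices.
private async void OnAudioMediaReceived(object sender, AudioMediaReceivedEventArgs args)
{
    // The buffer wraps unmanaged memory, so copy it into a managed array first.
    var audioBuffer = new byte[args.Buffer.Length];
    Marshal.Copy(args.Buffer.Data, audioBuffer, 0, audioBuffer.Length);
    args.Buffer.Dispose(); // hand the buffer back to the media platform

    // _speechService is a placeholder wrapper around the recognizer's input.
    await _speechService.SendAudioAsync(audioBuffer);
}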

Step 3: Stream Audio to Azure Speech Service

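Route the extracted audio to Azure Speech Service through an audio input stream. When a recognizer is created without an explicit AudioConfig it listens to the default microphone, which is why the current setup transcribes your microphone instead of the call.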
Reference: Azure Speech-to-Text

Example: Sending Audio Stream to Speech-to-Text API

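// Push stream for call audio, assuming 16 kHz, 16-bit, mono PCM from Teams.
var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
var pushStream = AudioInputStream.CreatePushStream(format);

using var audioConfig = AudioConfig.FromStreamInput(pushStream);
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Partial hypotheses while someone is still speaking.
recognizer.Recognizing += (s, e) =>
{
    Console.WriteLine($"Recognizing: {e.Result.Text}");
};
// Final text for each completed utterance.
recognizer.Recognized += (s, e) => Console.WriteLine($"Recognized: {e.Result.Text}");

await recognizer.StartContinuousRecognitionAsync();
// For every buffer received from the call: pushStream.Write(audioBuffer);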
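
To connect Steps 2 and 3, the call audio and the recognizer can be owned by one small object. The following is only a minimal sketch, assuming the Speech SDK NuGet package (Microsoft.CognitiveServices.Speech) and an audio socket configured for 16 kHz, 16-bit, mono PCM. CallAudioTranscriber and its member names are hypothetical and not part of any SDK.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper: feeds raw call audio into a continuous recognizer.
public sealed class CallAudioTranscriber : IDisposable
{
    private readonly PushAudioInputStream _pushStream;
    private readonly AudioConfig _audioConfig;
    private readonly SpeechRecognizer _recognizer;

    public CallAudioTranscriber(SpeechConfig speechConfig, Action<string> onFinalText)
    {
        // Assumption: the call audio is 16 kHz, 16-bit, mono PCM.
        // The stream format must match what the bot actually receives.
        _pushStream = AudioInputStream.CreatePushStream(
            AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1));
        _audioConfig = AudioConfig.FromStreamInput(_pushStream);
        _recognizer = new SpeechRecognizer(speechConfig, _audioConfig);

        // Forward only final results that contain recognized speech.
        _recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                onFinalText(e.Result.Text);
            }
        };
    }

    public Task StartAsync() => _recognizer.StartContinuousRecognitionAsync();

    // Call this from the audio-received handler with each copied buffer.
    public void Write(byte[] pcm) => _pushStream.Write(pcm);

    public async Task StopAsync()
    {
        _pushStream.Close(); // signals the end of the audio
        await _recognizer.StopContinuousRecognitionAsync();
    }

    public void Dispose()
    {
        _recognizer.Dispose();
        _audioConfig.Dispose();
        _pushStream.Dispose();
    }
}
```

In use, the bot would create one instance when the call is established, call StartAsync, pass every buffer from the media handler to Write, and call StopAsync followed by Dispose when the call ends. Keeping the push stream and recognizer in one object means the microphone is never involved, because the recognizer only ever sees the bytes written to it.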

Step 4: Use Direct Line Speech for Enhanced Integration

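This step is optional. Direct Line Speech is a separate channel for client applications that talk to the bot directly by voice; it does not carry audio from a Teams call, so it is not needed to transcribe the call itself. Consider it only if you also want a voice client of your own to communicate with the bot through Azure Speech Service.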
Reference: Direct Line Speech Integration

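Example: Enabling Direct Line Speech for a Bot (illustrative settings only; the channel itself is enabled from the bot's Channels page in the Azure portal)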

{
    "type": "directlinespeech",
    "serviceEndpoint": "https://directline.botframework.com/v3/directline"
}

As next steps, deploy the bot with Azure Bot Service (Deploy a Teams Bot), grant and admin-consent the Graph application permissions the bot needs for calls and meetings, such as Calls.JoinGroupCall.All and Calls.AccessMedia.All (Graph API Permissions), and tune audio processing for real-time transcription with the Speech SDK's audio input stream support (Azure Speech Streaming API).

Please try out these steps and check if they provide a solution. Hope this answer helps! Please comment below if you need any assistance. Happy to help!

Regards,

Chakravarthi Rangarajan Bhargavi

- Please kindly accept the answer and vote 'Yes' if you find it helpful to support the community. Thanks a lot!
