Setting Up 1:1 Call with Real-Time Transcription Using Call Automation SDK

Idris LOKHANDWALA 1 Reputation point Microsoft Employee
2025-01-27T12:32:33.0466667+00:00

Hi,

I am currently working on a project where I need to set up a 1:1 call between two individuals using the Azure Communication Services Call Automation SDK. My goal is to enable real-time transcription during the call. I have followed the basic setup instructions, but I am encountering some issues and would appreciate some guidance.

Here are the steps I have taken so far:

  1. Created an Azure Communication Services resource.
  2. Set up a WebSocket server to stream the transcription in real-time.
  3. Established the call using the Call Automation SDK.
  4. Configured the transcription options, including the language and WebSocket connection.

However, I am facing challenges with:

  • Handling both IncomingCall and OutgoingCall events to capture details of both participants.
  • The call receiver in my scenario is a human agent rather than an instance of an AI agent such as gpt-4o-realtime or Whisper.
  • Obtaining the individual participant IDs of both callers from the TranscriptionData object.

Could you please provide a detailed example or point me to a quickstart guide that covers these aspects? Any code snippets or additional documentation would be greatly appreciated.


Accepted answer
  Shree Hima Bindu Maganti 2,420 Reputation points Microsoft Vendor
    2025-01-27T15:22:49.1533333+00:00

    Hi @Idris LOKHANDWALA
    Thanks for the question and for using the MS Q&A platform.
    To set up a 1:1 call with real-time transcription using the Azure Communication Services Call Automation SDK, you need to handle both incoming and outgoing call events and configure your transcription options correctly.

    • Subscribe to IncomingCall and OutgoingCall events to get details about both participants, and implement event handlers that extract the participant information (see the incoming-call sketch after this list).
    • Set the locale and the WebSocket connection when configuring the transcription options. Set the startTranscription flag to true if you want transcription to start as soon as the call is answered.
    • Depending on the SDK version, the TranscriptionData object may not surface participant IDs directly. Keep a mapping of participant identifiers from the moment the call is established and correlate it with the transcription data you receive.
    • Here’s a basic example of setting up the call and starting transcription:
        // Create call options. Passing startTranscription: true in the
        // TranscriptionOptions constructor starts transcription automatically
        // once the call connects.
        var createCallOptions = new CreateCallOptions(callInvite, callbackUri)
        {
            CallIntelligenceOptions = new CallIntelligenceOptions()
            {
                CognitiveServicesEndpoint = new Uri(cognitiveServiceEndpoint)
            },
            TranscriptionOptions = new TranscriptionOptions(new Uri(webSocketUri), "en-US", true, TranscriptionTransport.Websocket)
        };

        // Create the call
        CreateCallResult createCallResult = await callAutomationClient.CreateCallAsync(createCallOptions);

        // If startTranscription was false above, start it explicitly instead.
        // This only succeeds once the call is connected, so in practice it
        // belongs in your CallConnected event handler (see the sketch below).
        CallMedia callMedia = createCallResult.CallConnection.GetCallMedia();
        StartTranscriptionOptions options = new StartTranscriptionOptions()
        {
            OperationContext = "startTranscriptionContext",
        };
        await callMedia.StartTranscriptionAsync(options);
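
      Because StartTranscriptionAsync only succeeds once the call is connected, it is typically invoked from the callback endpoint you registered as callbackUri. A minimal sketch, assuming an ASP.NET Core minimal API (the /api/callbacks route is a placeholder and callAutomationClient is assumed to be in scope):

        // Requires: Azure.Messaging (CloudEvent) and Azure.Communication.CallAutomation.
        app.MapPost("/api/callbacks", async (CloudEvent[] cloudEvents) =>
        {
            foreach (var cloudEvent in cloudEvents)
            {
                CallAutomationEventBase callEvent = CallAutomationEventParser.Parse(cloudEvent);
                if (callEvent is CallConnected callConnected)
                {
                    // The call is connected; it is now safe to start transcription.
                    CallMedia callMedia = callAutomationClient
                        .GetCallConnection(callConnected.CallConnectionId)
                        .GetCallMedia();
                    await callMedia.StartTranscriptionAsync(new StartTranscriptionOptions()
                    {
                        OperationContext = "startTranscriptionContext",
                    });
                }
            }
            return Results.Ok();
        });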
      
    • Make sure your WebSocket server is set up to handle the incoming transcription data. You’ll receive metadata packets and transcription data packets to process; a minimal receiving sketch follows the documentation links below.
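
      The IncomingCall event arrives through an Event Grid subscription rather than through the call-automation callback. A rough sketch of answering it with transcription enabled while recording both participants, assuming an ASP.NET Core minimal API with the Azure.Messaging.EventGrid package (the /api/incomingCall route and the participantsByRawId dictionary are placeholders for your own plumbing):

        // Requires: Azure.Messaging.EventGrid and Azure.Messaging.EventGrid.SystemEvents.
        app.MapPost("/api/incomingCall", async (EventGridEvent[] eventGridEvents) =>
        {
            foreach (var eventGridEvent in eventGridEvents)
            {
                if (!eventGridEvent.TryGetSystemEventData(out object eventData))
                    continue;

                // Event Grid sends a one-time validation handshake when the
                // webhook subscription is created.
                if (eventData is SubscriptionValidationEventData validationData)
                {
                    return Results.Ok(new { validationResponse = validationData.ValidationCode });
                }

                if (eventData is AcsIncomingCallEventData incomingCall)
                {
                    // Record both participants' raw IDs up front so transcription
                    // segments can be attributed to a speaker later.
                    participantsByRawId[incomingCall.FromCommunicationIdentifier.RawId] = "caller";
                    participantsByRawId[incomingCall.ToCommunicationIdentifier.RawId] = "human agent";

                    var answerOptions = new AnswerCallOptions(incomingCall.IncomingCallContext, callbackUri)
                    {
                        CallIntelligenceOptions = new CallIntelligenceOptions()
                        {
                            CognitiveServicesEndpoint = new Uri(cognitiveServiceEndpoint)
                        },
                        TranscriptionOptions = new TranscriptionOptions(
                            new Uri(webSocketUri), "en-US", true, TranscriptionTransport.Websocket)
                    };
                    await callAutomationClient.AnswerCallAsync(answerOptions);
                }
            }
            return Results.Ok();
        });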

      For more detailed guidance, check out the Azure documentation on real-time transcription setup:

    Add real-time transcription into your application (C#)

    Add real-time transcription into your application (JavaScript)

    Add real-time transcription into your application (Java)

    Add real-time transcription into your application (Python)
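
    On the receiving side, every WebSocket message is a JSON packet whose kind field is either TranscriptionMetadata or TranscriptionData, and the documented TranscriptionData payload carries a participantRawID field you can match against the mapping captured when the call was answered. A minimal sketch, assuming an ASP.NET Core WebSocket endpoint and, for brevity, that each packet arrives in a single frame:

        // Requires: System.Net.WebSockets, System.Text, and System.Text.Json.
        app.UseWebSockets();
        app.Map("/ws", async (HttpContext context) =>
        {
            if (!context.WebSockets.IsWebSocketRequest)
            {
                context.Response.StatusCode = StatusCodes.Status400BadRequest;
                return;
            }
            using var webSocket = await context.WebSockets.AcceptWebSocketAsync();
            var buffer = new byte[8192];
            while (webSocket.State == WebSocketState.Open)
            {
                var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
                if (result.MessageType == WebSocketMessageType.Close) break;

                var json = Encoding.UTF8.GetString(buffer, 0, result.Count);
                using var packet = JsonDocument.Parse(json);
                if (packet.RootElement.GetProperty("kind").GetString() == "TranscriptionData")
                {
                    var data = packet.RootElement.GetProperty("transcriptionData");
                    // participantRawID identifies which participant spoke this segment.
                    var speaker = data.GetProperty("participantRawID").GetString();
                    var text = data.GetProperty("text").GetString();
                    Console.WriteLine($"{speaker}: {text}");
                }
            }
        });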
    If the answer is helpful, please click Accept Answer and kindly upvote it so that other people who face a similar issue may benefit from it.

    1 person found this answer helpful.
