How can an application that uses the Azure Communication Services Play APIs determine which words have been spoken by TTS?

Sameer 0 Reputation points
2025-02-27T15:48:58.98+00:00

Hi,

We're using Azure Communication Services (ACS) to receive calls from users. When a user calls in, we use the ACS Play API (https://learn.microsoft.com/en-ca/azure/communication-services/concepts/call-automation/play-action) for TTS.

We need to accurately track which words the TTS has spoken to the caller so that we can handle interruptions by the user.

The PlayStarted and PlayCompleted events don't provide sufficient granularity. We need to determine exactly which words were spoken by the TTS, and when. Is there an option to receive transcription data for the TTS in the same way as for a human caller?

Azure Communication Services
An Azure communication platform for deploying applications across devices and platforms.

1 answer

  1. Siva Nair 410 Reputation points Microsoft Vendor
    2025-02-27T16:51:16.1066667+00:00

    Hi Sameer,

    Welcome to Microsoft Q&A,

    You can approximate how far the TTS has progressed by inserting pauses with the SSML <break> element, which controls the pacing of Azure Communication Services TTS. Place breaks after the key phrases you want to time events against, for instance <break time="500ms"/> after every sentence, and correlate them with the PlayStarted and PlayCompleted events. Log the time of each event to roughly estimate when each section is read. This strikes a balance between simplicity and accuracy without requiring an extensive setup, and it can be plugged into your backend via the ACS Play API.
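
    As a rough illustration of the SSML part, here is a minimal Python sketch that joins sentences into one SSML document with a fixed <break> after each. The function name build_ssml, the default voice en-US-JennyNeural, and the 500 ms pause are assumptions chosen for the example, not values required by ACS. The resulting string would be passed to whatever SSML play source your Call Automation SDK provides.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, break_ms=500, voice="en-US-JennyNeural", lang="en-US"):
    """Join sentences into one SSML document with a fixed pause after each.

    build_ssml, the default voice and the default pause are illustrative
    assumptions, not part of any ACS API.
    """
    # Escape &, < and > so that plain text cannot break the XML.
    body = "".join(
        f'{escape(sentence)}<break time="{break_ms}ms"/>' for sentence in sentences
    )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">{body}</voice>'
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml(["Hello & welcome.", "How can I help you today?"]))
```

    Keeping the sentence list alongside the SSML lets your backend map a point in time back to the text that was being read.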

    In addition, enable real-time transcription of the user's speech on the same call using Azure's Call Automation features. The transcription data is delivered over a WebSocket connection. Correlate the TTS events with the incoming transcription events to handle user interruptions accurately.
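
    To show what that correlation could look like, here is a hedged Python sketch that estimates which words had been spoken when an interruption arrived. It assumes you recorded the time at which PlayStarted was received and the time of the first transcription event from the caller, and it assumes a constant speaking rate (words_per_second), which is only an approximation of real TTS timing. The function name estimate_spoken_words and all default values are hypothetical.

```python
def estimate_spoken_words(sentences, elapsed_s, words_per_second=2.5, break_ms=500):
    """Estimate the words spoken elapsed_s seconds after playback started.

    Assumes a constant speaking rate and a fixed pause after every sentence.
    Both are rough approximations; tune them against real recordings.
    """
    spoken = []
    clock = 0.0  # estimated playback position in seconds
    seconds_per_word = 1.0 / words_per_second
    for sentence in sentences:
        for word in sentence.split():
            clock += seconds_per_word
            if clock > elapsed_s:
                # This word had not finished when the interruption arrived.
                return spoken
            spoken.append(word)
        # Account for the <break> inserted after the sentence.
        clock += break_ms / 1000.0
    return spoken


if __name__ == "__main__":
    prompt = ["Your balance is ten dollars.", "Anything else?"]
    # elapsed_s = time of the caller's first transcription event
    #             minus the time PlayStarted was received.
    print(estimate_spoken_words(prompt, elapsed_s=1.6, words_per_second=2.0))
```

    Because event delivery and network latency add jitter, treat the result as an estimate at sentence or phrase level rather than an exact word boundary.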

    For reference:

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-structure

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup

    If you need any further assistance, do let me know.
