Does Real Time Azure Speech To Text support providing display form word level timestamps?

Szulakiewicz, Michal 0 Reputation points
2025-01-08T13:24:08.9533333+00:00

I have observed that the Batch STT mechanism in Azure Speech Studio lets me retrieve either "Display form word level timestamps" or "Lexical form word level timestamps". Having that choice is very useful, since the right form depends on my use case.

I am also using Real Time STT to transcribe audio, and I am able to request word level timestamps when retrieving the response as JSON. However, the "words" array in that JSON seems to provide only the lexical form word level timestamps, and I cannot find a way to make it contain the display form word level timestamps.

Is there any property that can be set to achieve my desired behavior?

Azure AI Speech

2 answers

  1. Pavankumar Purilla 2,290 Reputation points Microsoft Vendor
    2025-01-09T01:03:16.0766667+00:00

    Hi Szulakiewicz, Michal,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    Real-Time Azure Speech-to-Text currently does not provide display form word-level timestamps directly in the JSON response. While the Batch Speech-to-Text API allows you to choose between "Display form" and "Lexical form" word-level timestamps, the Real-Time API only provides timestamps for lexical form words in the Words array.

    The suggested workaround is to manually align the lexical word timestamps with the normalized text (DisplayText) by applying inverse text normalization (ITN), capitalization, and punctuation detection to the lexical words. This process can be error-prone and time-consuming.
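
As a rough illustration of that alignment, here is a minimal Python sketch. It assumes the detailed JSON result exposes the lexical words as objects with `Word`, `Offset`, and `Duration` fields (offsets in 100-nanosecond ticks) and that the display string is available next to them. `align_display_words` is a hypothetical helper, not part of the Speech SDK, and it matches tokens with `difflib.SequenceMatcher` instead of running ITN itself, so the timings it produces are approximate.

```python
import difflib
import string


def _norm(token):
    """Lowercase and strip surrounding punctuation so 'Hello,' matches 'hello'."""
    return token.lower().strip(string.punctuation)


def align_display_words(lexical_words, display_text):
    """Attach lexical timings to whitespace-separated display tokens.

    lexical_words: list of dicts with 'Word', 'Offset', 'Duration'
                   (assumed shape of the real-time "Words" array).
    display_text:  the display-form string for the same utterance.
    """
    display_tokens = display_text.split()
    lex_keys = [_norm(w["Word"]) for w in lexical_words]
    disp_keys = [_norm(t) for t in display_tokens]
    matcher = difflib.SequenceMatcher(a=lex_keys, b=disp_keys, autojunk=False)

    aligned = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same word in both forms: copy the timing one-to-one.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                w = lexical_words[i]
                aligned.append({"Word": display_tokens[j],
                                "Offset": w["Offset"],
                                "Duration": w["Duration"]})
        elif tag == "replace":
            # ITN rewrote a run of words (e.g. "twenty five dollars" -> "$25."):
            # every display token in the run gets the span of the whole run.
            start = lexical_words[i1]["Offset"]
            last = lexical_words[i2 - 1]
            end = last["Offset"] + last["Duration"]
            for j in range(j1, j2):
                aligned.append({"Word": display_tokens[j],
                                "Offset": start,
                                "Duration": end - start})
        elif tag == "insert":
            # Display token with no lexical counterpart: timing unknown.
            for j in range(j1, j2):
                aligned.append({"Word": display_tokens[j],
                                "Offset": None,
                                "Duration": None})
        # "delete": lexical words with no display counterpart are dropped.
    return aligned
```

Cases where ITN merges or splits words are exactly where this kind of matching becomes unreliable, which is why the manual approach is error-prone.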

    Since this functionality is not directly supported, it is recommended to submit a feature request to Microsoft Azure to add support for display form word-level timestamps in the Real-Time Speech-to-Text API, similar to the Batch API. Here's the link to the Azure Feedback Forum: Post idea · Community (azure.com). This feature would eliminate the need for manual alignment and improve the usability of the API for scenarios like yours.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".


  2. Szulakiewicz, Michal 0 Reputation points
    2025-01-09T12:03:11.7433333+00:00

    Hi @Pavankumar Purilla!

    Thanks for the reply. I have submitted a request in the Azure Feedback Forum. Let's see if this is resolved in the future.

