Does Real-Time Azure Speech to Text support display form word-level timestamps?

Question

I have observed that the Batch STT mechanism in Azure Speech Studio lets me retrieve either "Display form word level timestamps" or "Lexical form word level timestamps". Being able to choose is useful, since the right form depends on my use case.

I am also using Real-Time STT to transcribe audio, and I am able to request word-level timestamps when retrieving the response as JSON. However, the "words" array in that JSON seems to contain only the lexical form word-level timestamps, and I cannot find a way to make it contain the display form word-level timestamps.

Is there any property that can be set to achieve my desired behavior?

Answer

Hi Szulakiewicz, Michal,
Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

Real-Time Azure Speech-to-Text currently does not provide display form word-level timestamps directly in the JSON response. While the Batch Speech-to-Text API allows you to choose between "Display form" and "Lexical form" word-level timestamps, the Real-Time API only provides timestamps for lexical form words in the Words array.

The suggested workaround is to manually align the lexical word timestamps with the normalized text (DisplayText) by applying inverse text normalization (ITN), capitalization, and punctuation detection to the lexical words. This process can be error-prone and time-consuming.
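As a rough illustration of that alignment, here is a minimal Python sketch. It is not an official Microsoft method: it assumes the detailed JSON gives you a display string and a list of lexical word entries with "Word", "Offset" and "Duration" fields (offsets in 100-nanosecond ticks), and it uses the standard library's difflib to line the two up. The function names and the sample data are hypothetical. Where a display token does not match a lexical word one-to-one (for example "$25" for "twenty five dollars"), the sketch gives the display token the combined time span of the lexical words it replaced.

```python
import difflib
import string


def normalize(token):
    """Lowercase a display token and strip surrounding punctuation."""
    return token.strip(string.punctuation).lower()


def align_display_to_lexical(display_text, lexical_words):
    """Heuristically attach lexical timestamps to display tokens.

    lexical_words: list of dicts with "Word", "Offset", "Duration".
    Returns a list of dicts with the same keys, one per display token
    (or per group of display tokens when ITN rewrote several words).
    """
    display_tokens = display_text.split()
    a = [normalize(t) for t in display_tokens]
    b = [w["Word"].lower() for w in lexical_words]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)

    aligned = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # One-to-one match: copy the lexical timestamp as-is.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                w = lexical_words[j]
                aligned.append({"Word": display_tokens[i],
                                "Offset": w["Offset"],
                                "Duration": w["Duration"]})
        elif tag == "replace":
            # Rewritten span: the display tokens share the time span
            # covered by the lexical words they stand for.
            start = lexical_words[j1]["Offset"]
            last = lexical_words[j2 - 1]
            end = last["Offset"] + last["Duration"]
            aligned.append({"Word": " ".join(display_tokens[i1:i2]),
                            "Offset": start,
                            "Duration": end - start})
        elif tag == "delete":
            # Display tokens with no lexical counterpart: no timestamp.
            for i in range(i1, i2):
                aligned.append({"Word": display_tokens[i],
                                "Offset": None,
                                "Duration": None})
        # tag == "insert": lexical words absent from the display text
        # are skipped in this sketch.
    return aligned


# Hypothetical sample data, shaped like the "Words" array.
sample_words = [
    {"Word": "i", "Offset": 1000000, "Duration": 2000000},
    {"Word": "paid", "Offset": 3000000, "Duration": 3000000},
    {"Word": "twenty", "Offset": 6000000, "Duration": 3000000},
    {"Word": "five", "Offset": 9000000, "Duration": 2000000},
    {"Word": "dollars", "Offset": 11000000, "Duration": 4000000},
    {"Word": "today", "Offset": 15000000, "Duration": 4000000},
]
aligned = align_display_to_lexical("I paid $25 today.", sample_words)
for entry in aligned:
    print(entry)
```

This only handles simple cases. Repeated words, long ITN rewrites, or profanity masking can make the match ambiguous, so the result should be checked against real transcripts before relying on it.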

Since this functionality is not directly supported, I recommend submitting a feature request asking Microsoft Azure to add display form word-level timestamps to the Real-Time Speech-to-Text API, similar to the Batch API. Here's the link to the Azure Feedback Forum: Post idea · Community (azure.com). Such a feature would eliminate the need for manual alignment and make the API easier to use in scenarios like yours.

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

Answer

Hi @Pavankumar Purilla !

Thanks for the reply. I have submitted a request in the Azure Feedback Forum. Let's see whether this gets added in the future.
