How to get omissions and insertions from scripted assessments in streaming mode

Question

Hi, I'm trying to use Pronunciation Assessment to assess audio files that are between 60 and 120 seconds long.

This is in C# (.NET), using the Azure Speech Services SDK with the en-GB locale.

Because of the time limit on the non-streaming method (RecognizeOnceAsync()), I'm using streaming mode. However, streaming mode does not appear to detect omissions or insertions. In addition, despite being given configuration identical to the RecognizeOnceAsync() call, the streaming method returns insertions with an error type of None.

According to the documentation below, all that is needed for the error type to be calculated correctly is to set EnableMiscue to true in the pronunciation assessment config and to set the reference text when starting the assessment:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment?pivots=programming-language-csharp#scripted-assessment-results

This works as expected with RecognizeOnceAsync(), but whenever I use StartContinuousRecognitionAsync() (with identical config), the events I receive do not include any omissions or insertions.
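For reference, here is a minimal sketch of the continuous setup described above. It assumes the Microsoft.CognitiveServices.Speech NuGet package; the key, region, file name and reference text are placeholders, and the grading system and granularity are arbitrary choices, not the poster's actual settings. It applies the same scripted-assessment config (reference text set, miscue enabled) that works with RecognizeOnceAsync(), then collects the per-word ErrorType from every Recognized event so the two outputs can be compared.

```csharp
// Assumes the Microsoft.CognitiveServices.Speech NuGet package is referenced.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.PronunciationAssessment;

class ContinuousAssessmentSketch
{
    static async Task Main()
    {
        // Placeholder values: substitute your own key, region, audio file and script.
        var speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");
        speechConfig.SpeechRecognitionLanguage = "en-GB";
        const string referenceText = "YOUR REFERENCE TEXT";

        // Scripted assessment: reference text plus the final argument
        // (enable miscue) set to true, as in the RecognizeOnceAsync() case.
        var pronunciationConfig = new PronunciationAssessmentConfig(
            referenceText, GradingSystem.HundredMark, Granularity.Phoneme, true);

        using var audioConfig = AudioConfig.FromWavFileInput("assessment.wav");
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        pronunciationConfig.ApplyTo(recognizer);

        var words = new List<(string Word, string ErrorType)>();
        var done = new TaskCompletionSource<bool>();

        // Each Recognized event carries the assessment for one recognised segment.
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason != ResultReason.RecognizedSpeech) return;
            var assessment = PronunciationAssessmentResult.FromResult(e.Result);
            foreach (var w in assessment.Words)
                words.Add((w.Word, w.ErrorType));
        };
        recognizer.SessionStopped += (s, e) => done.TrySetResult(true);
        recognizer.Canceled += (s, e) => done.TrySetResult(false);

        await recognizer.StartContinuousRecognitionAsync();
        await done.Task;
        await recognizer.StopContinuousRecognitionAsync();

        // Print every word with the error type the service assigned to it.
        foreach (var (word, errorType) in words)
            Console.WriteLine($"{word}\t{errorType}");
    }
}
```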

A side-by-side comparison of the output of the two methods, for an identical file and reference text, is below. Streaming is on the left-hand side; RecognizeOnceAsync() is on the right. Can anyone confirm how to obtain omissions and insertions using continuous recognition?

[Image: side-by-side output, streaming (left) vs RecognizeOnceAsync() (right)]
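The thread does not confirm a fix. One possible workaround, which is an assumption rather than documented SDK behaviour, is to collect the recognised words from every Recognized event and align them against the full reference text on the client. Reference words with no match are labelled Omission and recognised words with no reference counterpart are labelled Insertion. The sketch below does this with a longest-common-subsequence alignment; the class name MiscueAligner and its Align method are hypothetical, and the punctuation and case normalisation is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: client-side alignment of recognised words to the script.
static class MiscueAligner
{
    // Lower-case and drop punctuation (keeping apostrophes) so "Fox," matches "fox".
    static string Normalise(string word) =>
        new string(word.Where(c => char.IsLetterOrDigit(c) || c == '\'').ToArray())
            .ToLowerInvariant();

    // Returns (word, errorType) pairs, where errorType is "None",
    // "Omission" or "Insertion".
    public static List<(string Word, string ErrorType)> Align(
        string referenceText, IList<string> recognised)
    {
        var reference = referenceText
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(Normalise).Where(w => w.Length > 0).ToArray();
        var heard = recognised.Select(Normalise).Where(w => w.Length > 0).ToArray();

        // lcs[i, j] = length of the longest common subsequence of
        // reference[i..] and heard[j..].
        int n = reference.Length, m = heard.Length;
        var lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i, j] = reference[i] == heard[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk the table: matches are "None", skipped reference words are
        // omissions, skipped recognised words are insertions.
        var result = new List<(string Word, string ErrorType)>();
        int r = 0, h = 0;
        while (r < n && h < m)
        {
            if (reference[r] == heard[h]) { result.Add((heard[h], "None")); r++; h++; }
            else if (lcs[r + 1, h] >= lcs[r, h + 1]) { result.Add((reference[r], "Omission")); r++; }
            else { result.Add((heard[h], "Insertion")); h++; }
        }
        for (; r < n; r++) result.Add((reference[r], "Omission"));
        for (; h < m; h++) result.Add((heard[h], "Insertion"));
        return result;
    }
}
```

For example, aligning the reference "The quick brown fox." against the recognised words the, brown, fox, jumped marks "quick" as an Omission and "jumped" as an Insertion. In practice you would probably keep the ErrorType the service returned (such as Mispronunciation) for matched words instead of overwriting it with "None".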

Many thanks, Nick

Answer

This explains the odd behaviour I ran into recently; I have the same problem. Hopefully someone at Microsoft can help 🙏🏽
