Azure Pronunciation Assessment API – Inconsistent Scores Between Python and TypeScript

Question

Hi Azure Team,

We are experiencing a significant discrepancy in pronunciation assessment scores when using the Azure Speech Pronunciation Assessment API with the Python SDK versus the TypeScript SDK.

The same audio file and reference text produce good scores in Python but poor scores in TypeScript, even though both implementations use the same configuration.

SDKs Used:

- Python: azure.cognitiveservices.speech
- TypeScript: microsoft-cognitiveservices-speech-sdk

Results for the Same Input:

- Python: AccuracyScore 91.0, FluencyScore 83.0, CompletenessScore 90.0, PronScore 86.0
- TypeScript: AccuracyScore 44, FluencyScore 80, CompletenessScore 20, PronScore 36.8

Answer

Hi Waleed,

Welcome to the Microsoft Q&A forum, and thank you for posting your query.

I understand that your issue involves inconsistent pronunciation assessment scores when using the Azure Speech Pronunciation Assessment API with different SDKs:

- Python SDK (azure.cognitiveservices.speech)
- TypeScript SDK (microsoft-cognitiveservices-speech-sdk)

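For the same audio file and the same reference text, you are getting significantly different results.

FluencyScore is similar (83 vs. 80), but AccuracyScore (91 vs. 44), CompletenessScore (90 vs. 20), and PronScore (86 vs. 36.8) differ sharply. Because CompletenessScore reflects the ratio of pronounced words to the words in the reference text, a value of 20 suggests that most of the reference text was not matched in the TypeScript run. That points toward the audio input or the configuration rather than the speaker's pronunciation.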
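To see which words drive the low CompletenessScore, it can help to compare the word-level results from both runs. The sketch below assumes you have saved the detailed JSON result from each SDK (for example, via PropertyId.SpeechServiceResponse_JsonResult on the recognition result's properties) and that it follows the shape shown in the docstring; please verify the field names against your own output. The helper name word_error_summary is hypothetical, and Omission/Insertion labels generally depend on miscue analysis being enabled.

```python
import json
from collections import Counter


def word_error_summary(detailed_json: str) -> tuple[Counter, list[str]]:
    """Count word-level ErrorType values and collect omitted words.

    Assumed input shape (verify against your own output):
    {"NBest": [{"Words": [{"Word": "...",
                           "PronunciationAssessment": {"ErrorType": "..."}}]}]}
    """
    best = json.loads(detailed_json)["NBest"][0]  # top-ranked hypothesis
    counts: Counter = Counter()
    omitted: list[str] = []
    for word in best.get("Words", []):
        # Words without an assessment block are counted as "None" (no error).
        error_type = word.get("PronunciationAssessment", {}).get("ErrorType", "None")
        counts[error_type] += 1
        if error_type == "Omission":
            omitted.append(word.get("Word", ""))
    return counts, omitted
```

Running this on the Python and TypeScript JSON side by side should show whether the TypeScript result marks most reference words as omitted, which would be consistent with the low CompletenessScore.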

Possible Causes & Solutions:

Check Pronunciation Assessment Configuration

Ensure that the parameters used in both SDKs are identical.

Key parameters to verify:

- Grading system (100-point scale vs. 5-point scale)
- Granularity (Phoneme, Word, or FullText), i.e., phoneme-level vs. word-level assessment
- Miscue analysis enabled or disabled (detection of missing and inserted words)

Solution: Print/log the exact JSON payload sent in both Python and TypeScript to compare configurations.
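One way to act on this is to serialize each configuration and diff the results. To the best of our knowledge, PronunciationAssessmentConfig exposes to_json() in the Python SDK and toJSON() in the TypeScript SDK; please confirm this for your SDK versions. Save each output as a string and pass both to the hypothetical helper below. The JSON keys in the sample strings are illustrative only.

```python
import json


def diff_configs(python_json: str, typescript_json: str) -> dict:
    """Return {key: (python_value, typescript_value)} for top-level keys that differ.

    This is a shallow comparison; a key missing on one side is reported as None.
    """
    py_cfg = json.loads(python_json)
    ts_cfg = json.loads(typescript_json)
    differences = {}
    for key in sorted(set(py_cfg) | set(ts_cfg)):
        if py_cfg.get(key) != ts_cfg.get(key):
            differences[key] = (py_cfg.get(key), ts_cfg.get(key))
    return differences


if __name__ == "__main__":
    # Illustrative strings only; paste the real to_json() / toJSON() output here.
    python_side = '{"referenceText": "Hello world", "gradingSystem": "HundredMark", "granularity": "Phoneme", "enableMiscue": true}'
    typescript_side = '{"referenceText": "Hello world", "gradingSystem": "HundredMark", "granularity": "Word", "enableMiscue": false}'
    for key, (py_value, ts_value) in diff_configs(python_side, typescript_side).items():
        print(f"{key}: Python={py_value!r} TypeScript={ts_value!r}")
```

An empty result means the two assessment configurations match at the top level; note that the recognition locale is usually set elsewhere and will not appear in this JSON.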

Audio File Encoding Issues

Ensure that the audio file is properly formatted before being sent to the API.

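The Speech SDK's default input format is 16-bit PCM, 16 kHz, mono; for example, a push stream created without an explicit format is assumed to contain audio in that format.

The TypeScript code might be reading, decoding, or streaming the audio differently, for example by declaring a stream format that does not match the file's actual sample rate or channel count.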

Solution: Convert the audio to a standard format before sending it. Compare the byte size of the file when loaded in both languages.
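As a quick check of the format and byte size, the sketch below uses Python's standard-library wave module to read the WAV header of the file (or of the exact bytes your TypeScript code loads, if you write them to disk first). The expected values (mono, 16-bit, 16 kHz) follow the default format mentioned above; the helper names are hypothetical. Note that wave only reads uncompressed PCM WAV files and raises wave.Error for other encodings, which is itself a useful signal.

```python
import wave

# Default input format: mono, 16-bit (2 bytes per sample), 16 kHz.
EXPECTED_FORMAT = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def inspect_wav(source) -> dict:
    """Return basic format details for a PCM WAV file path or binary file object."""
    with wave.open(source, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }
    # Size of the audio payload alone, excluding the WAV header.
    info["pcm_bytes"] = info["frames"] * info["channels"] * info["sample_width_bytes"]
    info["duration_s"] = info["frames"] / info["sample_rate_hz"]
    return info


def format_problems(info: dict) -> list[str]:
    """List every field that differs from EXPECTED_FORMAT (an empty list means OK)."""
    return [
        f"{key}: expected {expected}, got {info[key]}"
        for key, expected in EXPECTED_FORMAT.items()
        if info[key] != expected
    ]
```

For example, format_problems(inspect_wav("sample.wav")) (with a hypothetical file name) should return an empty list for a correctly formatted file, and comparing pcm_bytes and duration_s for the Python and TypeScript inputs shows whether both sides load the same audio.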

SDK Version Differences

Different SDK versions may send different default parameters or support different options, which can change the scores the service returns.

Solution: Check that your Python and TypeScript SDKs are both updated to the latest version.
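To confirm which versions are actually installed, the sketch below reads the Python package version with importlib.metadata (pip package name azure-cognitiveservices-speech) and the TypeScript package version from node_modules/microsoft-cognitiveservices-speech-sdk/package.json. The project_dir argument and the assumption that node_modules sits directly under it are hypothetical; from a shell, npm ls microsoft-cognitiveservices-speech-sdk reports the same information for the TypeScript side.

```python
import json
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Optional


def python_speech_sdk_version() -> Optional[str]:
    """Installed version of the azure-cognitiveservices-speech pip package, or None."""
    try:
        return version("azure-cognitiveservices-speech")
    except PackageNotFoundError:
        return None


def typescript_speech_sdk_version(project_dir: str = ".") -> Optional[str]:
    """Version in node_modules/microsoft-cognitiveservices-speech-sdk/package.json, or None."""
    manifest = (
        Path(project_dir)
        / "node_modules"
        / "microsoft-cognitiveservices-speech-sdk"
        / "package.json"
    )
    if not manifest.is_file():
        return None
    return json.loads(manifest.read_text(encoding="utf-8")).get("version")
```

Logging both values next to the scores makes it easier to tell whether a version mismatch lines up with the discrepancy.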

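Locale and Input Differences

Match the recognition locale (e.g., en-US vs. en-GB) in both implementations. The locale is typically set on the speech configuration (speech_recognition_language in Python, speechRecognitionLanguage in TypeScript), not on the pronunciation assessment configuration.

Check whether the TypeScript code is sending extra silence or noise.

Solution: Log and compare the locale configuration in both implementations.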
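To check for extra silence, one option is to measure how much near-silent audio sits at the start and end of the buffer each implementation sends. The sketch below assumes 16-bit, mono, little-endian PCM (for example, the raw frames returned by wave's readframes()); the function name, the 10 ms frame size, and the -50 dBFS threshold are arbitrary assumptions, not values used by the service.

```python
import math
import sys
from array import array


def silence_edges(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 10,
                  threshold_dbfs: float = -50.0) -> tuple[float, float]:
    """Return (leading_s, trailing_s) of near-silence in 16-bit mono PCM."""
    samples = array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len

    def is_silent(i: int) -> bool:
        # RMS level of one frame, compared against the dBFS threshold.
        chunk = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in chunk) / frame_len)
        return rms == 0 or 20 * math.log10(rms / 32768) < threshold_dbfs

    lead = 0
    while lead < n_frames and is_silent(lead):
        lead += 1
    trail = 0
    # Stop before overlapping the leading run (an all-silent buffer counts as leading).
    while trail < n_frames - lead and is_silent(n_frames - 1 - trail):
        trail += 1
    frame_s = frame_len / sample_rate
    return lead * frame_s, trail * frame_s
```

If the TypeScript input shows much longer silent edges than the Python input for the same recording, that would support the extra-silence explanation.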

Hope this helps. Do let us know if you have any further queries.

-------------

If this answers your query, please click Accept Answer and select Yes under "Was this answer helpful?"

Thank you.
