Azure Pronunciation Assessment API – Inconsistent Scores Between Python and TypeScript

Waleed 0 Reputation points
2025-03-03T21:25:36.2933333+00:00

Hi Azure Team,

We are experiencing a significant discrepancy in pronunciation assessment scores when using the Azure Speech Pronunciation Assessment API with the Python SDK versus the TypeScript SDK.

The same audio file and reference text produce good scores in Python but poor scores in TypeScript, even though both implementations use the same configuration.

SDKs Used:

  • Python: azure.cognitiveservices.speech
  • TypeScript: microsoft-cognitiveservices-speech-sdk

Results for the Same Input:

  • Python: AccuracyScore 91.0, FluencyScore 83.0, CompletenessScore 90.0, PronScore 86.0
  • TypeScript: AccuracyScore 44, FluencyScore 80, CompletenessScore 20, PronScore 36.8
Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Prashanth Veeragoni 1,190 Reputation points Microsoft External Staff
    2025-03-04T04:17:25.8966667+00:00

    Hi Waleed,

    Welcome to the Microsoft Q&A forum. Thank you for posting your query.

    I understand that you are seeing inconsistent pronunciation assessment scores from the Azure Speech Pronunciation Assessment API depending on which SDK you use:

      • Python SDK (azure.cognitiveservices.speech)
      • TypeScript SDK (microsoft-cognitiveservices-speech-sdk)

    For the same audio file and the same reference text, you are getting significantly different results.

    The FluencyScore is similar (83.0 vs. 80), but the other metrics (AccuracyScore, CompletenessScore, PronScore) differ sharply.

    Possible Causes & Solutions:

    Check Pronunciation Assessment Configuration

    Ensure that the parameters used in both SDKs are identical.

    Key parameters to verify:

      • Grading system (100-point scale vs. 5-point scale)
      • Granularity (Phoneme, Word, FullText)
      • Miscue analysis enabled or disabled (detection of omitted and inserted words)

    Solution: Print/log the exact JSON payload sent in both Python and TypeScript to compare configurations.
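
Once both payloads are logged, the comparison can be automated. The following is only a minimal sketch, not an official tool: `diff_configs` is a hypothetical helper, and the two JSON strings are invented sample values standing in for whatever each application actually logs. The key names (ReferenceText, GradingSystem, Granularity, EnableMiscue) follow the documented pronunciation assessment parameters, but please verify them against your own logged payloads.

```python
import json


def diff_configs(python_cfg: dict, typescript_cfg: dict) -> dict:
    """Return {key: (python_value, typescript_value)} for every key that differs."""
    differences = {}
    for key in sorted(set(python_cfg) | set(typescript_cfg)):
        left = python_cfg.get(key, "<missing>")
        right = typescript_cfg.get(key, "<missing>")
        if left != right:
            differences[key] = (left, right)
    return differences


# Hypothetical payloads (assumption): paste the JSON each application logs.
python_json = (
    '{"ReferenceText": "good morning", "GradingSystem": "HundredMark", '
    '"Granularity": "Phoneme", "EnableMiscue": false}'
)
typescript_json = (
    '{"ReferenceText": "good morning", "GradingSystem": "HundredMark", '
    '"Granularity": "Phoneme", "EnableMiscue": true}'
)

mismatches = diff_configs(json.loads(python_json), json.loads(typescript_json))
for key, (py_value, ts_value) in mismatches.items():
    print(f"{key}: Python={py_value!r} TypeScript={ts_value!r}")
```

Any key that is printed is a setting the two implementations do not share. A key reported as `<missing>` on one side is worth checking too, since an omitted parameter falls back to a default value.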

    Audio File Encoding Issues

    Ensure that the audio file is properly formatted before being sent to the API.

    The API expects a specific format (e.g., 16-bit PCM, 16 kHz, mono).

    TypeScript might be processing or encoding the audio differently.

    Solution: Convert the audio to a standard format before sending it. Compare the byte size of the file when loaded in both languages.
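
To check the format and byte size on the Python side, something along these lines may help. It is a sketch using only the standard library `wave` module. `describe_wav` and `format_problems` are hypothetical helper names, and the 16-bit / 16 kHz / mono target is taken from the example format above, so adjust it if your setup uses something else. Note that `wave` only reads PCM WAV files and raises `wave.Error` for other encodings.

```python
import os
import wave


def describe_wav(path: str) -> dict:
    """Read the WAV header and return the properties that matter for comparison."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "sample_rate_hz": sample_rate,
            "duration_seconds": frame_count / sample_rate,
            "file_size_bytes": os.path.getsize(path),
        }


def format_problems(info: dict) -> list:
    """List deviations from the assumed target: 16-bit PCM, 16 kHz, mono."""
    problems = []
    if info["channels"] != 1:
        problems.append(f"expected mono, found {info['channels']} channels")
    if info["bits_per_sample"] != 16:
        problems.append(f"expected 16-bit, found {info['bits_per_sample']}-bit")
    if info["sample_rate_hz"] != 16000:
        problems.append(f"expected 16000 Hz, found {info['sample_rate_hz']} Hz")
    return problems
```

For example, `print(describe_wav("sample.wav"))` (with `sample.wav` standing in for your own file) gives the numbers to compare against the byte length and header values of the buffer that the TypeScript code actually loads and sends.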

    SDK Version Differences

    Different SDK versions might produce different scoring behavior.

    Check whether your Python and TypeScript SDKs are both updated to the latest version.

    Solution: Log and compare the locale configuration in both implementations.

    Check if TypeScript is sending extra silence/noise.

    Make sure both implementations use the same language/locale (e.g., en-US vs. en-GB).
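
For the silence check mentioned above, a rough way to measure it is to look at how much of the audio at each end stays below an amplitude threshold. This is a sketch under assumptions: `edge_silence_seconds` is a hypothetical helper, it only handles 16-bit mono PCM WAV files, and the threshold of 500 (out of a maximum of 32767) is an arbitrary starting value that you may need to tune for your recordings.

```python
import struct
import wave


def edge_silence_seconds(path: str, threshold: int = 500) -> tuple:
    """Return (leading, trailing) silence in seconds for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2 or wav_file.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())

    # WAV samples are little-endian signed 16-bit integers.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)

    # Indices of samples louder than the (assumed) silence threshold.
    loud = [i for i, sample in enumerate(samples) if abs(sample) > threshold]
    if not loud:
        total = len(samples) / sample_rate
        return (total, total)  # the whole file is below the threshold

    leading = loud[0] / sample_rate
    trailing = (len(samples) - 1 - loud[-1]) / sample_rate
    return (leading, trailing)
```

Run it on the audio each implementation actually submits (for instance, a copy saved just before the recognition call). If the two results differ noticeably, the two SDKs are not receiving the same input.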

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

    Thank you.

