When using batch speech transcription, the ITN feature only applies to the first option of the nBest results.

Julian Kopka Heerup 0 Reputation points
2024-10-28T19:17:58.12+00:00

When using batch transcription, the ITN feature only applies to the first option of the nBest results, which is not necessarily the one with the highest confidence.

The batch transcription service returns a JSON result with the following structure (anonymized):

{
  "source": "{a url}",
  "timestamp": "2024-10-09T10:49:38Z",
  "durationInTicks": 4897800000,
  "duration": "PT8M9.78S",
  "combinedRecognizedPhrases": [
    {
      "channel": 1,
      "lexical": "{content}",
      "itn": "{content - itn works}",
      "maskedITN": "{content - itn works}",
      "display": "{content - itn works}"
    }
  ],
  "recognizedPhrases": [
    {
      "recognitionStatus": "Success",
      "channel": 1,
      "offset": "PT0.77S",
      "duration": "PT1.48S",
      "offsetInTicks": 7700000.0,
      "durationInTicks": 14800000.0,
      "nBest": [
        {
          "confidence": 0.44051075,
          "lexical": "{content}",
          "itn": "{content - itn works}",
          "maskedITN": "{content - itn works}",
          "display": "{content - itn works}"
        },
        {
          "confidence": 0.52692604,
          "lexical": "{content}",
          "itn": "{content - no itn}",
          "maskedITN": "{content - no itn}",
          "display": "{content - no itn}"
        }
      ],
      "locale": "da-DK"
    }
  ]
}
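
To see how often the first nBest entry is not the highest-confidence one, a small check along these lines may help. It is only a sketch: the function name `find_reordered_phrases` and the sample values are made up for illustration, and it assumes the result file has the structure shown above.

```python
import json


def find_reordered_phrases(result):
    """Return one record per phrase whose highest-confidence nBest entry is not at index 0."""
    mismatches = []
    for i, phrase in enumerate(result.get("recognizedPhrases", [])):
        n_best = phrase.get("nBest", [])
        if not n_best:
            continue  # nothing to compare for this phrase
        # Index of the candidate with the highest confidence
        best_index = max(range(len(n_best)), key=lambda j: n_best[j]["confidence"])
        if best_index != 0:
            mismatches.append({
                "phrase_index": i,
                "offset": phrase.get("offset"),
                "best_index": best_index,
                "first_confidence": n_best[0]["confidence"],
                "best_confidence": n_best[best_index]["confidence"],
            })
    return mismatches


# Illustrative data shaped like the anonymized result above
sample = json.loads('''{
  "recognizedPhrases": [
    {
      "offset": "PT0.77S",
      "nBest": [
        {"confidence": 0.44051075, "lexical": "a", "itn": "a", "display": "a"},
        {"confidence": 0.52692604, "lexical": "b", "itn": "b", "display": "b"}
      ]
    }
  ]
}''')

for record in find_reordered_phrases(sample):
    print(record)
```

Each printed record identifies a phrase where ITN was applied to a candidate other than the most confident one.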

Am I doing something wrong?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Sina Salam 12,011 Reputation points
    2024-10-28T21:38:20.2333333+00:00

    Hello Julian Kopka Heerup,

    Welcome to Microsoft Q&A, and thank you for posting your question here.

    I understand that Inverse Text Normalization (ITN) is only being applied to the first option in the nBest results, regardless of its confidence score.

    Based on your description and the JSON you provided, you can add a post-processing step in your application so that you work with the most confident transcription result. This involves parsing the JSON response from the batch transcription service, identifying the nBest result with the highest confidence score, and then applying ITN to that result. I put together a Python example based on your JSON:

    import json

    # Sample JSON response (trimmed to the fields used below)
    response = '''{
        "recognizedPhrases": [
            {
                "nBest": [
                    {
                        "confidence": 0.44051075,
                        "lexical": "content",
                        "itn": "content - itn works",
                        "maskedITN": "content - itn works",
                        "display": "content - itn works"
                    },
                    {
                        "confidence": 0.52692604,
                        "lexical": "content",
                        "itn": "content - no itn",
                        "maskedITN": "content - no itn",
                        "display": "content - no itn"
                    }
                ]
            }
        ]
    }'''

    # Parse the JSON response
    data = json.loads(response)

    # Find the nBest result with the highest confidence
    best_result = max(data["recognizedPhrases"][0]["nBest"], key=lambda x: x["confidence"])

    # Read the "itn" text the service returned for that result, falling back to
    # the lexical form if the field is missing. This does not run ITN itself.
    best_result_itn = best_result.get("itn", best_result["lexical"])
    print("Best result with ITN:", best_result_itn)
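
The same selection can be extended from a single phrase to the whole result file. The following is a minimal sketch, not an official API: `pick_best` and `build_transcript` are hypothetical helper names, the sample data is invented, and the code assumes each phrase carries `offsetInTicks` and an `nBest` list as in the JSON above. It only reads the text fields the service returned, so if the chosen candidate has no ITN applied, the output will not have it either.

```python
def pick_best(phrase, field="display"):
    """Return the requested text field of the highest-confidence nBest candidate."""
    n_best = phrase.get("nBest", [])
    if not n_best:
        return ""
    best = max(n_best, key=lambda c: c["confidence"])
    # Fall back to the lexical form when the requested field is missing or empty
    return best.get(field) or best.get("lexical", "")


def build_transcript(result, field="display"):
    """Join the best candidate of every phrase, ordered by start offset."""
    phrases = sorted(
        result.get("recognizedPhrases", []),
        key=lambda p: p.get("offsetInTicks", 0),
    )
    texts = (pick_best(p, field) for p in phrases)
    return " ".join(t for t in texts if t)


# Invented example data; real files contain many more fields
result = {
    "recognizedPhrases": [
        {
            "offsetInTicks": 20,
            "nBest": [{"confidence": 0.9, "lexical": "second", "display": "Second."}],
        },
        {
            "offsetInTicks": 10,
            "nBest": [
                {"confidence": 0.4, "lexical": "first a", "display": "First A."},
                {"confidence": 0.5, "lexical": "first b", "display": "First B."},
            ],
        },
    ]
}

print(build_transcript(result))
print(build_transcript(result, field="itn"))
```

Passing `field="itn"` or `field="lexical"` selects a different text form; when a candidate lacks the requested field, the lexical text is used instead.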
    

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    If this was helpful, please don't forget to close the thread by upvoting and accepting it as an answer.

