Japanese Voice-to-Text: Preventing Unwanted Kanji Transcription for Names

Thierry Tropée 20 Reputation points
2025-02-14T06:06:11.7233333+00:00

When using Azure Speech to Text for batch transcription of conversations in Japanese, there is an issue with person names being transcribed into incorrect Kanji characters. A custom speech model has been created to handle specific industry terms, but names continue to be converted into unwanted Kanji, which complicates transcript searches. Is there a way to prevent names (like ドイ) from being transcribed into Kanji characters (such as 土井 or 土居)?

Is it possible to train a speech model to address this issue? If so, what steps or examples should be followed to achieve this?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. Pavankumar Purilla 3,410 Reputation points Microsoft Vendor
    2025-02-15T00:22:56.1866667+00:00

    Hi Thierry Tropée,
    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    To keep names out of Kanji in Japanese batch transcription with Azure Speech to Text, you can combine three techniques: a custom pronunciation dictionary, a custom speech model, and post-processing of the transcript.
    Start with a custom pronunciation dictionary, which lets you specify how a name should be written, so that "ドイ" stays in Katakana instead of being converted to Kanji such as "土井" or "土居". You can set this up in Azure Speech Studio by adding pronunciation entries that explicitly map each name to its desired written form.
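
    Post-processing, the third technique listed, needs no training and can be tried straight away on the transcript text. The following is only a minimal sketch: the `NAME_VARIANTS` table and the `normalize_names` helper are hypothetical names made up for illustration, not part of any Azure SDK, and the mapping has to be filled in with the names that occur in your own recordings.

```python
import re

# Hypothetical mapping: each Katakana name and the Kanji spellings that
# should be folded back into it. Extend with the names in your own data.
NAME_VARIANTS = {
    "ドイ": ["土井", "土居"],
}


def normalize_names(text, variants=NAME_VARIANTS):
    """Replace every listed Kanji spelling in `text` with its Katakana form."""
    kanji_to_kana = {
        kanji: kana
        for kana, spellings in variants.items()
        for kanji in spellings
    }
    if not kanji_to_kana:
        return text  # nothing to replace

    # Longest spellings first so a longer name wins over a shorter prefix.
    alternatives = sorted(kanji_to_kana, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives))
    return pattern.sub(lambda match: kanji_to_kana[match.group(0)], text)


print(normalize_names("土井さんと土居さんが参加しました。"))
# prints: ドイさんとドイさんが参加しました。
```

    Note that this is a plain substring replacement, so it will also rewrite the same characters where they appear in a place name or company name. If that matters for your transcripts, one option is to restrict the pattern, for example to matches that are followed by an honorific such as さん.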

    If the pronunciation dictionary alone does not resolve the issue, training a custom speech model on labeled audio data in which names are consistently written in Katakana can help. This involves collecting the audio, uploading it together with correctly transcribed text, and training a model so that it learns the desired written form.
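
    Because the model learns from whatever spelling the transcripts contain, it is worth checking them for stray Kanji spellings before uploading. The sketch below is an illustration under assumptions: it assumes the transcript file holds one tab-separated `audio file<TAB>transcript` pair per line (adjust the parsing if your dataset is laid out differently), and `UNWANTED_SPELLINGS`, `find_inconsistent_lines`, and the sample file names are hypothetical.

```python
# Hypothetical list of Kanji spellings that should not appear in the
# labeled transcripts, because the names are meant to stay in Katakana.
UNWANTED_SPELLINGS = ["土井", "土居"]


def find_inconsistent_lines(lines, unwanted=UNWANTED_SPELLINGS):
    """Return (line_number, audio_file, spelling) for each unwanted spelling.

    Assumes each line is "<audio file>\t<transcript>".
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        audio_file, _, transcript = line.rstrip("\n").partition("\t")
        for spelling in unwanted:
            if spelling in transcript:
                problems.append((number, audio_file, spelling))
    return problems


# Made-up sample data for illustration only.
sample = [
    "call_001.wav\tドイさんに確認します。",
    "call_002.wav\t土井さんから電話がありました。",
]

for number, audio_file, spelling in find_inconsistent_lines(sample):
    print(f"line {number} ({audio_file}): found {spelling}")
# prints: line 2 (call_002.wav): found 土井
```

    Any line that is reported should be corrected to the Katakana form before training, so that the model never sees the spelling you are trying to avoid.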
    I hope this information helps.

    1 person found this answer helpful.

0 additional answers
