Why do I get wrong visemes when using German with English phrases?

Question

I am using "Azure Speech" to synthesize speech from text input and also to generate visemes. When the language is German and the text contains an English phrase, the service returns wrong visemes. The timestamps (ts) are wrong: the last viseme has ts: 0, which should not happen. You can reproduce it by setting the language to German and using this sentence:

Hallo, Ich bin der neue virtuelle Assistent der Fresh Food and Beverage Group. Es freut mich, euch hier begrüssen zu dürfen. In Zukunft werde ich verschiedene Aktivitäten übernehmen dürfen. Insbesondere im Bereich Schulung und Qualitätssicherung.

If "Fresh Food and Beverage Group" is removed, it works fine.

So after the English phrase, the visemes are broken.

[Screenshot 2024-12-18 124822]

Accepted Answer

Hello Veljko Markovic | Babylon Engineer,

Welcome to Microsoft Q&A, and thank you for posting your question here.

I understand that you would like to resolve the issue of getting wrong visemes when using German with English phrases.

Since you need to keep "Fresh Food and Beverage Group" in the text, here are a few specific suggestions to address the viseme issue:

Use SSML to explicitly mark the English phrase. This can help the speech synthesis engine handle the language switch more accurately. For example:


       Hallo, Ich bin der neue virtuelle Assistent der Fresh Food and Beverage Group. Es freut mich, euch hier begrüssen zu dürfen. In Zukunft werde ich verschiedene Aktivitäten übernehmen dürfen. Insbesondere im Bereich Schulung und Qualitätssicherung.
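In SSML, the English phrase would be wrapped in a `<lang xml:lang="en-US">` element inside a German document. The sketch below builds such a document for the sample sentence. It is an illustration under assumptions: the voice name `de-DE-SeraphinaMultilingualNeural` and the helper `build_mixed_ssml` are placeholders, and Azure's SSML documentation describes `<lang>` as a feature of multilingual voices, so substitute the multilingual voice you actually use.

```python
from xml.sax.saxutils import escape

# Assumption: a multilingual German voice; replace with the voice you actually use.
VOICE_NAME = "de-DE-SeraphinaMultilingualNeural"


def build_mixed_ssml(before: str, english: str, after: str, voice: str = VOICE_NAME) -> str:
    """Wrap the English phrase in <lang xml:lang='en-US'> inside a German SSML document."""
    return (
        "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='de-DE'>"
        f"<voice name='{voice}'>"
        # escape() keeps &, < and > in the text from breaking the XML
        f"{escape(before)}<lang xml:lang='en-US'>{escape(english)}</lang>{escape(after)}"
        "</voice></speak>"
    )


ssml = build_mixed_ssml(
    "Hallo, Ich bin der neue virtuelle Assistent der ",
    "Fresh Food and Beverage Group",
    ". Es freut mich, euch hier begrüssen zu dürfen. In Zukunft werde ich verschiedene "
    "Aktivitäten übernehmen dürfen. Insbesondere im Bereich Schulung und Qualitätssicherung.",
)
print(ssml)
```

With the Python Speech SDK, a string like this is passed to `SpeechSynthesizer.speak_ssml_async(ssml).get()` instead of the plain-text method.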

Other things you can do are to:

Try breaking the text into smaller segments and process them separately. This might help in isolating the issue.
Define a custom pronunciation for the English phrase within the SSML tags. This can sometimes help in generating more accurate visemes.
If the issue persists, contact Azure support with your specific use case and a description of the issue.
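Splitting the text at sentence boundaries is the simplest form of the first suggestion: each piece can be synthesized on its own, which also shows which segment produces the bad visemes. A minimal sketch with a regular expression; the function name is hypothetical, and this is a heuristic rather than a full sentence tokenizer.

```python
import re


def split_into_segments(text: str) -> list[str]:
    """Split text after ., ! or ? when followed by whitespace.

    Deliberately simple: it will also split after abbreviations such as "z. B.".
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]  # drop empty pieces (e.g. empty input)


sample = (
    "Hallo, Ich bin der neue virtuelle Assistent der Fresh Food and Beverage Group. "
    "Es freut mich, euch hier begrüssen zu dürfen."
)
for segment in split_into_segments(sample):
    print(segment)
```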

Regarding your clarification:

Since the input comes from customers, preprocess it programmatically to detect language changes. Handling language switches dynamically this way should improve viseme accuracy. You can use the Azure Language Detection API (part of Azure Cognitive Services) to identify segments of different languages in the text and wrap them in the appropriate SSML tags. Note that language detection returns one language per input document, so the text has to be split into pieces (for example, sentences) before each piece is classified:

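   from xml.sax.saxutils import escape

   def create_ssml(text, default_language="de-DE", voice_name="de-DE-SeraphinaMultilingualNeural"):
       # voice_name is an example value; <lang> switching is meant for multilingual voices
       # detect_language_segments is assumed to return [{"text": ..., "language": ...}, ...]
       detected_segments = detect_language_segments(text)
       ssml = (
           "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
           f"xml:lang='{default_language}'><voice name='{voice_name}'>"
       )
       for segment in detected_segments:
           segment_text = escape(segment["text"])  # escape &, <, > so the SSML stays valid XML
           if segment["language"] == default_language:
               ssml += segment_text
           else:
               # Wrap foreign-language segments so the voice switches language for them
               ssml += f"<lang xml:lang='{segment['language']}'>{segment_text}</lang>"
       ssml += "</voice></speak>"
       return ssml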
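`create_ssml` relies on a `detect_language_segments` helper that is left open above. Because language detection classifies whole inputs, a short embedded name like "Fresh Food and Beverage Group" may be easier to catch with a list of known foreign phrases (brand or company names). The following is a hypothetical stand-in for that helper: it uses plain phrase matching, not any Azure API, and returns the `{"text": ..., "language": ...}` dictionaries the SSML builder expects. `KNOWN_PHRASES` is an assumed, user-maintained list.

```python
import re

# Hypothetical phrase list; in practice it would hold the brand names found in your data.
KNOWN_PHRASES = {"Fresh Food and Beverage Group": "en-US"}


def detect_language_segments(text, known_phrases=None, default_language="de-DE"):
    """Split text into segments, tagging known foreign phrases with their language.

    This is phrase matching, not statistical language detection.
    """
    phrases = KNOWN_PHRASES if known_phrases is None else known_phrases
    if not phrases:
        return [{"text": text, "language": default_language}]
    # Try longer phrases first so overlapping names match as a whole
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(phrase) for phrase in ordered))
    segments, last_end = [], 0
    for match in pattern.finditer(text):
        if match.start() > last_end:  # text before the phrase stays in the default language
            segments.append({"text": text[last_end:match.start()], "language": default_language})
        segments.append({"text": match.group(0), "language": phrases[match.group(0)]})
        last_end = match.end()
    if last_end < len(text):  # trailing text after the last phrase
        segments.append({"text": text[last_end:], "language": default_language})
    return segments


segments = detect_language_segments(
    "Ich bin der Assistent der Fresh Food and Beverage Group. Es freut mich."
)
print(segments)
```

Since `known_phrases` defaults to `KNOWN_PHRASES`, the one-argument call `detect_language_segments(text)` used in `create_ssml` works unchanged.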

Secondly, if the SDK issue is not resolved, you can work around it with a custom approach to viseme generation: break the text into smaller segments, process them individually, and stitch the viseme timelines together. For example:

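     def process_text_segments(text, language="de-DE"):
         # detect_language_segments and synthesize_speech are assumed helpers
         segments = detect_language_segments(text)  # Detect language and split text
         viseme_data = []
         offset_ms = 0  # start time of the current segment in the stitched timeline
         for segment in segments:
             # Assumed response: {"visemes": [{"id": ..., "offset_ms": ...}], "duration_ms": ...}
             response = synthesize_speech(segment["text"], language=segment.get("language", language))
             for viseme in response["visemes"]:
                 # Shift each viseme so the segments' timelines line up end to end
                 viseme_data.append({**viseme, "offset_ms": viseme["offset_ms"] + offset_ms})
             offset_ms += response["duration_ms"]
         return viseme_data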
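The essential detail when stitching is that each segment's viseme offsets start again at zero, so they have to be shifted by the total audio duration of the segments before them; concatenating them unshifted would put a viseme back at 0 partway through the timeline. A self-contained sketch of just that step, assuming each segment result is a dict with a `visemes` list of `(viseme_id, offset_ms)` tuples and a `duration_ms` value (hypothetical field names; the numbers below are made-up sample values):

```python
def stitch_viseme_timelines(segment_results):
    """Concatenate per-segment visemes into one timeline in milliseconds."""
    stitched = []
    elapsed_ms = 0  # total audio duration of the segments already processed
    for result in segment_results:
        for viseme_id, offset_ms in result["visemes"]:
            stitched.append((viseme_id, offset_ms + elapsed_ms))
        elapsed_ms += result["duration_ms"]
    return stitched


timeline = stitch_viseme_timelines([
    {"visemes": [(0, 0), (21, 50), (19, 120)], "duration_ms": 400},
    {"visemes": [(0, 0), (4, 80)], "duration_ms": 300},
])
print(timeline)  # [(0, 0), (21, 50), (19, 120), (0, 400), (4, 480)]
```

With the Speech SDK, viseme event offsets are reported in ticks of 100 ns, so divide them by 10,000 to get milliseconds before stitching.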

So, other things you can do:

a. If the German model has persistent issues, explore alternative voices or models within Azure Speech that might handle mixed-language inputs better.

b. Report this German-English viseme inconsistency to Azure support with the following details:

Provide examples of problematic and non-problematic text inputs.
Include SSML scripts and their outputs for German, Hungarian, and English cases.
Request a fix or clarification on handling mixed-language visemes for German.
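When preparing the examples for support, a quick check over each collected viseme list makes the problem concrete by pointing at the exact visemes whose timestamps go backwards (such as a final viseme at 0 after non-zero ones). A minimal sketch, assuming the visemes are `(viseme_id, offset)` pairs in the order they were received, with offsets in whatever unit the SDK reports; the function name and sample values are hypothetical:

```python
def find_backward_offsets(visemes):
    """Return (index, viseme_id, offset, previous_offset) for every viseme whose
    offset is smaller than the one before it, i.e. where the timeline goes backwards."""
    problems = []
    for i in range(1, len(visemes)):
        viseme_id, offset = visemes[i]
        previous_offset = visemes[i - 1][1]
        if offset < previous_offset:
            problems.append((i, viseme_id, offset, previous_offset))
    return problems


good = [(0, 0), (21, 500000), (19, 1200000)]
bad = [(0, 0), (21, 500000), (19, 1200000), (0, 0)]
print(find_backward_offsets(good))  # []
print(find_backward_offsets(bad))   # [(3, 0, 0, 1200000)]
```

Running it on the German text with and without "Fresh Food and Beverage Group" gives a compact before/after comparison to attach to the report.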

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

