Speech SDK: How to get proper intonation when synthesizing a partial sentence

Matt Ma 0 Reputation points
2024-02-19T10:28:19.4766667+00:00

Hi, I'm working on reducing the round-trip latency in a voice bot project that uses an LLM and TTS to the absolute minimum possible. I have a number of strategies, and one is to send the first few words produced by the LLM for synthesis while the LLM autoregressively emits the subsequent words. I then cache the TTS results (audio + visemes) for later use. The problem is that those initial few words appear to be treated as an entire sentence, so TTS produces falling intonation at the end of the audio, as if it were encountering a full stop. I want to find some way to prevent the falling intonation. Are there any ways to control this (SSML or special characters), or perhaps some workaround, such as generating audio for an initial segment that is longer by, say, one word and then truncating the audio? Your guidance is greatly appreciated.

Thanks Matt

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Ramr-msft 17,741 Reputation points
    2024-02-26T15:21:42.75+00:00

    Thanks for the details. In SSML, the <prosody> tag controls the volume, speaking rate, and pitch of the speech, which might help in adjusting the tone and speed.

    Another way to control the pauses and tone is by using the <break> tag. You can set a pause based on strength (equivalent to the pause after a comma, a sentence, or a paragraph), or you can set it to a specific length of time in seconds or milliseconds. Here are the strength attribute values:

    • none: No pause. Use none to remove a normally occurring pause, such as after a period.
    • x-weak: Has the same strength as none, no pause.
    • weak: Sets a pause of the same duration as the pause after a comma.
    • medium: Has the same strength as weak.
    • strong: Sets a pause of the same duration as the pause after a sentence.
    • x-strong: Sets a pause of the same duration as the pause after a paragraph.
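
As a rough illustration of how the two tags above fit together, the following Python sketch wraps a sentence fragment in <prosody> and appends a <break> with a chosen strength. The function name build_partial_ssml, the default voice en-US-JennyNeural, and the default pitch and rate values are assumptions made for this example, not something stated in the thread. Whether this actually removes the falling intonation on a partial sentence is not confirmed here; the sketch only shows the markup.

```python
from xml.sax.saxutils import escape, quoteattr

# Strength values as listed in the answer above.
BREAK_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}


def build_partial_ssml(text, voice="en-US-JennyNeural", pitch="+0%",
                       rate="+0%", break_strength="none"):
    """Build an SSML document for a sentence fragment (hypothetical helper).

    The voice, pitch, and rate defaults are assumptions for illustration.
    """
    if break_strength not in BREAK_STRENGTHS:
        raise ValueError(f"unknown break strength: {break_strength!r}")
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # <prosody> carries the pitch and speaking-rate adjustments.
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{escape(text)}"  # escape &, <, > coming from the LLM output
        "</prosody>"
        # A trailing <break> sets the pause after the fragment.
        f'<break strength="{break_strength}"/>'
        "</voice></speak>"
    )


if __name__ == "__main__":
    print(build_partial_ssml("Sure, I can help", pitch="+5%"))
```

The resulting string could then be passed to the Speech SDK's SSML synthesis call (for example, speak_ssml_async on a SpeechSynthesizer in the Python SDK) in place of plain text.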
