Why has the batch synthesis tts word.json output changed when using <break /> tags in ssml?

Question

swal 0

A month ago I was using the batch synthesis TTS API and was receiving correct responses for the word.json file. Today I am receiving different responses for the word.json file.

I haven't changed my code at all.

The audio output is correct.

Here's my ssml input

<speak version='1.0' xml:lang='en-US'><voice name='en-US-AvaNeural'>First paragraph<break strength="strong" />this is a paragraph<break strength="strong" />this is another paragraph<break strength="strong" /></speak>

Here's the word.json output

[
  {
    "Text": "First",
    "AudioOffset": 50,
    "Duration": 400
  },
  {
    "Text": "paragraphthis is a paragraphthis is another paragraph",
    "AudioOffset": 462,
    "Duration": 850
  },
  {
    "Text": "this",
    "AudioOffset": 2362,
    "Duration": 250
  },
  {
    "Text": "is",
    "AudioOffset": 2625,
    "Duration": 100
  },
  {
    "Text": "a",
    "AudioOffset": 2737,
    "Duration": 62
  },
  {
    "Text": "paragraphthis is another paragraph",
    "AudioOffset": 2812,
    "Duration": 900
  },
  {
    "Text": "this",
    "AudioOffset": 4712,
    "Duration": 275
  },
  {
    "Text": "is",
    "AudioOffset": 5000,
    "Duration": 87
  },
  {
    "Text": "another",
    "AudioOffset": 5100,
    "Duration": 325
  },
  {
    "Text": "paragraph",
    "AudioOffset": 5437,
    "Duration": 875
  }
]

As you can see in the output, the text is repeated where the <break /> tag is used.
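One way to spot the affected entries programmatically is to flag every word boundary whose Text contains more than one whitespace-separated token. The following is only a minimal sketch: the helper name find_merged_entries is hypothetical, and the data is the word.json output pasted above.

```python
import json

# Word-boundary entries copied from the word.json output in this question.
WORD_JSON = """
[
  {"Text": "First", "AudioOffset": 50, "Duration": 400},
  {"Text": "paragraphthis is a paragraphthis is another paragraph", "AudioOffset": 462, "Duration": 850},
  {"Text": "this", "AudioOffset": 2362, "Duration": 250},
  {"Text": "is", "AudioOffset": 2625, "Duration": 100},
  {"Text": "a", "AudioOffset": 2737, "Duration": 62},
  {"Text": "paragraphthis is another paragraph", "AudioOffset": 2812, "Duration": 900},
  {"Text": "this", "AudioOffset": 4712, "Duration": 275},
  {"Text": "is", "AudioOffset": 5000, "Duration": 87},
  {"Text": "another", "AudioOffset": 5100, "Duration": 325},
  {"Text": "paragraph", "AudioOffset": 5437, "Duration": 875}
]
"""


def find_merged_entries(entries):
    """Return the indices of entries whose Text spans more than one word.

    Hypothetical helper: a well-formed word boundary is assumed to hold a
    single token, so any whitespace inside Text marks a suspicious entry.
    """
    return [i for i, entry in enumerate(entries) if len(entry["Text"].split()) > 1]


entries = json.loads(WORD_JSON)
merged = find_merged_entries(entries)
for i in merged:
    print(i, repr(entries[i]["Text"]))
```

In this output the two flagged entries are the words that sit directly before a <break /> tag.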

Saideep Anchuri 4,940 Reputation points Microsoft External Staff

2025-01-10T04:52:49.1+00:00

Hi swal

Welcome to Microsoft Q&A Forum, thank you for posting your query here!

I understand that you are encountering an issue with the word.json output. I am unable to send curl requests with breaks in the text content, but adding breaks works in Speech Studio's audio content generation. Could you check once in the Speech Studio UI?

Kindly refer to the documentation below: batch-synthesis

Thank You

swal 0 Reputation points

2025-01-10T13:29:22.5033333+00:00

Hi,

I used the Speech Studio UI. However, the problem is with the word boundary file, not the audio output.

It seems the Speech Studio UI has no option to export or display a word boundary file like the batch synthesis API does. Please let me know if I'm missing where that feature is.

Thanks

Saideep Anchuri 4,940 Reputation points Microsoft External Staff

2025-01-13T00:37:20.6466667+00:00

Hi swal

We haven't heard from you on the last response and were just checking back to see if the given response was helpful.

Thank you.

1 answer

Answer 1

Hi swal

Here is an update. I am able to get the expected output with the input and command below.

 input - "<speak version=\"1.0\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">The rainbow has<break strength=\"strong\"/>seven colors.<break strength=\"strong\"/>Each color has its own beauty.<break strength=\"strong\"/></voice></speak>"

curl -v -X PUT -H "Ocp-Apim-Subscription-Key: yoursubkey" -H "Content-Type: application/json" -d '{
    "description": "my ssml test",
    "inputKind": "SSML",
    "inputs": [
        {
            "content": "<speak version=\"1.0\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">The rainbow has<break strength=\"strong\"/>seven colors.<break strength=\"strong\"/>Each color has its own beauty.<break strength=\"strong\"/></voice></speak>"
        }
    ],
    "properties": {
        "outputFormat": "riff-24khz-16bit-mono-pcm",
        "wordBoundaryEnabled": true,
        "sentenceBoundaryEnabled": false,
        "concatenateResult": false,
        "decompressOutputFiles": false
    }
}' "https://northeurope.api.cognitive.microsoft.com/texttospeech/batchsyntheses/idm0756?api-version=2024-04-01"
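If you build the request from code rather than by hand, letting a JSON library escape the SSML avoids quoting mistakes in the content field. Below is a rough Python sketch of the same PUT request using only the standard library. The region, synthesis ID, API version, and placeholder key are taken from the curl command above; the names build_request and payload are illustrative, and the call that actually sends the request is left commented out.

```python
import json
import urllib.request

# Values taken from the curl command above; replace the key with your own.
REGION = "northeurope"
SYNTHESIS_ID = "idm0756"
API_VERSION = "2024-04-01"
SUBSCRIPTION_KEY = "yoursubkey"

# Plain SSML string; json.dumps escapes the double quotes for us.
ssml = (
    '<speak version="1.0" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural">'
    'The rainbow has<break strength="strong"/>seven colors.'
    '<break strength="strong"/>Each color has its own beauty.'
    '<break strength="strong"/></voice></speak>'
)

payload = {
    "description": "my ssml test",
    "inputKind": "SSML",
    "inputs": [{"content": ssml}],
    "properties": {
        "outputFormat": "riff-24khz-16bit-mono-pcm",
        "wordBoundaryEnabled": True,
        "sentenceBoundaryEnabled": False,
        "concatenateResult": False,
        "decompressOutputFiles": False,
    },
}


def build_request():
    """Build (but do not send) the PUT request for the batch synthesis job."""
    url = (
        f"https://{REGION}.api.cognitive.microsoft.com"
        f"/texttospeech/batchsyntheses/{SYNTHESIS_ID}?api-version={API_VERSION}"
    )
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


request = build_request()
print(request.get_method(), request.full_url)

# To actually submit the job, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```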

output - [
  {
    "Text": "The",
    "AudioOffset": 50,
    "Duration": 137
  },
  {
    "Text": "rainbow",
    "AudioOffset": 200,
    "Duration": 350
  },
  {
    "Text": "has",
    "AudioOffset": 562,
    "Duration": 475
  },
  {
    "Text": "seven",
    "AudioOffset": 2050,
    "Duration": 362
  },
  {
    "Text": "colors",
    "AudioOffset": 2425,
    "Duration": 612
  },
  {
    "Text": ".",
    "AudioOffset": 3050,
    "Duration": 100
  },
  {
    "Text": "Each",
    "AudioOffset": 4900,
    "Duration": 287
  },
  {
    "Text": "color",
    "AudioOffset": 5200,
    "Duration": 350
  },
  {
    "Text": "has",
    "AudioOffset": 5562,
    "Duration": 175
  },
  {
    "Text": "its",
    "AudioOffset": 5750,
    "Duration": 150
  },
  {
    "Text": "own",
    "AudioOffset": 5912,
    "Duration": 162
  },
  {
    "Text": "beauty",
    "AudioOffset": 6087,
    "Duration": 462
  },
  {
    "Text": ".",
    "AudioOffset": 6562,
    "Duration": 100
  }
]
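To check that the breaks show up in the timing as well as in the text, you can measure the silence between consecutive word boundaries. This is only a sketch: it assumes AudioOffset and Duration are in milliseconds, and both the function name find_pauses and the 500 ms threshold are arbitrary choices for illustration.

```python
# Word boundaries copied from the output above as (Text, AudioOffset, Duration).
WORDS = [
    ("The", 50, 137), ("rainbow", 200, 350), ("has", 562, 475),
    ("seven", 2050, 362), ("colors", 2425, 612), (".", 3050, 100),
    ("Each", 4900, 287), ("color", 5200, 350), ("has", 5562, 175),
    ("its", 5750, 150), ("own", 5912, 162), ("beauty", 6087, 462),
    (".", 6562, 100),
]


def find_pauses(words, min_gap=500):
    """Return (previous word, next word, gap) for each gap of at least min_gap.

    The gap is the next word's offset minus the end of the previous word,
    in the same unit as the input (assumed to be milliseconds).
    """
    pauses = []
    for (prev_text, prev_offset, prev_duration), (next_text, next_offset, _) in zip(words, words[1:]):
        gap = next_offset - (prev_offset + prev_duration)
        if gap >= min_gap:
            pauses.append((prev_text, next_text, gap))
    return pauses


pauses = find_pauses(WORDS)
for prev_text, next_text, gap in pauses:
    print(f"{gap} between {prev_text!r} and {next_text!r}")
```

For this output the two long gaps fall where the first two <break /> tags are. The third break comes after the last word, so it has no following entry to measure against.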

Thank You.
