How to use batch synthesis for text to speech avatar

Article
01/13/2025

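Use the batch synthesis API for text to speech avatar to synthesize text asynchronously into a talking avatar video file. Publishers and video content platforms can use this API to create avatar video content in batches. This approach suits use cases such as training materials, presentations, or advertisements.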

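After the system receives text input, it generates the synthetic avatar video asynchronously. In batch mode synthesis, you submit the text to synthesize, poll for the synthesis status, and download the video output when the status indicates success. The text input must be plain text or Speech Synthesis Markup Language (SSML) text.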

This diagram provides a high-level overview of the workflow.

You can use the following REST API operations for batch synthesis.

Operation	Method	REST API call
Create batch synthesis	PUT	avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01
Get batch synthesis	GET	avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01
List batch synthesis	GET	avatar/batchsyntheses/?api-version=2024-08-01
Delete batch synthesis	DELETE	avatar/batchsyntheses/{SynthesisId}?api-version=2024-08-01
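
The four operations share one URL shape and differ only in the HTTP method and in whether a job ID is present. The following minimal Python sketch builds those URLs. The helper name `batch_synthesis_url` and the `METHODS` mapping are illustrative assumptions; the host pattern is taken from the curl examples in this article.

```python
from urllib.parse import quote

API_VERSION = "2024-08-01"

# HTTP method for each operation, as listed in the operations table.
METHODS = {"create": "PUT", "get": "GET", "list": "GET", "delete": "DELETE"}


def batch_synthesis_url(region, synthesis_id=None, api_version=API_VERSION):
    """Return the URL for one job, or for the job collection when no ID is given."""
    base = f"https://{region}.api.cognitive.microsoft.com/avatar/batchsyntheses"
    if synthesis_id is None:
        return f"{base}?api-version={api_version}"
    # Percent-encode the ID defensively; valid IDs contain no reserved characters.
    return f"{base}/{quote(synthesis_id, safe='')}?api-version={api_version}"


print(METHODS["create"], batch_synthesis_url("YourSpeechRegion", "my-job-01"))
```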

You can refer to the code samples hosted on GitHub.

Create a batch synthesis request

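Some properties in JSON format are required when you create a new batch synthesis job. Other properties are optional. The batch synthesis response also includes properties that describe the synthesis status and results. For example, the outputs.result property contains the location from which you can download the avatar video file, and outputs.summary gives access to the summary and debug details.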

To submit a batch synthesis request, construct the HTTP PUT request body according to the following instructions:

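Set the required inputKind property.
If the inputKind property is set to "PlainText", you must also set the voice property in synthesisConfig. In the following example, inputKind is set to "SSML", so synthesisConfig isn't set.
Set the required SynthesisId property. Choose a SynthesisId that is unique within the same Speech resource. The SynthesisId can be a string of 3 to 64 characters consisting of letters, numbers, "-", or "_", and it must start and end with a letter or number.
Set the required talkingAvatarCharacter and talkingAvatarStyle properties. For the available values, see the list of supported avatar characters and styles.
Optionally, set videoFormat, backgroundColor, and other properties. For more information, see batch synthesis properties.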
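
The steps above can be sketched in Python as follows. The function names are hypothetical, and the regular expression assumes that "letters" means ASCII letters, which the article doesn't state explicitly. The property names (`inputKind`, `inputs`, `content`, `avatarConfig`, `synthesisConfig`, `voice`) come from this article.

```python
import re

# 3 to 64 characters; letters, digits, "-" or "_"; starts and ends with a
# letter or digit. Assumption: "letters" means ASCII letters.
SYNTHESIS_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{1,62}[A-Za-z0-9]")


def is_valid_synthesis_id(synthesis_id):
    """Check a candidate SynthesisId against the stated naming rules."""
    return SYNTHESIS_ID_PATTERN.fullmatch(synthesis_id) is not None


def build_request_body(input_kind, contents, character, style, voice=None):
    """Assemble the JSON body as a dict; the voice is required only for PlainText."""
    if input_kind == "PlainText" and not voice:
        raise ValueError("PlainText input requires a voice in synthesisConfig")
    body = {
        "inputKind": input_kind,
        "inputs": [{"content": text} for text in contents],
        "avatarConfig": {
            "talkingAvatarCharacter": character,
            "talkingAvatarStyle": style,
        },
    }
    if input_kind == "PlainText":
        # For SSML input the voice is named inside the SSML itself.
        body["synthesisConfig"] = {"voice": voice}
    return body
```

Validating the ID locally catches naming mistakes before a request is sent; uniqueness within the Speech resource still has to be ensured by the caller.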

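Note

The maximum accepted JSON payload size is 500 KB.

Each Speech resource can have up to 200 batch synthesis jobs running concurrently.

The maximum length of the output video is currently 20 minutes and might increase in the future.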
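
A client can check the payload size limit locally before it sends a request. The sketch below is one way to do that; the function name `encode_payload` is hypothetical, and it assumes that 1 KB means 1,024 bytes, which the note doesn't specify.

```python
import json

# Assumption: 1 KB is taken as 1024 bytes; the note only says "500 KB".
MAX_PAYLOAD_BYTES = 500 * 1024


def encode_payload(body):
    """Serialize the request body to UTF-8 JSON and enforce the size limit."""
    payload = json.dumps(body, ensure_ascii=False).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(
            f"payload is {len(payload)} bytes; limit is {MAX_PAYLOAD_BYTES}"
        )
    return payload
```

Returning the encoded bytes means that the bytes that were measured are the same bytes that get sent.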

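Make an HTTP PUT request using the URI format shown in the following example. Replace YourSpeechKey with your Speech resource key, replace YourSpeechRegion with your Speech resource region, and set the request body properties as described earlier.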

curl -v -X PUT -H "Ocp-Apim-Subscription-Key: YourSpeechKey" -H "Content-Type: application/json" -d '{
    "inputKind": "SSML",
    "inputs": [
        {
         "content": "<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice name='\''en-US-AvaMultilingualNeural'\''>The rainbow has seven colors.</voice></speak>"
        }
    ],
    "avatarConfig": {
        "talkingAvatarCharacter": "lisa",
        "talkingAvatarStyle": "graceful-sitting"
    }
}'  "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/my-job-01?api-version=2024-08-01"
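
For readers who prefer Python to curl, the same request can be sketched with the standard library. The function name `build_create_request` is hypothetical. The sketch only constructs the request object; the call that would send it is left commented out so that the example runs without network access or a real key.

```python
import json
import urllib.request


def build_create_request(region, speech_key, synthesis_id, body):
    """Build (but do not send) the PUT request that creates a batch synthesis job."""
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/avatar/batchsyntheses/{synthesis_id}?api-version=2024-08-01"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Ocp-Apim-Subscription-Key": speech_key,
            "Content-Type": "application/json",
        },
    )


body = {
    "inputKind": "SSML",
    "inputs": [
        {
            "content": "<speak version='1.0' xml:lang='en-US'>"
            "<voice name='en-US-AvaMultilingualNeural'>"
            "The rainbow has seven colors.</voice></speak>"
        }
    ],
    "avatarConfig": {
        "talkingAvatarCharacter": "lisa",
        "talkingAvatarStyle": "graceful-sitting",
    },
}
request = build_create_request("YourSpeechRegion", "YourSpeechKey", "my-job-01", body)
# To send it: response = urllib.request.urlopen(request)
```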

You should receive a response body in the following format:

{
    "id": "my-job-01",
    "internalId": "5a25b929-1358-4e81-a036-33000e788c46",
    "status": "NotStarted",
    "createdDateTime": "2024-03-06T07:34:08.9487009Z",
    "lastActionDateTime": "2024-03-06T07:34:08.9487012Z",
    "inputKind": "SSML",
    "customVoices": {},
    "properties": {
        "timeToLiveInHours": 744
    },
    "avatarConfig": {
        "talkingAvatarCharacter": "lisa",
        "talkingAvatarStyle": "graceful-sitting",
        "videoFormat": "Mp4",
        "videoCodec": "hevc",
        "subtitleType": "soft_embedded",
        "bitrateKbps": 2000,
        "customized": false
    }
}

The status property should progress from NotStarted to Running, and finally to Succeeded or Failed. You can call the GET batch synthesis API periodically until the returned status is Succeeded or Failed.
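
The polling described above can be sketched as a small loop. The function `wait_for_completion` and its parameters are illustrative assumptions, as are the default interval and timeout. The `get_job` argument stands for any callable that performs the GET request and returns the parsed JSON response as a dict.

```python
import time

TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_completion(get_job, interval_seconds=10.0, timeout_seconds=1800.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_job() until the job reaches a terminal status or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        job = get_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"job still {job.get('status')!r} after {timeout_seconds} s"
            )
        sleep(interval_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; in normal use the defaults apply.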

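Get batch synthesis

To retrieve the status of a batch synthesis job, make an HTTP GET request using the URI shown in the following example.

Replace YourSynthesisId with your batch synthesis ID, replace YourSpeechKey with your Speech resource key, and replace YourSpeechRegion with your Speech resource region.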

curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"

You should receive a response body in the following format:

{
    "id": "my-job-01",
    "internalId": "5a25b929-1358-4e81-a036-33000e788c46",
    "status": "Succeeded",
    "createdDateTime": "2024-03-06T07:34:08.9487009Z",
    "lastActionDateTime": "2024-03-06T07:34:12.5698769",
    "inputKind": "SSML",
    "customVoices": {},
    "properties": {
        "timeToLiveInHours": 744,
        "sizeInBytes": 344460,
        "durationInMilliseconds": 2520,
        "succeededCount": 1,
        "failedCount": 0,
        "billingDetails": {
            "neuralCharacters": 29,
            "talkingAvatarDurationSeconds": 2
        }
    },
    "avatarConfig": {
        "talkingAvatarCharacter": "lisa",
        "talkingAvatarStyle": "graceful-sitting",
        "videoFormat": "Mp4",
        "videoCodec": "hevc",
        "subtitleType": "soft_embedded",
        "bitrateKbps": 2000,
        "customized": false
    },
    "outputs": {
        "result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/0001.mp4?SAS_Token",
        "summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/summary.json?SAS_Token"
    }
}

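You can download the avatar video file from the URL in the outputs.result field. The outputs.summary field gives you access to the summary and debug details. For more information, see batch synthesis results.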

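List batch synthesis

To list all batch synthesis jobs for your Speech resource, make an HTTP GET request using the URI shown in the following example.

Replace YourSpeechKey with your Speech resource key and replace YourSpeechRegion with your Speech resource region. Optionally, you can set the skip and maxpagesize (page size) query parameters in the URL. The default value for skip is 0, and the default value for maxpagesize is 100.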

curl -v -X GET "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses?skip=0&maxpagesize=2&api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"

You receive a response body in the following format:

{
    "value": [
        {
            "id": "my-job-02",
            "internalId": "14c25fcf-3cb6-4f46-8810-ecad06d956df",
            "status": "Succeeded",
            "createdDateTime": "2024-03-06T07:52:23.9054709Z",
            "lastActionDateTime": "2024-03-06T07:52:29.3416944",
            "inputKind": "SSML",
            "customVoices": {},
            "properties": {
                "timeToLiveInHours": 744,
                "sizeInBytes": 502676,
                "durationInMilliseconds": 2950,
                "succeededCount": 1,
                "failedCount": 0,
                "billingDetails": {
                    "neuralCharacters": 32,
                    "talkingAvatarDurationSeconds": 2
                }
            },
            "avatarConfig": {
                "talkingAvatarCharacter": "lisa",
                "talkingAvatarStyle": "casual-sitting",
                "videoFormat": "Mp4",
                "videoCodec": "h264",
                "subtitleType": "soft_embedded",
                "bitrateKbps": 2000,
                "customized": false
            },
            "outputs": {
                "result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/0001.mp4?SAS_Token",
                "summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/summary.json?SAS_Token"
            }
        },
        {
            "id": "my-job-01",
            "internalId": "5a25b929-1358-4e81-a036-33000e788c46",
            "status": "Succeeded",
            "createdDateTime": "2024-03-06T07:34:08.9487009Z",
            "lastActionDateTime": "2024-03-06T07:34:12.5698769",
            "inputKind": "SSML",
            "customVoices": {},
            "properties": {
                "timeToLiveInHours": 744,
                "sizeInBytes": 344460,
                "durationInMilliseconds": 2520,
                "succeededCount": 1,
                "failedCount": 0,
                "billingDetails": {
                    "neuralCharacters": 29,
                    "talkingAvatarDurationSeconds": 2
                }
            },
            "avatarConfig": {
                "talkingAvatarCharacter": "lisa",
                "talkingAvatarStyle": "graceful-sitting",
                "videoFormat": "Mp4",
                "videoCodec": "hevc",
                "subtitleType": "soft_embedded",
                "bitrateKbps": 2000,
                "customized": false
            },
            "outputs": {
                "result": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/0001.mp4?SAS_Token",
                "summary": "https://stttssvcprodusw2.blob.core.windows.net/batchsynthesis-output/xxxxx/xxxxx/summary.json?SAS_Token"
            }
        }
    ],
    "nextLink": "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/?api-version=2024-08-01&skip=2&maxpagesize=2"
}

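You can download the avatar video file from the URL in outputs.result. From outputs.summary, you can access the summary and debug details. For more information, see batch synthesis results.

The value property in the JSON response lists your synthesis requests. The list is paginated, with a maximum page size of 100. The nextLink property is provided as needed to get the next page of the paginated list.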
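
Following nextLink until it is absent can be sketched as a generator. The name `iter_batch_syntheses` is hypothetical, and `fetch_page` stands for any callable that performs the GET request for a URL and returns the parsed JSON page as a dict.

```python
def iter_batch_syntheses(fetch_page, first_url):
    """Yield every job across all pages by following nextLink."""
    url = first_url
    while url:
        page = fetch_page(url)
        # Each page carries its jobs in the "value" array.
        yield from page.get("value", [])
        # A missing nextLink means this was the last page.
        url = page.get("nextLink")
```

Because it is a generator, a caller can stop iterating early without fetching the remaining pages.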

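Get batch synthesis results file

After a batch synthesis job reaches the Succeeded status, you can download the video output. Use the URL from the outputs.result property of the get batch synthesis response.

To get the batch synthesis results file, make an HTTP GET request using the URI shown in the following example. Replace YourOutputsResultUrl with the URL from the outputs.result property of the get batch synthesis response. Replace YourSpeechKey with your Speech resource key.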

curl -v -X GET "YourOutputsResultUrl" -H "Ocp-Apim-Subscription-Key: YourSpeechKey" > output.mp4

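To get the batch synthesis summary file, make an HTTP GET request using the URI shown in the following example. Replace YourOutputsSummaryUrl with the URL from the outputs.summary property of the get batch synthesis response. Replace YourSpeechKey with your Speech resource key.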

curl -v -X GET "YourOutputsSummaryUrl" -H "Ocp-Apim-Subscription-Key: YourSpeechKey" > summary.json

The summary file contains the synthesis results for each text input. Here's an example summary.json file:

{
  "jobID": "5a25b929-1358-4e81-a036-33000e788c46",
  "status": "Succeeded",
  "results": [
    {
      "texts": [
        "<speak version='1.0' xml:lang='en-US'><voice name='en-US-AvaMultilingualNeural'>The rainbow has seven colors.</voice></speak>"
      ],
      "status": "Succeeded",
      "videoFileName": "244a87c294b94ddeb3dbaccee8ffa7eb/5a25b929-1358-4e81-a036-33000e788c46/0001.mp4",
      "TalkingAvatarCharacter": "lisa",
      "TalkingAvatarStyle": "graceful-sitting"
    }
  ]
}
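
A downloaded summary file can be inspected programmatically. The following sketch, with the hypothetical function name `summarize_results`, splits the entries into succeeded video file names and the texts of entries that did not succeed. It assumes that every entry has the same shape as the example above.

```python
import json


def summarize_results(summary_text):
    """Return (video file names of succeeded entries, texts of other entries)."""
    summary = json.loads(summary_text)
    succeeded, not_succeeded = [], []
    for item in summary.get("results", []):
        if item.get("status") == "Succeeded":
            succeeded.append(item.get("videoFileName"))
        else:
            not_succeeded.append(item.get("texts", []))
    return succeeded, not_succeeded
```

For example, `summarize_results(open("summary.json", encoding="utf-8").read())` processes the file saved by the curl command above.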

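Delete batch synthesis

After you retrieve the video output results and no longer need the batch synthesis job history, you can delete it. The Speech service retains each synthesis history for up to 31 days, or for the duration specified by the request's timeToLiveInHours property, whichever comes sooner. For synthesis jobs with a status of Succeeded or Failed, the date and time of automatic deletion is calculated as the sum of the lastActionDateTime and timeToLiveInHours properties.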
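
As a worked example of that calculation, the sketch below adds the time to live to lastActionDateTime for a job response. The function names are hypothetical. It assumes that timestamps are in UTC even when the trailing "Z" is missing, as in some of the sample responses, and it truncates the seven-digit fractional seconds to the six digits that Python's datetime supports.

```python
from datetime import datetime, timedelta, timezone


def parse_service_timestamp(value):
    """Parse a timestamp such as 2024-03-06T07:34:12.5698769 or ...Z as UTC."""
    text = value.rstrip("Z")
    if "." in text:
        head, fraction = text.split(".", 1)
        # datetime keeps microseconds only, so drop digits beyond the sixth.
        fraction = fraction[:6].ljust(6, "0")
    else:
        head, fraction = text, "000000"
    parsed = datetime.strptime(f"{head}.{fraction}", "%Y-%m-%dT%H:%M:%S.%f")
    # Assumption: the service reports UTC whether or not "Z" is present.
    return parsed.replace(tzinfo=timezone.utc)


def auto_delete_time(job):
    """Return lastActionDateTime plus timeToLiveInHours for a finished job."""
    last_action = parse_service_timestamp(job["lastActionDateTime"])
    hours = job["properties"]["timeToLiveInHours"]
    return last_action + timedelta(hours=hours)
```

With the sample response values (a lastActionDateTime of 2024-03-06T07:34:12.5698769 and a timeToLiveInHours of 744, which is 31 days), the computed deletion time falls on 2024-04-06.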

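To delete a batch synthesis job, make an HTTP DELETE request using the following URI format. Replace YourSynthesisId with your batch synthesis ID, replace YourSpeechKey with your Speech resource key, and replace YourSpeechRegion with your Speech resource region.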

curl -v -X DELETE "https://YourSpeechRegion.api.cognitive.microsoft.com/avatar/batchsyntheses/YourSynthesisId?api-version=2024-08-01" -H "Ocp-Apim-Subscription-Key: YourSpeechKey"

If the delete request succeeds, the response headers include HTTP/1.1 204 No Content.
