Gather user input with the Recognize action

09/26/2024

This guide helps you get started recognizing DTMF input provided by participants through the Azure Communication Services Call Automation SDK.

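Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource. Note the connection string for this resource.
Create a new web service application using the Call Automation SDK.
The latest .NET library for your operating system.
The latest NuGet package.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.

Technical specifications

The following parameters are available to customize the Recognize function: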

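Parameter	Type	Default (if not specified)	Description	Required or optional
`Prompt` (for details, see "Customize voice prompts to users with Play action")	FileSource, TextSource	Not set	The message to play before recognizing input.	Optional
`InterToneTimeout`	TimeSpan	2 seconds (min: 1 second, max: 60 seconds)	Limit in seconds that Azure Communication Services waits for the caller to press another digit (inter-digit timeout).	Optional
`InitialSegmentationSilenceTimeoutInSeconds`	Integer	0.5 seconds	How long the recognize action waits for input before considering it a timeout. See "How to recognize speech".	Optional
`RecognizeInputsType`	Enum	dtmf	Type of input that is recognized. Options are `dtmf`, `choices`, `speech`, and `speechordtmf`.	Required
`InitialSilenceTimeout`	TimeSpan	5 seconds (min: 0 seconds; max: 300 seconds for DTMF, 20 seconds for Choices, 20 seconds for Speech)	Adjusts how much nonspeech audio is allowed before a phrase, before the recognition attempt ends in a "no match" result. See "How to recognize speech".	Optional
`MaxTonesToCollect`	Integer	No default (min: 1)	Number of digits the developer expects as input from the participant.	Required
`StopTones`	IEnumeration<DtmfTone>	Not set	The digits a participant can press to escape out of a batch DTMF event.	Optional
`InterruptPrompt`	Bool	True	Whether the participant can interrupt the playMessage by pressing a digit.	Optional
`InterruptCallMediaOperation`	Bool	True	If this flag is set, it interrupts the current call media operation. For example, if any audio is being played, it interrupts that operation and initiates recognize.	Optional
`OperationContext`	String	Not set	A string that developers can pass mid action, useful for storing context about the events they receive.	Optional
`Phrases`	String	Not set	List of phrases associated with the label. Hearing any of these phrases results in a successful recognition.	Required
`Tone`	String	Not set	The tone to recognize if the user decides to press a number instead of using speech.	Optional
`Label`	String	Not set	The key value for recognition.	Required
`Language`	String	en-US	The language used for recognizing speech.	Optional
`EndSilenceTimeout`	TimeSpan	0.5 seconds	The final pause of the speaker used to detect the final result that gets generated as speech.	Optional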
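
The ranges in the table can be checked before a recognize request is sent. The following Python sketch is illustrative only and is not part of the Call Automation SDK: the function name `validate_timeouts` and the dictionary layout are assumptions made for this example, while the limits themselves come from the table. The table gives no `InitialSilenceTimeout` maximum for `speechordtmf`, so that input type is left out.

```python
# Illustrative helper, not an SDK API: checks timeout values (in seconds)
# against the ranges listed in the parameter table.
INITIAL_SILENCE_MAX = {"dtmf": 300, "choices": 20, "speech": 20}


def validate_timeouts(input_type, initial_silence=5.0, inter_tone=2.0):
    """Return a list of problems; an empty list means the values are in range."""
    if input_type not in INITIAL_SILENCE_MAX:
        return [f"unknown input type: {input_type!r}"]
    problems = []
    max_initial = INITIAL_SILENCE_MAX[input_type]
    if not 0 <= initial_silence <= max_initial:
        problems.append(
            f"InitialSilenceTimeout must be 0-{max_initial} s for {input_type}"
        )
    # InterToneTimeout only matters when DTMF tones are collected.
    if input_type == "dtmf" and not 1 <= inter_tone <= 60:
        problems.append("InterToneTimeout must be 1-60 s")
    return problems


print(validate_timeouts("dtmf", initial_silence=30, inter_tone=5))
print(validate_timeouts("speech", initial_silence=30))
```

A 30-second initial silence timeout is within range for DTMF but exceeds the 20-second limit listed for speech, so the second call reports one problem.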

Note

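When both DTMF and speech are in the recognizeInputsType, the recognize action acts on the first input type received. For example, if the user presses a keypad number first, the recognize action treats it as a DTMF event and continues listening for DTMF tones. If the user speaks first, the recognize action treats it as a speech recognition event and listens for voice input.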
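
The rule in this note can be modeled in a few lines. The Python sketch below is a simplified illustration, not SDK code and not captured service behavior: the function name `resolve_recognition_mode` and the tuple format are hypothetical, and it assumes that inputs of the other kind are simply ignored once the mode is fixed.

```python
# Simplified model of the "first input type wins" rule described in the note.
def resolve_recognition_mode(inputs):
    """inputs: ordered list of (kind, value) tuples, kind is 'dtmf' or 'speech'.

    Returns (mode, values): mode is the kind of the first input received,
    and values holds only the inputs of that kind.
    """
    if not inputs:
        return None, []
    mode = inputs[0][0]
    # Assumption for this sketch: inputs of the other kind are dropped.
    return mode, [value for kind, value in inputs if kind == mode]


print(resolve_recognition_mode([("dtmf", "1"), ("speech", "agent"), ("dtmf", "2")]))
print(resolve_recognition_mode([("speech", "agent"), ("dtmf", "0")]))
```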

Create a new C# application

In the console window of your operating system, use the following dotnet command to create a new web application.

dotnet new web -n MyApplication

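Install the NuGet package

Get the NuGet package from NuGet Gallery | Azure.Communication.CallAutomation. Follow the instructions to install the package.

Establish a call

By this point you should be familiar with starting calls. For more information about making a call, see the quickstart on making an outbound call. You can also use the code snippet provided here to understand how to answer a call.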

var callAutomationClient = new CallAutomationClient("<Azure Communication Services connection string>");

var answerCallOptions = new AnswerCallOptions("<Incoming call context once call is connected>", new Uri("<https://sample-callback-uri>"))  
{  
    CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri("<Azure Cognitive Services Endpoint>") } 
};  

var answerCallResult = await callAutomationClient.AnswerCallAsync(answerCallOptions);

Call the recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.

DTMF

var maxTonesToCollect = 3;
String textToPlay = "Welcome to Contoso, please enter 3 DTMF.";
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeDtmfOptions(targetParticipant, maxTonesToCollect) {
  InitialSilenceTimeout = TimeSpan.FromSeconds(30),
    Prompt = playSource,
    InterToneTimeout = TimeSpan.FromSeconds(5),
    InterruptPrompt = true,
    StopTones = new DtmfTone[] {
      DtmfTone.Pound
    },
};
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId)
  .GetCallMedia()
  .StartRecognizingAsync(recognizeOptions);

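For speech-to-text flows, the Call Automation Recognize action also supports the use of custom speech models. Features like custom speech models can be useful when you're building an application that needs to listen for complex words that the default speech-to-text models might not understand. For example, when you're building an application for the telemedicine industry, your virtual agent might need to recognize medical terms. For more information, see the article on creating a custom speech project.

Speech-to-text choices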

var choices = new List < RecognitionChoice > {
  new RecognitionChoice("Confirm", new List < string > {
    "Confirm",
    "First",
    "One"
  }) {
    Tone = DtmfTone.One
  },
  new RecognitionChoice("Cancel", new List < string > {
    "Cancel",
    "Second",
    "Two"
  }) {
    Tone = DtmfTone.Two
  }
};
String textToPlay = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!";

var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeChoiceOptions(targetParticipant, choices) {
  InterruptPrompt = true,
    InitialSilenceTimeout = TimeSpan.FromSeconds(30),
    Prompt = playSource,
    OperationContext = "AppointmentReminderMenu",
    //Only add the SpeechModelEndpointId if you have a custom speech model you would like to use
    SpeechModelEndpointId = "YourCustomSpeechModelEndpointId"
};
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId)
  .GetCallMedia()
  .StartRecognizingAsync(recognizeOptions);

Speech-to-text

String textToPlay = "Hi, how can I help you today?";
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeSpeechOptions(targetParticipant) {
  Prompt = playSource,
    EndSilenceTimeout = TimeSpan.FromMilliseconds(1000),
    OperationContext = "OpenQuestionSpeech",
    //Only add the SpeechModelEndpointId if you have a custom speech model you would like to use
    SpeechModelEndpointId = "YourCustomSpeechModelEndpointId"
};
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId)
  .GetCallMedia()
  .StartRecognizingAsync(recognizeOptions);

Speech-to-text or DTMF

var maxTonesToCollect = 1; 
String textToPlay = "Hi, how can I help you today, you can press 0 to speak to an agent?"; 
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural"); 
var recognizeOptions = new CallMediaRecognizeSpeechOrDtmfOptions(targetParticipant, maxTonesToCollect) 
{ 
    Prompt = playSource, 
    EndSilenceTimeout = TimeSpan.FromMilliseconds(1000), 
    InitialSilenceTimeout = TimeSpan.FromSeconds(30), 
    InterruptPrompt = true, 
    OperationContext = "OpenQuestionSpeechOrDtmf",
    //Only add the SpeechModelEndpointId if you have a custom speech model you would like to use
    SpeechModelEndpointId = "YourCustomSpeechModelEndpointId" 
}; 
var recognizeResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .StartRecognizingAsync(recognizeOptions);

Note

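If parameters aren't set, the defaults are applied where possible.

Receiving recognize event updates

Developers can subscribe to RecognizeCompleted and RecognizeFailed events on the registered webhook callback. Use this callback with the business logic in your application to determine next steps when one of the events occurs.

Example of how you can deserialize the RecognizeCompleted event: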

if (acsEvent is RecognizeCompleted recognizeCompleted) 
{ 
    switch (recognizeCompleted.RecognizeResult) 
    { 
        case DtmfResult dtmfResult: 
            //Take action for Recognition through DTMF 
            var tones = dtmfResult.Tones; 
            logger.LogInformation("Recognize completed successfully, tones={tones}", tones);
            break; 
        case ChoiceResult choiceResult: 
            // Take action for Recognition through Choices 
            var labelDetected = choiceResult.Label; 
            var phraseDetected = choiceResult.RecognizedPhrase; 
            // If choice is detected by phrase, choiceResult.RecognizedPhrase will have the phrase detected, 
            // If choice is detected using dtmf tone, phrase will be null 
            logger.LogInformation("Recognize completed successfully, labelDetected={labelDetected}, phraseDetected={phraseDetected}", labelDetected, phraseDetected);
            break; 
        case SpeechResult speechResult: 
            // Take action for Recognition through Speech
            var text = speechResult.Speech;
            logger.LogInformation("Recognize completed successfully, text={text}", text);
            break; 
        default: 
            logger.LogInformation("Recognize completed successfully, recognizeResult={recognizeResult}", recognizeCompleted.RecognizeResult);
            break; 
    } 
}

Example of how you can deserialize the RecognizeFailed event:

if (acsEvent is RecognizeFailed recognizeFailed) 
{ 
    if (MediaEventReasonCode.RecognizeInitialSilenceTimedOut.Equals(recognizeFailed.ReasonCode)) 
    { 
        // Take action for time out 
        logger.LogInformation("Recognition failed: initial silence time out");
    } 
    else if (MediaEventReasonCode.RecognizeSpeechOptionNotMatched.Equals(recognizeFailed.ReasonCode)) 
    { 
        // Take action for option not matched 
        logger.LogInformation("Recognition failed: speech option not matched"); 
    } 
    else if (MediaEventReasonCode.RecognizeIncorrectToneDetected.Equals(recognizeFailed.ReasonCode)) 
    { 
        // Take action for incorrect tone 
        logger.LogInformation("Recognition failed: incorrect tone detected"); 
    } 
    else 
    { 
        logger.LogInformation("Recognition failed, result={result}, context={context}", recognizeFailed.ResultInformation?.Message, recognizeFailed.OperationContext); 
    } 
}

Example of how you can deserialize the RecognizeCanceled event:

if (acsEvent is RecognizeCanceled { OperationContext: "AppointmentReminderMenu" } recognizeCanceled)
{
    // Bind the matched event so its CallConnectionId can be logged
    logger.LogInformation("RecognizeCanceled event received for call connection id: {callConnectionId}", recognizeCanceled.CallConnectionId);
    // Take action on the canceled recognize operation
    await callConnection.HangUpAsync(forEveryone: true);
}

Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource.
Create a new web service application using the Call Automation SDK.
Java Development Kit version 8 or later.
Apache Maven.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.

Create a new Java application

In your terminal or command window, navigate to the directory where you want to create your Java application. Run the mvn command to generate the Java project from the maven-archetype-quickstart template.

mvn archetype:generate -DgroupId=com.communication.quickstart -DartifactId=communication-quickstart -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4 -DinteractiveMode=false

The mvn command creates a directory with the same name as the artifactId argument. The src/main/java directory contains the project source code. The src/test/java directory contains the test source. The pom.xml file is the project's Project Object Model (POM).

Update your application's POM file to use Java 8 or later.

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>

Add package references

In your POM file, add the following reference for the project.

azure-communication-callautomation

<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-communication-callautomation</artifactId>
  <version>1.0.0</version>
</dependency>

Establish a call

CallIntelligenceOptions callIntelligenceOptions = new CallIntelligenceOptions().setCognitiveServicesEndpoint("https://sample-cognitive-service-resource.cognitiveservices.azure.com/"); 
AnswerCallOptions answerCallOptions = new AnswerCallOptions("<Incoming call context>", "<https://sample-callback-uri>").setCallIntelligenceOptions(callIntelligenceOptions);
Response < AnswerCallResult > answerCallResult = callAutomationClient
  .answerCallWithResponse(answerCallOptions)
  .block();

Call the recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.

DTMF

var maxTonesToCollect = 3;
String textToPlay = "Welcome to Contoso, please enter 3 DTMF.";
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural");

var recognizeOptions = new CallMediaRecognizeDtmfOptions(targetParticipant, maxTonesToCollect) 
    .setInitialSilenceTimeout(Duration.ofSeconds(30)) 
    .setPlayPrompt(playSource) 
    .setInterToneTimeout(Duration.ofSeconds(5)) 
    .setInterruptPrompt(true) 
    .setStopTones(Arrays.asList(DtmfTone.POUND));

var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .startRecognizingWithResponse(recognizeOptions) 
    .block(); 

log.info("Start recognizing result: " + recognizeResponse.getStatusCode());

Speech-to-text choices

var choices = Arrays.asList(
  new RecognitionChoice()
  .setLabel("Confirm")
  .setPhrases(Arrays.asList("Confirm", "First", "One"))
  .setTone(DtmfTone.ONE),
  new RecognitionChoice()
  .setLabel("Cancel")
  .setPhrases(Arrays.asList("Cancel", "Second", "Two"))
  .setTone(DtmfTone.TWO)
);

String textToPlay = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!";
var playSource = new TextSource()
  .setText(textToPlay)
  .setVoiceName("en-US-ElizabethNeural");
var recognizeOptions = new CallMediaRecognizeChoiceOptions(targetParticipant, choices)
  .setInterruptPrompt(true)
  .setInitialSilenceTimeout(Duration.ofSeconds(30))
  .setPlayPrompt(playSource)
  .setOperationContext("AppointmentReminderMenu")
  //Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
  .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID"); 
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId)
  .getCallMediaAsync()
  .startRecognizingWithResponse(recognizeOptions)
  .block();

Speech-to-text

String textToPlay = "Hi, how can I help you today?"; 
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural"); 
var recognizeOptions = new CallMediaRecognizeSpeechOptions(targetParticipant, Duration.ofMillis(1000)) 
    .setPlayPrompt(playSource) 
    .setOperationContext("OpenQuestionSpeech")
    //Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");  
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .startRecognizingWithResponse(recognizeOptions) 
    .block();

Speech-to-text or DTMF

var maxTonesToCollect = 1; 
String textToPlay = "Hi, how can I help you today, you can press 0 to speak to an agent?"; 
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural"); 
var recognizeOptions = new CallMediaRecognizeSpeechOrDtmfOptions(targetParticipant, maxTonesToCollect, Duration.ofMillis(1000)) 
    .setPlayPrompt(playSource) 
    .setInitialSilenceTimeout(Duration.ofSeconds(30)) 
    .setInterruptPrompt(true) 
    .setOperationContext("OpenQuestionSpeechOrDtmf")
    //Only add the SpeechRecognitionModelEndpointId if you have a custom speech model you would like to use
    .setSpeechRecognitionModelEndpointId("YourCustomSpeechModelEndpointID");  
var recognizeResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .startRecognizingWithResponse(recognizeOptions) 
    .block();

Note

If parameters aren't set, the defaults are applied where possible.

Receiving recognize event updates

Example of how you can deserialize the RecognizeCompleted event:

if (acsEvent instanceof RecognizeCompleted) { 
    RecognizeCompleted event = (RecognizeCompleted) acsEvent; 
    RecognizeResult recognizeResult = event.getRecognizeResult().get(); 
    if (recognizeResult instanceof DtmfResult) { 
        // Take action on collect tones 
        DtmfResult dtmfResult = (DtmfResult) recognizeResult; 
        List<DtmfTone> tones = dtmfResult.getTones(); 
        log.info("Recognition completed, tones=" + tones + ", context=" + event.getOperationContext()); 
    } else if (recognizeResult instanceof ChoiceResult) { 
        ChoiceResult collectChoiceResult = (ChoiceResult) recognizeResult; 
        String labelDetected = collectChoiceResult.getLabel(); 
        String phraseDetected = collectChoiceResult.getRecognizedPhrase(); 
        log.info("Recognition completed, labelDetected=" + labelDetected + ", phraseDetected=" + phraseDetected + ", context=" + event.getOperationContext()); 
    } else if (recognizeResult instanceof SpeechResult) { 
        SpeechResult speechResult = (SpeechResult) recognizeResult; 
        String text = speechResult.getSpeech(); 
        log.info("Recognition completed, text=" + text + ", context=" + event.getOperationContext()); 
    } else { 
        log.info("Recognition completed, result=" + recognizeResult + ", context=" + event.getOperationContext()); 
    } 
}

Example of how you can deserialize the RecognizeFailed event:

if (acsEvent instanceof RecognizeFailed) { 
    RecognizeFailed event = (RecognizeFailed) acsEvent; 
    if (ReasonCode.Recognize.INITIAL_SILENCE_TIMEOUT.equals(event.getReasonCode())) { 
        // Take action for time out 
        log.info("Recognition failed: initial silence time out"); 
    } else if (ReasonCode.Recognize.SPEECH_OPTION_NOT_MATCHED.equals(event.getReasonCode())) { 
        // Take action for option not matched 
        log.info("Recognition failed: speech option not matched"); 
    } else if (ReasonCode.Recognize.DMTF_OPTION_MATCHED.equals(event.getReasonCode())) { 
        // Take action for incorrect tone 
        log.info("Recognition failed: incorrect tone detected"); 
    } else { 
        log.info("Recognition failed, result=" + event.getResultInformation().getMessage() + ", context=" + event.getOperationContext()); 
    } 
}

Example of how you can deserialize the RecognizeCanceled event:

if (acsEvent instanceof RecognizeCanceled) { 
    RecognizeCanceled event = (RecognizeCanceled) acsEvent; 
    log.info("Recognition canceled, context=" + event.getOperationContext()); 
}

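Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource. Note the connection string for this resource.
Create a new web service application using the Call Automation SDK.
Node.js installed. You can install it from the official website.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.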

Create a new JavaScript application

Create a new JavaScript application in your project directory. Initialize a new Node.js project with the following command. This creates a package.json file for your project, which manages your project's dependencies.

npm init -y

Install the Azure Communication Services Call Automation package

npm install @azure/communication-call-automation

Create a new JavaScript file in your project directory, for example, app.js. Write your JavaScript code in this file.

Run your application using Node.js with the following command.

node app.js

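Establish a call

By this point you should be familiar with starting calls. For more information about making a call, see the quickstart on making an outbound call.

Call the recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.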

DTMF

const maxTonesToCollect = 3; 
const textToPlay = "Welcome to Contoso, please enter 3 DTMF."; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeDtmfOptions = { 
    maxTonesToCollect: maxTonesToCollect, 
    initialSilenceTimeoutInSeconds: 30, 
    playPrompt: playSource, 
    interToneTimeoutInSeconds: 5, 
    interruptPrompt: true, 
    stopDtmfTones: [ DtmfTone.Pound ], 
    kind: "callMediaRecognizeDtmfOptions" 
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Speech-to-text choices

const choices = [ 
    {  
        label: "Confirm", 
        phrases: [ "Confirm", "First", "One" ], 
        tone: DtmfTone.One 
    }, 
    { 
        label: "Cancel", 
        phrases: [ "Cancel", "Second", "Two" ], 
        tone: DtmfTone.Two 
    } 
]; 

const textToPlay = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!"; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeChoiceOptions = { 
    choices: choices, 
    interruptPrompt: true, 
    initialSilenceTimeoutInSeconds: 30, 
    playPrompt: playSource, 
    operationContext: "AppointmentReminderMenu", 
    kind: "callMediaRecognizeChoiceOptions",
    //Only add the speechRecognitionModelEndpointId if you have a custom speech model you would like to use
    speechRecognitionModelEndpointId: "YourCustomSpeechEndpointId"
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Speech-to-text

const textToPlay = "Hi, how can I help you today?"; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeSpeechOptions = { 
    endSilenceTimeoutInSeconds: 1, 
    playPrompt: playSource, 
    operationContext: "OpenQuestionSpeech", 
    kind: "callMediaRecognizeSpeechOptions",
    //Only add the speechRecognitionModelEndpointId if you have a custom speech model you would like to use
    speechRecognitionModelEndpointId: "YourCustomSpeechEndpointId"
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Speech-to-text or DTMF

const maxTonesToCollect = 1; 
const textToPlay = "Hi, how can I help you today, you can press 0 to speak to an agent?"; 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 
const recognizeOptions: CallMediaRecognizeSpeechOrDtmfOptions = { 
    maxTonesToCollect: maxTonesToCollect, 
    endSilenceTimeoutInSeconds: 1, 
    playPrompt: playSource, 
    initialSilenceTimeoutInSeconds: 30, 
    interruptPrompt: true, 
    operationContext: "OpenQuestionSpeechOrDtmf", 
    kind: "callMediaRecognizeSpeechOrDtmfOptions",
    //Only add the speechRecognitionModelEndpointId if you have a custom speech model you would like to use
    speechRecognitionModelEndpointId: "YourCustomSpeechEndpointId"
}; 

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .startRecognizing(targetParticipant, recognizeOptions);

Note

If parameters aren't set, the defaults are applied where possible.

Receiving recognize event updates

Example of how you can deserialize the RecognizeCompleted event:

if (event.type === "Microsoft.Communication.RecognizeCompleted") { 
    if (eventData.recognitionType === "dtmf") { 
        const tones = eventData.dtmfResult.tones; 
        console.log("Recognition completed, tones=%s, context=%s", tones, eventData.operationContext); 
    } else if (eventData.recognitionType === "choices") { 
        const labelDetected = eventData.choiceResult.label; 
        const phraseDetected = eventData.choiceResult.recognizedPhrase; 
        console.log("Recognition completed, labelDetected=%s, phraseDetected=%s, context=%s", labelDetected, phraseDetected, eventData.operationContext); 
    } else if (eventData.recognitionType === "speech") { 
        const text = eventData.speechResult.speech; 
        console.log("Recognition completed, text=%s, context=%s", text, eventData.operationContext); 
    } else { 
        console.log("Recognition completed: data=%s", JSON.stringify(eventData, null, 2)); 
    } 
}

Example of how you can deserialize the RecognizeFailed event:

if (event.type === "Microsoft.Communication.RecognizeFailed") {
    console.log("Recognize failed: data=%s", JSON.stringify(eventData, null, 2));
}

Example of how you can deserialize the RecognizeCanceled event:

if (event.type === "Microsoft.Communication.RecognizeCanceled") {
    console.log("Recognize canceled, context=%s", eventData.operationContext);
}

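Prerequisites

An Azure account with an active subscription. For details, see the page on creating an account for free.
An Azure Communication Services resource. See the article on creating an Azure Communication Services resource. Note the connection string for this resource.
Create a new web service application using the Call Automation SDK.
Install Python from Python.org.

For AI features

Create an Azure AI services resource and connect it to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.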

Create a new Python application

Set up a Python virtual environment for your project

python -m venv play-audio-app

Activate your virtual environment

On Windows, use the following command:

.\play-audio-app\Scripts\activate

On Unix, use the following command:

source play-audio-app/bin/activate

Install the Azure Communication Services Call Automation package

pip install azure-communication-callautomation

Create your application file in your project directory, for example, app.py. Write your Python code in this file.

Run your application using Python with the following command.

python app.py

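Establish a call

By this point you should be familiar with starting calls. For more information about making a call, see the quickstart on making an outbound call.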

Call the recognize action

When your application answers the call, you can provide information about recognizing participant input and playing a prompt.

DTMF

max_tones_to_collect = 3 
text_to_play = "Welcome to Contoso, please enter 3 DTMF." 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    dtmf_max_tones_to_collect=max_tones_to_collect, 
    input_type=RecognizeInputType.DTMF, 
    target_participant=target_participant, 
    initial_silence_timeout=30, 
    play_prompt=play_source, 
    dtmf_inter_tone_timeout=5, 
    interrupt_prompt=True, 
    dtmf_stop_tones=[ DtmfTone.POUND ])

Speech-to-text choices

choices = [ 
    RecognitionChoice( 
        label="Confirm", 
        phrases=[ "Confirm", "First", "One" ], 
        tone=DtmfTone.ONE 
    ), 
    RecognitionChoice( 
        label="Cancel", 
        phrases=[ "Cancel", "Second", "Two" ], 
        tone=DtmfTone.TWO 
    ) 
] 
text_to_play = "Hello, This is a reminder for your appointment at 2 PM, Say Confirm to confirm your appointment or Cancel to cancel the appointment. Thank you!" 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    input_type=RecognizeInputType.CHOICES, 
    target_participant=target_participant, 
    choices=choices, 
    interrupt_prompt=True, 
    initial_silence_timeout=30, 
    play_prompt=play_source, 
    operation_context="AppointmentReminderMenu",
    # Only add the speech_recognition_model_endpoint_id if you have a custom speech model you would like to use
    speech_recognition_model_endpoint_id="YourCustomSpeechModelEndpointId")

Speech-to-text

text_to_play = "Hi, how can I help you today?" 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    input_type=RecognizeInputType.SPEECH, 
    target_participant=target_participant, 
    end_silence_timeout=1, 
    play_prompt=play_source, 
    operation_context="OpenQuestionSpeech",
    # Only add the speech_recognition_model_endpoint_id if you have a custom speech model you would like to use
    speech_recognition_model_endpoint_id="YourCustomSpeechModelEndpointId")

Speech-to-text or DTMF

max_tones_to_collect = 1 
text_to_play = "Hi, how can I help you today, you can also press 0 to speak to an agent." 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural") 
call_automation_client.get_call_connection(call_connection_id).start_recognizing_media( 
    dtmf_max_tones_to_collect=max_tones_to_collect, 
    input_type=RecognizeInputType.SPEECH_OR_DTMF, 
    target_participant=target_participant, 
    end_silence_timeout=1, 
    play_prompt=play_source, 
    initial_silence_timeout=30, 
    interrupt_prompt=True, 
    operation_context="OpenQuestionSpeechOrDtmf",
    # Only add the speech_recognition_model_endpoint_id if you have a custom speech model you would like to use
    speech_recognition_model_endpoint_id="YourCustomSpeechModelEndpointId")  
app.logger.info("Start recognizing")

Note

If a parameter isn't set, its default value is applied where one exists.
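As a rough illustration of how defaults and limits interact, the following sketch fills in the defaults and checks the ranges listed in the parameter table for `InitialSilenceTimeout` and `InterToneTimeout`. The function name `resolve_timeouts` and the constant names are hypothetical and not part of the SDK. The table gives no initial silence maximum for `speechordtmf`, so that input type is left out. Whether the service rejects or adjusts out-of-range values is not stated here, so treat this only as a client-side sanity check.

```python
# Limits and defaults copied from the parameter table, in seconds.
# All names in this block are illustrative, not part of the Call Automation SDK.
INITIAL_SILENCE_MAX = {"dtmf": 300, "choices": 20, "speech": 20}
INITIAL_SILENCE_DEFAULT = 5
INTER_TONE_MIN, INTER_TONE_MAX = 1, 60
INTER_TONE_DEFAULT = 2


def resolve_timeouts(input_type, initial_silence=None, inter_tone=None):
    """Return (initial_silence, inter_tone) with defaults filled in and ranges checked."""
    if input_type not in INITIAL_SILENCE_MAX:
        raise ValueError(f"no documented limits for input type {input_type!r}")

    # Apply the documented default when the caller leaves a value unset.
    if initial_silence is None:
        initial_silence = INITIAL_SILENCE_DEFAULT
    if inter_tone is None:
        inter_tone = INTER_TONE_DEFAULT

    max_initial = INITIAL_SILENCE_MAX[input_type]
    if not 0 <= initial_silence <= max_initial:
        raise ValueError(
            f"initial silence timeout must be between 0 and {max_initial} seconds"
        )
    if not INTER_TONE_MIN <= inter_tone <= INTER_TONE_MAX:
        raise ValueError(
            f"inter-tone timeout must be between {INTER_TONE_MIN} and {INTER_TONE_MAX} seconds"
        )
    return initial_silence, inter_tone
```

Calling `resolve_timeouts("dtmf")` returns the two defaults unchanged. Because the table lists 20 seconds as the Choices maximum, the sketch raises `ValueError` for `resolve_timeouts("choices", initial_silence=30)`.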

Receiving recognize event updates

Example of how you can deserialize the RecognizeCompleted event:

if event.type == "Microsoft.Communication.RecognizeCompleted":
    app.logger.info("Recognize completed: data=%s", event.data)
    if event.data['recognitionType'] == "dtmf":
        tones = event.data['dtmfResult']['tones']
        app.logger.info("Recognition completed, tones=%s, context=%s", tones, event.data.get('operationContext'))
    elif event.data['recognitionType'] == "choices":
        label_detected = event.data['choiceResult']['label']
        phrase_detected = event.data['choiceResult']['recognizedPhrase']
        app.logger.info("Recognition completed, labelDetected=%s, phraseDetected=%s, context=%s", label_detected, phrase_detected, event.data.get('operationContext'))
    elif event.data['recognitionType'] == "speech":
        text = event.data['speechResult']['speech']
        app.logger.info("Recognition completed, text=%s, context=%s", text, event.data.get('operationContext'))
    else:
        app.logger.info("Recognition completed: data=%s", event.data)
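The branching on `recognitionType` can also be pulled into a small helper so it is testable without a live call. The sketch below assumes the event data is a plain dictionary with the same field names used in the sample (`dtmfResult`, `choiceResult`, `speechResult`). The function name `summarize_recognition` is hypothetical, and the payloads in the usage note are made-up illustrations rather than captured events.

```python
def summarize_recognition(data):
    """Return a (kind, value) pair describing a RecognizeCompleted payload.

    `data` is assumed to be the event's data dictionary, using the same
    keys as the sample handler. Unknown recognition types yield ("unknown", None).
    """
    kind = data.get("recognitionType")
    if kind == "dtmf":
        # List of tones the participant pressed.
        return kind, data["dtmfResult"]["tones"]
    if kind == "choices":
        # Label of the matched choice.
        return kind, data["choiceResult"]["label"]
    if kind == "speech":
        # Transcribed text.
        return kind, data["speechResult"]["speech"]
    return "unknown", None
```

For example, `summarize_recognition({"recognitionType": "choices", "choiceResult": {"label": "Confirm", "recognizedPhrase": "yes"}})` returns `("choices", "Confirm")`, where the label and phrase are invented for the example.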

Example of how you can deserialize the RecognizeFailed event:

if event.type == "Microsoft.Communication.RecognizeFailed":
    app.logger.info("Recognize failed: data=%s", event.data)

Example of how you can deserialize the RecognizeCanceled event:

if event.type == "Microsoft.Communication.RecognizeCanceled":
    # Handle the RecognizeCanceled event according to your application logic
    app.logger.info("Recognize canceled: data=%s", event.data)

Event codes

Status	Code	Subcode	Message
`RecognizeCompleted`	200	8531	Action completed, max digits received.
`RecognizeCompleted`	200	8514	Action completed as stop tone was detected.
`RecognizeCompleted`	400	8508	Action failed, the operation was canceled.
`RecognizeCompleted`	400	8532	Action failed, inter-digit silence timeout reached.
`RecognizeCanceled`	400	8508	Action failed, the operation was canceled.
`RecognizeFailed`	400	8510	Action failed, initial silence timeout reached.
`RecognizeFailed`	500	8511	Action failed, encountered failure while trying to play the prompt.
`RecognizeFailed`	500	8512	Unknown internal server error.
`RecognizeFailed`	400	8532	Action failed, inter-digit silence timeout reached.
`RecognizeFailed`	400	8565	Action failed, bad request to Azure AI services. Check input parameters.
`RecognizeFailed`	400	8565	Action failed, bad request to Azure AI services. Unable to process the payload provided; check the play source input.
`RecognizeFailed`	401	8565	Action failed, Azure AI services authentication error.
`RecognizeFailed`	403	8565	Action failed, forbidden request to Azure AI services; the free subscription used by the request ran out of quota.
`RecognizeFailed`	429	8565	Action failed, requests exceeded the number of allowed concurrent requests for the Azure AI services subscription.
`RecognizeFailed`	408	8565	Action failed, request to Azure AI services timed out.
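One way to use the table in application code is a lookup keyed on the subcode. The sketch below is only an illustration: the names `SUBCODE_MEANINGS`, `REPROMPT_SUBCODES`, `describe_result`, and `should_reprompt` are hypothetical, and treating the two silence timeouts as a reason to replay the prompt is an application-level assumption, not something the service prescribes. How the code and subcode are read from an event is not shown here, so they are passed in as plain integers.

```python
# Short meanings for the subcodes listed in the table above.
SUBCODE_MEANINGS = {
    8531: "max digits received",
    8514: "stop tone detected",
    8508: "operation canceled",
    8510: "initial silence timeout reached",
    8532: "inter-digit silence timeout reached",
    8511: "failure while playing the prompt",
    8512: "unknown internal server error",
    8565: "Azure AI services request failed",
}

# Assumption: a silence timeout means the caller did not respond in time,
# so the application may choose to play the prompt again.
REPROMPT_SUBCODES = {8510, 8532}


def describe_result(code, subcode):
    """Return a short log-friendly description for a code/subcode pair."""
    meaning = SUBCODE_MEANINGS.get(subcode, "unrecognized subcode")
    return f"{code}/{subcode}: {meaning}"


def should_reprompt(subcode):
    """Return True when the sketch's policy is to replay the prompt."""
    return subcode in REPROMPT_SUBCODES
```

Subcode 8565 appears with several status codes (400, 401, 403, 408, 429), so an application that needs to tell those cases apart would also inspect the status code.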

Known limitations

In-band DTMF isn't supported. Use RFC 2833 DTMF instead.
Text-to-speech text prompts support a maximum of 400 characters. If your prompt is longer than this, we suggest using SSML for text-to-speech based play actions.
If you exceed your Speech service quota limit, you can request an increase by following the steps outlined in "Speech services quotas and limits."
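The 400-character limit on text prompts can be checked before a recognize call is issued. The following is a minimal sketch; `MAX_TEXT_PROMPT_CHARS` and `prompt_kind` are hypothetical names, and the function only reports which kind of play source the guidance above points to, without building any SDK object.

```python
# Maximum length of a plain text-to-speech prompt, per the limitation above.
MAX_TEXT_PROMPT_CHARS = 400


def prompt_kind(text):
    """Return "text" if the prompt fits the plain-text limit, otherwise "ssml"."""
    return "text" if len(text) <= MAX_TEXT_PROMPT_CHARS else "ssml"
```

A prompt of exactly 400 characters is reported as `"text"`, and anything longer as `"ssml"`.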

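Clean up resources

If you want to clean up and remove a Communication Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it. For details, see the article on cleaning up resources.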

Next steps

Learn more about gathering user input
Learn more about playing audio in a call
Learn more about Call Automation
