Quickstart: Create a custom voice assistant
In this quickstart, you'll use the Speech SDK to create a custom voice assistant application that connects to a bot you've already authored and configured. If you need to create a bot, see the related tutorial for a more comprehensive guide.
After you satisfy a few prerequisites, connecting to your custom voice assistant takes just a few steps, sketched in code after this list:

- Create a BotFrameworkConfig object from your subscription key and region.
- Use the BotFrameworkConfig object to create a DialogServiceConnector object.
- Use the DialogServiceConnector object to start the listening process for a single utterance.
- Inspect the ActivityReceivedEventArgs that's returned.
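The following minimal C# sketch ties these four steps together. It's for orientation only: the placeholder key and region values are assumptions you replace with your own, and the full wiring (microphone checks, UI, and audio playback) comes later in this quickstart.

```csharp
using System;
using Microsoft.CognitiveServices.Speech.Dialog;

// Step 1: create a BotFrameworkConfig from your subscription key and region.
var config = BotFrameworkConfig.FromSubscription("YourSpeechSubscriptionKey", "YourServiceRegion");

// Step 2: use the config to create a DialogServiceConnector.
var connector = new DialogServiceConnector(config);

// Step 4: inspect each ActivityReceivedEventArgs that the bot sends back.
connector.ActivityReceived += (sender, e) =>
    Console.WriteLine($"Activity received, hasAudio={e.HasAudio}, activity={e.Activity}");

// Step 3: start listening for a single utterance (await this inside an async method).
await connector.ListenOnceAsync();
```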
Note
The Speech SDK for C++, JavaScript, Objective-C, Python, and Swift supports custom voice assistants, but we haven't yet included a guide here.

You can view or download all the Speech SDK C# samples on GitHub.
Prerequisites
Before you get started, make sure to:
- Create a Speech resource
- Set up your development environment and create an empty project
- Create a bot connected to the Direct Line Speech channel
- Make sure that you have access to a microphone for audio capture

Note
See the list of supported regions for voice assistants, and make sure that your resources are deployed in one of those regions.
Open your project in Visual Studio
The first step is to make sure that you have your project open in Visual Studio.

Start with some boilerplate code
Let's add some code that works as a skeleton for the project.
- In Solution Explorer, open MainPage.xaml.
- In the designer's XAML view, replace the entire contents with the following snippet that defines a basic user interface:
```xml
<Page
    x:Class="helloworld.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:helloworld"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
    <Grid>
        <StackPanel Orientation="Vertical" HorizontalAlignment="Center" Margin="20,50,0,0" VerticalAlignment="Center" Width="800">
            <Button x:Name="EnableMicrophoneButton" Content="Enable Microphone" Margin="0,0,10,0" Click="EnableMicrophone_ButtonClicked" Height="35"/>
            <Button x:Name="ListenButton" Content="Talk to your bot" Margin="0,10,10,0" Click="ListenButton_ButtonClicked" Height="35"/>
            <StackPanel x:Name="StatusPanel" Orientation="Vertical" RelativePanel.AlignBottomWithPanel="True" RelativePanel.AlignRightWithPanel="True" RelativePanel.AlignLeftWithPanel="True">
                <TextBlock x:Name="StatusLabel" Margin="0,10,10,0" TextWrapping="Wrap" Text="Status:" FontSize="20"/>
                <Border x:Name="StatusBorder" Margin="0,0,0,0">
                    <ScrollViewer VerticalScrollMode="Auto" VerticalScrollBarVisibility="Auto" MaxHeight="200">
                        <!-- Use LiveSetting to enable screen readers to announce the status update. -->
                        <TextBlock x:Name="StatusBlock" FontWeight="Bold" AutomationProperties.LiveSetting="Assertive"
                                   MaxWidth="{Binding ElementName=Splitter, Path=ActualWidth}" Margin="10,10,10,20" TextWrapping="Wrap" />
                    </ScrollViewer>
                </Border>
            </StackPanel>
        </StackPanel>
        <MediaElement x:Name="mediaElement"/>
    </Grid>
</Page>
```
The Design view is updated to show the application's user interface.
- In Solution Explorer, open the code-behind source file MainPage.xaml.cs. (It's grouped under MainPage.xaml.) Replace the contents of this file with the following, which includes:
  - using statements for the Speech and Speech.Dialog namespaces
  - A simple implementation to ensure microphone access, wired to a button handler
  - Basic UI helpers to present messages and errors in the application
  - A landing point for the initialization code path that will be populated later
  - A helper to play back text-to-speech responses (without streaming support)
  - An empty button handler to start listening that will be populated later
```csharp
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Dialog;
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using Windows.Foundation;
using Windows.Storage.Streams;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;

namespace helloworld
{
    public sealed partial class MainPage : Page
    {
        private DialogServiceConnector connector;

        private enum NotifyType
        {
            StatusMessage,
            ErrorMessage
        };

        public MainPage()
        {
            this.InitializeComponent();
        }

        private async void EnableMicrophone_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            bool isMicAvailable = true;
            try
            {
                var mediaCapture = new Windows.Media.Capture.MediaCapture();
                var settings =
                    new Windows.Media.Capture.MediaCaptureInitializationSettings();
                settings.StreamingCaptureMode =
                    Windows.Media.Capture.StreamingCaptureMode.Audio;
                await mediaCapture.InitializeAsync(settings);
            }
            catch (Exception)
            {
                isMicAvailable = false;
            }
            if (!isMicAvailable)
            {
                await Windows.System.Launcher.LaunchUriAsync(
                    new Uri("ms-settings:privacy-microphone"));
            }
            else
            {
                NotifyUser("Microphone was enabled", NotifyType.StatusMessage);
            }
        }

        private void NotifyUser(
            string strMessage, NotifyType type = NotifyType.StatusMessage)
        {
            // If called from the UI thread, then update immediately.
            // Otherwise, schedule a task on the UI thread to perform the update.
            if (Dispatcher.HasThreadAccess)
            {
                UpdateStatus(strMessage, type);
            }
            else
            {
                var task = Dispatcher.RunAsync(
                    Windows.UI.Core.CoreDispatcherPriority.Normal,
                    () => UpdateStatus(strMessage, type));
            }
        }

        private void UpdateStatus(string strMessage, NotifyType type)
        {
            switch (type)
            {
                case NotifyType.StatusMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Green);
                    break;
                case NotifyType.ErrorMessage:
                    StatusBorder.Background = new SolidColorBrush(
                        Windows.UI.Colors.Red);
                    break;
            }
            StatusBlock.Text += string.IsNullOrEmpty(StatusBlock.Text)
                ? strMessage : "\n" + strMessage;

            if (!string.IsNullOrEmpty(StatusBlock.Text))
            {
                StatusBorder.Visibility = Visibility.Visible;
                StatusPanel.Visibility = Visibility.Visible;
            }
            else
            {
                StatusBorder.Visibility = Visibility.Collapsed;
                StatusPanel.Visibility = Visibility.Collapsed;
            }
            // Raise an event if necessary to enable a screen reader
            // to announce the status update.
            var peer = Windows.UI.Xaml.Automation.Peers.FrameworkElementAutomationPeer.FromElement(StatusBlock);
            if (peer != null)
            {
                peer.RaiseAutomationEvent(
                    Windows.UI.Xaml.Automation.Peers.AutomationEvents.LiveRegionChanged);
            }
        }

        // Waits for and accumulates all audio associated with a given
        // PullAudioOutputStream and then plays it to the MediaElement. Long spoken
        // audio will create extra latency and a streaming playback solution
        // (that plays audio while it continues to be received) should be used --
        // see the samples for examples of this.
        private void SynchronouslyPlayActivityAudio(
            PullAudioOutputStream activityAudio)
        {
            var playbackStreamWithHeader = new MemoryStream();
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("RIFF"), 0, 4); // ChunkID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // ChunkSize: max
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("WAVE"), 0, 4); // Format
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("fmt "), 0, 4); // Subchunk1ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 4); // Subchunk1Size: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // AudioFormat: PCM
            playbackStreamWithHeader.Write(BitConverter.GetBytes(1), 0, 2); // NumChannels: mono
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16000), 0, 4); // SampleRate: 16kHz
            playbackStreamWithHeader.Write(BitConverter.GetBytes(32000), 0, 4); // ByteRate
            playbackStreamWithHeader.Write(BitConverter.GetBytes(2), 0, 2); // BlockAlign
            playbackStreamWithHeader.Write(BitConverter.GetBytes(16), 0, 2); // BitsPerSample: 16-bit
            playbackStreamWithHeader.Write(Encoding.ASCII.GetBytes("data"), 0, 4); // Subchunk2ID
            playbackStreamWithHeader.Write(BitConverter.GetBytes(UInt32.MaxValue), 0, 4); // Subchunk2Size

            byte[] pullBuffer = new byte[2056];
            uint lastRead = 0;
            do
            {
                lastRead = activityAudio.Read(pullBuffer);
                playbackStreamWithHeader.Write(pullBuffer, 0, (int)lastRead);
            }
            while (lastRead == pullBuffer.Length);

            var task = Dispatcher.RunAsync(
                Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
            {
                mediaElement.SetSource(
                    playbackStreamWithHeader.AsRandomAccessStream(), "audio/wav");
                mediaElement.Play();
            });
        }

        private void InitializeDialogServiceConnector()
        {
            // New code will go here
        }

        private async void ListenButton_ButtonClicked(
            object sender, RoutedEventArgs e)
        {
            // New code will go here
        }
    }
}
```
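A quick aside on the hard-coded WAV header values in SynchronouslyPlayActivityAudio: they follow directly from the 16-kHz, 16-bit, mono PCM format of the audio that the channel returns. The variable names below are illustrative only; the derivation is:

```csharp
// Illustrative derivation of the WAV header fields used above.
int sampleRate = 16000;  // 16 kHz
int channels = 1;        // mono
int bitsPerSample = 16;  // 16-bit PCM

int blockAlign = channels * bitsPerSample / 8; // 1 * 16 / 8 = 2 bytes per frame
int byteRate = sampleRate * blockAlign;        // 16000 * 2 = 32000 bytes per second
```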
Add the following snippet to the method body of InitializeDialogServiceConnector. This code creates a DialogServiceConnector with your subscription information.

```csharp
// Create a BotFrameworkConfig by providing a Speech service subscription key.
// The botConfig.Language property is optional (default en-US).
const string speechSubscriptionKey = "YourSpeechSubscriptionKey"; // Your subscription key
const string region = "YourServiceRegion"; // Your subscription service region.

var botConfig = BotFrameworkConfig.FromSubscription(speechSubscriptionKey, region);
botConfig.Language = "en-US";
connector = new DialogServiceConnector(botConfig);
```
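If you'd rather not ship a subscription key in client code, the configuration can also be built from an authorization token. This is a sketch under the assumption that you fetch a short-lived token from your own token service (the GetTokenFromYourService helper below is hypothetical); the FromAuthorizationToken factory parallels FromSubscription:

```csharp
// Sketch: authorizationToken is assumed to come from your own token-issuing service.
string authorizationToken = GetTokenFromYourService(); // hypothetical helper
var botConfig = BotFrameworkConfig.FromAuthorizationToken(authorizationToken, region);
connector = new DialogServiceConnector(botConfig);
```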
Note
See the list of supported regions for voice assistants, and make sure that your resources are deployed in one of those regions.

Note
For information on configuring your bot, see the Bot Framework documentation for the Direct Line Speech channel.
Replace the strings YourSpeechSubscriptionKey and YourServiceRegion with your own values for your Speech subscription and region.

Append the following snippet to the end of the method body of InitializeDialogServiceConnector. This code sets up handlers for the events that the DialogServiceConnector relies on to communicate its bot activities, speech recognition results, and other information.

```csharp
// ActivityReceived is the main way your bot will communicate with the client
// and uses Bot Framework activities.
connector.ActivityReceived += (sender, activityReceivedEventArgs) =>
{
    NotifyUser(
        $"Activity received, hasAudio={activityReceivedEventArgs.HasAudio} activity={activityReceivedEventArgs.Activity}");

    if (activityReceivedEventArgs.HasAudio)
    {
        SynchronouslyPlayActivityAudio(activityReceivedEventArgs.Audio);
    }
};

// Canceled will be signaled when a turn is aborted or experiences an error condition.
connector.Canceled += (sender, canceledEventArgs) =>
{
    NotifyUser($"Canceled, reason={canceledEventArgs.Reason}");
    if (canceledEventArgs.Reason == CancellationReason.Error)
    {
        NotifyUser(
            $"Error: code={canceledEventArgs.ErrorCode}, details={canceledEventArgs.ErrorDetails}");
    }
};

// Recognizing (not 'Recognized') will provide the intermediate recognized text
// while an audio stream is being processed.
connector.Recognizing += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Recognizing! in-progress text={recognitionEventArgs.Result.Text}");
};

// Recognized (not 'Recognizing') will provide the final recognized text
// once audio capture is completed.
connector.Recognized += (sender, recognitionEventArgs) =>
{
    NotifyUser($"Final speech to text result: '{recognitionEventArgs.Result.Text}'");
};

// SessionStarted will notify when audio begins flowing to the service for a turn.
connector.SessionStarted += (sender, sessionEventArgs) =>
{
    NotifyUser($"Now Listening! Session started, id={sessionEventArgs.SessionId}");
};

// SessionStopped will notify when a turn is complete and
// it's safe to begin listening again.
connector.SessionStopped += (sender, sessionEventArgs) =>
{
    NotifyUser($"Listening complete. Session ended, id={sessionEventArgs.SessionId}");
};
```
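The Activity property in the ActivityReceived handler carries the raw Bot Framework activity as a JSON string. If your client needs to react to specific fields, you can parse it; here's a minimal sketch you might place inside that handler, using UWP's built-in Windows.Data.Json (any JSON library works just as well):

```csharp
// Inside the ActivityReceived handler: inspect fields of the raw activity JSON.
if (Windows.Data.Json.JsonObject.TryParse(
        activityReceivedEventArgs.Activity, out Windows.Data.Json.JsonObject activity))
{
    string type = activity.GetNamedString("type", "");
    string speak = activity.GetNamedString("speak", "");
    NotifyUser($"Activity type='{type}', speak='{speak}'");
}
```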
Add the following snippet to the ListenButton_ButtonClicked method in the MainPage class. This code sets up the DialogServiceConnector to listen, since you've already established the configuration and registered the event handlers.

```csharp
if (connector == null)
{
    InitializeDialogServiceConnector();
    // Optional step to speed up first interaction: if not called,
    // connection happens automatically on first use.
    var connectTask = connector.ConnectAsync();
}

try
{
    // Start sending audio to your speech-enabled bot.
    var listenTask = connector.ListenOnceAsync();

    // You can also send activities to your bot as JSON strings --
    // Microsoft.Bot.Schema can simplify this.
    string speakActivity =
        @"{""type"":""message"",""text"":""Greeting Message"", ""speak"":""Hello there!""}";
    await connector.SendActivityAsync(speakActivity);
}
catch (Exception ex)
{
    NotifyUser($"Exception: {ex.ToString()}", NotifyType.ErrorMessage);
}
```
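When the app is done talking to the bot (for example, when the page unloads), you can optionally close the connection explicitly; the connector reconnects automatically on its next use. A brief sketch, assuming an async context:

```csharp
// Optional cleanup: explicitly close the connection to the Direct Line Speech channel.
if (connector != null)
{
    await connector.DisconnectAsync();
}
```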
Build and run your app
You're now ready to build your app and test your custom voice assistant using the Speech service.

From the menu bar, choose Build > Build Solution to build the application. The code should now compile without errors.

Choose Debug > Start Debugging (or press F5) to start the application. The helloworld window appears.

Select Enable Microphone, and when the access permission request pops up, select Yes.

Select Talk to your bot, and speak an English phrase or sentence into your device's microphone. Your speech is transmitted to the Direct Line Speech channel and transcribed to text, which appears in the window.

Next steps
Explore the C# samples on GitHub
You can view or download all the Speech SDK Java samples on GitHub.
Prerequisites
Before you get started, make sure to:
- Create a Speech resource
- Set up your development environment and create an empty project
- Create a bot connected to the Direct Line Speech channel
- Make sure that you have access to a microphone for audio capture

Note
See the list of supported regions for voice assistants, and make sure that your resources are deployed in one of those regions.
Create and configure your project
Create an Eclipse project and install the Speech SDK.
Additionally, to enable logging, update the pom.xml file to include the following dependency:
```xml
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.5</version>
</dependency>
```
Add sample code
To add a new empty class to your Java project, select File > New > Class.
In the New Java Class window, enter speechsdk.quickstart in the Package field and Main in the Name field.
Open the newly created Main class, and replace the contents of the Main.java file with the following starting code:

```java
package speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;
import com.microsoft.cognitiveservices.speech.dialog.BotFrameworkConfig;
import com.microsoft.cognitiveservices.speech.dialog.DialogServiceConnector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
import java.io.InputStream;

public class Main {
    final Logger log = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        // New code will go here
    }

    private void playAudioStream(PullAudioOutputStream audio) {
        ActivityAudioStream stream = new ActivityAudioStream(audio);
        final ActivityAudioStream.ActivityAudioFormat audioFormat = stream.getActivityAudioFormat();
        final AudioFormat format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                audioFormat.getSamplesPerSecond(),
                audioFormat.getBitsPerSample(),
                audioFormat.getChannels(),
                audioFormat.getFrameSize(),
                audioFormat.getSamplesPerSecond(),
                false);
        try {
            int bufferSize = format.getFrameSize();
            final byte[] data = new byte[bufferSize];

            SourceDataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(format);

            if (line != null) {
                line.start();
                int nBytesRead = 0;
                while (nBytesRead != -1) {
                    nBytesRead = stream.read(data);
                    if (nBytesRead != -1) {
                        line.write(data, 0, nBytesRead);
                    }
                }
                line.drain();
                line.stop();
                line.close();
            }
            stream.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
In the main method, you first configure your DialogServiceConfig and use it to create a DialogServiceConnector instance. This instance connects to the Direct Line Speech channel to interact with your bot. An AudioConfig instance is also used to specify the source for audio input. In this example, the default microphone is used with AudioConfig.fromDefaultMicrophoneInput().

- Replace the string YourSubscriptionKey with your Speech resource key, which you can get from the Azure portal.
- Replace the string YourServiceRegion with the region associated with your Speech resource.

Note
See the list of supported regions for voice assistants, and make sure that your resources are deployed in one of those regions.

```java
final String subscriptionKey = "YourSubscriptionKey"; // Your subscription key
final String region = "YourServiceRegion"; // Your speech subscription service region
final BotFrameworkConfig botConfig = BotFrameworkConfig.fromSubscription(subscriptionKey, region);

// Configure audio input from a microphone.
final AudioConfig audioConfig = AudioConfig.fromDefaultMicrophoneInput();

// Create a DialogServiceConnector instance.
final DialogServiceConnector connector = new DialogServiceConnector(botConfig, audioConfig);
```
The DialogServiceConnector relies on several events to communicate its bot activities, speech recognition results, and other information. Add these event listeners next.

```java
// Recognizing will provide the intermediate recognized text while an audio stream is being processed.
connector.recognizing.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognizing speech event text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// Recognized will provide the final recognized text once audio capture is completed.
connector.recognized.addEventListener((o, speechRecognitionResultEventArgs) -> {
    log.info("Recognized speech event reason text: {}", speechRecognitionResultEventArgs.getResult().getText());
});

// SessionStarted will notify when audio begins flowing to the service for a turn.
connector.sessionStarted.addEventListener((o, sessionEventArgs) -> {
    log.info("Session Started event id: {} ", sessionEventArgs.getSessionId());
});

// SessionStopped will notify when a turn is complete and it's safe to begin listening again.
connector.sessionStopped.addEventListener((o, sessionEventArgs) -> {
    log.info("Session stopped event id: {}", sessionEventArgs.getSessionId());
});

// Canceled will be signaled when a turn is aborted or experiences an error condition.
connector.canceled.addEventListener((o, canceledEventArgs) -> {
    log.info("Canceled event details: {}", canceledEventArgs.getErrorDetails());
    connector.disconnectAsync();
});

// ActivityReceived is the main way your bot will communicate with the client and uses Bot Framework activities.
connector.activityReceived.addEventListener((o, activityEventArgs) -> {
    final String act = activityEventArgs.getActivity().serialize();
    log.info("Received activity {} audio", activityEventArgs.hasAudio() ? "with" : "without");
    if (activityEventArgs.hasAudio()) {
        playAudioStream(activityEventArgs.getAudio());
    }
});
```
Invoke the connectAsync() method to connect the DialogServiceConnector to Direct Line Speech. To test your bot, you can invoke the listenOnceAsync method to send audio input from your microphone. Additionally, you can use the sendActivityAsync method to send a custom activity as a serialized string. These custom activities can provide additional data that your bot uses in the conversation.

```java
connector.connectAsync();
// Start listening.
System.out.println("Say something ...");
connector.listenOnceAsync();

// connector.sendActivityAsync(...)
```
Save changes to the Main file.

To support response playback, add an additional class that transforms the PullAudioOutputStream object returned from the getAudio() API into a Java InputStream for easy handling. This ActivityAudioStream class is specialized for handling the audio response from the Direct Line Speech channel. It also provides accessors to fetch the audio format information that's required for handling playback. To add it, select File > New > Class.

In the New Java Class window, enter speechsdk.quickstart in the Package field and ActivityAudioStream in the Name field.
Open the newly created ActivityAudioStream class, and replace its contents with the following code:

```java
package com.speechsdk.quickstart;

import com.microsoft.cognitiveservices.speech.audio.PullAudioOutputStream;

import java.io.IOException;
import java.io.InputStream;

public final class ActivityAudioStream extends InputStream {
    /**
     * The number of samples played per second (16 kHz).
     */
    public static final long SAMPLE_RATE = 16000;
    /**
     * The number of bits in each sample of a sound that has this format (16 bits).
     */
    public static final int BITS_PER_SECOND = 16;
    /**
     * The number of audio channels in this format (1 for mono).
     */
    public static final int CHANNELS = 1;
    /**
     * The number of bytes in each frame of a sound that has this format (2).
     */
    public static final int FRAME_SIZE = 2;

    /**
     * Reads up to a specified maximum number of bytes of data from the audio
     * stream, putting them into the given byte array.
     *
     * @param b   the buffer into which the data is read
     * @param off the offset, from the beginning of array <code>b</code>, at which
     *            the data will be written
     * @param len the maximum number of bytes to read
     * @return the total number of bytes read into the buffer, or -1 if there
     * is no more data because the end of the stream has been reached
     */
    @Override
    public int read(byte[] b, int off, int len) {
        byte[] tempBuffer = new byte[len];
        int n = (int) this.pullStreamImpl.read(tempBuffer);
        for (int i = 0; i < n; i++) {
            if (off + i > b.length) {
                throw new ArrayIndexOutOfBoundsException(b.length);
            }
            b[off + i] = tempBuffer[i];
        }
        if (n == 0) {
            return -1;
        }
        return n;
    }

    /**
     * Reads the next byte of data from the activity audio stream if available.
     *
     * @return the next byte of data, or -1 if the end of the stream is reached
     * @see #read(byte[], int, int)
     * @see #read(byte[])
     * @see #available
     */
    @Override
    public int read() {
        byte[] data = new byte[1];
        int temp = read(data);
        if (temp <= 0) {
            // we have a weird situation if read(byte[]) returns 0!
            return -1;
        }
        return data[0] & 0xFF;
    }

    /**
     * Reads up to a specified maximum number of bytes of data from the activity audio stream,
     * putting them into the given byte array.
     *
     * @param b the buffer into which the data is read
     * @return the total number of bytes read into the buffer, or -1 if there
     * is no more data because the end of the stream has been reached
     */
    @Override
    public int read(byte[] b) {
        int n = (int) pullStreamImpl.read(b);
        if (n == 0) {
            return -1;
        }
        return n;
    }

    /**
     * Skips over and discards a specified number of bytes from this
     * audio input stream.
     *
     * @param n the requested number of bytes to be skipped
     * @return the actual number of bytes skipped
     * @throws IOException if an input or output error occurs
     * @see #read
     * @see #available
     */
    @Override
    public long skip(long n) {
        if (n <= 0) {
            return 0;
        }
        if (n <= Integer.MAX_VALUE) {
            byte[] tempBuffer = new byte[(int) n];
            return read(tempBuffer);
        }
        long count = 0;
        for (long i = n; i > 0; i -= Integer.MAX_VALUE) {
            int size = (int) Math.min(Integer.MAX_VALUE, i);
            byte[] tempBuffer = new byte[size];
            count += read(tempBuffer);
        }
        return count;
    }

    /**
     * Closes this audio input stream and releases any system resources associated
     * with the stream.
     */
    @Override
    public void close() {
        this.pullStreamImpl.close();
    }

    /**
     * Fetch the audio format for the ActivityAudioStream. The ActivityAudioFormat defines the sample rate,
     * bits per sample, and the number of channels.
     *
     * @return instance of the ActivityAudioFormat associated with the stream
     */
    public ActivityAudioStream.ActivityAudioFormat getActivityAudioFormat() {
        return activityAudioFormat;
    }

    /**
     * Returns the maximum number of bytes that can be read (or skipped over) from this
     * audio input stream without blocking.
     *
     * @return the number of bytes that can be read from this audio input stream without blocking.
     * As this implementation does not buffer, this will be defaulted to 0
     */
    @Override
    public int available() {
        return 0;
    }

    public ActivityAudioStream(final PullAudioOutputStream stream) {
        pullStreamImpl = stream;
        this.activityAudioFormat = new ActivityAudioStream.ActivityAudioFormat(
                SAMPLE_RATE, BITS_PER_SECOND, CHANNELS, FRAME_SIZE, AudioEncoding.PCM_SIGNED);
    }

    private PullAudioOutputStream pullStreamImpl;

    private ActivityAudioFormat activityAudioFormat;

    /**
     * ActivityAudioFormat is an internal format which contains metadata regarding the type of arrangement of
     * audio bits in this activity audio stream.
     */
    static class ActivityAudioFormat {

        private long samplesPerSecond;
        private int bitsPerSample;
        private int channels;
        private int frameSize;
        private AudioEncoding encoding;

        public ActivityAudioFormat(long samplesPerSecond, int bitsPerSample, int channels,
                                   int frameSize, AudioEncoding encoding) {
            this.samplesPerSecond = samplesPerSecond;
            this.bitsPerSample = bitsPerSample;
            this.channels = channels;
            this.encoding = encoding;
            this.frameSize = frameSize;
        }

        /**
         * Fetch the number of samples played per second for the associated audio stream format.
         *
         * @return the number of samples played per second
         */
        public long getSamplesPerSecond() {
            return samplesPerSecond;
        }

        /**
         * Fetch the number of bits in each sample of a sound that has this audio stream format.
         *
         * @return the number of bits per sample
         */
        public int getBitsPerSample() {
            return bitsPerSample;
        }

        /**
         * Fetch the number of audio channels used by this audio stream format.
         *
         * @return the number of channels
         */
        public int getChannels() {
            return channels;
        }

        /**
         * Fetch the default number of bytes in a frame required by this audio stream format.
         *
         * @return the number of bytes
         */
        public int getFrameSize() {
            return frameSize;
        }

        /**
         * Fetch the audio encoding type associated with this audio stream format.
         *
         * @return the encoding associated
         */
        public AudioEncoding getEncoding() {
            return encoding;
        }
    }

    /**
     * Enum defining the types of audio encoding supported by this stream.
     */
    public enum AudioEncoding {
        PCM_SIGNED("PCM_SIGNED");

        String value;

        AudioEncoding(String value) {
            this.value = value;
        }
    }
}
```
Save changes to the ActivityAudioStream file.
Build and run the app
Select F11, or select Run > Debug.
The console displays the message "Say something ..." At this point, speak an English phrase or sentence that your bot can understand. Your speech is transmitted to your bot through the Direct Line Speech channel, where it's recognized and processed by your bot, and the response is returned as an activity. If your bot returns speech as a response, the audio is played back through the playAudioStream helper and the ActivityAudioStream class.
Next steps
Explore the Java samples on GitHub

You can view or download all the Speech SDK Go samples on GitHub.
Prerequisites
Before you get started:
- Create a Speech resource
- Set up your development environment and create an empty project
- Create a bot connected to the Direct Line Speech channel
- Make sure that you have access to a microphone for audio capture

Note
See the list of supported regions for voice assistants, and make sure that your resources are deployed in one of those regions.
Set up your environment
Update the go.mod file with the latest SDK version by adding this line:

```
require (
    github.com/Microsoft/cognitive-services-speech-sdk-go v1.15.0
)
```
Start with some boilerplate code
Replace the contents of your source file (for example, quickstart.go) with the following, which includes:

- The "main" package definition
- Importing the required modules from the Speech SDK
- Variables for storing the bot information that you'll replace later in this quickstart
- A simple implementation that uses the microphone for audio input
- Event handlers for the various events that occur during a speech interaction
```go
package main

import (
	"fmt"
	"time"

	"github.com/Microsoft/cognitive-services-speech-sdk-go/audio"
	"github.com/Microsoft/cognitive-services-speech-sdk-go/dialog"
	"github.com/Microsoft/cognitive-services-speech-sdk-go/speech"
)

func main() {
	subscription := "YOUR_SUBSCRIPTION_KEY"
	region := "YOUR_BOT_REGION"

	audioConfig, err := audio.NewAudioConfigFromDefaultMicrophoneInput()
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer audioConfig.Close()

	config, err := dialog.NewBotFrameworkConfigFromSubscription(subscription, region)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer config.Close()

	connector, err := dialog.NewDialogServiceConnectorFromConfig(config, audioConfig)
	if err != nil {
		fmt.Println("Got an error: ", err)
		return
	}
	defer connector.Close()

	activityReceivedHandler := func(event dialog.ActivityReceivedEventArgs) {
		defer event.Close()
		fmt.Println("Received an activity.")
	}
	connector.ActivityReceived(activityReceivedHandler)

	recognizedHandle := func(event speech.SpeechRecognitionEventArgs) {
		defer event.Close()
		fmt.Println("Recognized ", event.Result.Text)
	}
	connector.Recognized(recognizedHandle)

	recognizingHandler := func(event speech.SpeechRecognitionEventArgs) {
		defer event.Close()
		fmt.Println("Recognizing ", event.Result.Text)
	}
	connector.Recognizing(recognizingHandler)

	connector.ListenOnceAsync()
	<-time.After(10 * time.Second)
}
```
Replace the YOUR_SUBSCRIPTION_KEY and YOUR_BOT_REGION values with actual values from your Speech resource:

- Navigate to the Azure portal, and open your Speech resource.
- Under Keys and Endpoint on the left, there are two available subscription keys.
  - Use either one as the YOUR_SUBSCRIPTION_KEY value.
- Under Overview on the left, note the region and map it to a region identifier.
  - Use the region identifier as the YOUR_BOT_REGION value; for example, "westus" for West US.
Note
See the list of supported regions for voice assistants, and make sure that your resources are deployed in one of those regions.

Note
For information on configuring your bot, see the Bot Framework documentation for the Direct Line Speech channel.
Code explanation
Your Speech subscription key and region are required to create the bot framework config object, which in turn is required to instantiate the DialogServiceConnector object together with the microphone audio config.

The connector instance exposes handlers for the events raised during a voice interaction. In this example, ListenOnceAsync listens for a single utterance: the Recognizing handler receives intermediate text while the audio stream is being processed, the Recognized handler receives the final transcription, and the ActivityReceived handler fires when the bot responds. As results are produced, the code writes them to the console.
Build and run
Now you're set up to build your project and test your custom voice assistant using the Speech service:
- Build your project, for example with "go build".
- Run the module, and speak a phrase or sentence into your device's microphone. Your speech is transmitted to the Direct Line Speech channel and transcribed to text, which appears in the output.

Note
The Speech SDK defaults to recognizing en-US as the language. For information on choosing the source language, see How to recognize speech.
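As an illustration, the C# configuration shown earlier in this quickstart selects the recognition language through the Language property; the locale below is just an example value:

```csharp
// Example: request German recognition instead of the en-US default.
var botConfig = BotFrameworkConfig.FromSubscription(speechSubscriptionKey, region);
botConfig.Language = "de-DE";
```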
Next steps
Additional language and platform support
If you've selected this tab, you probably didn't find a quickstart in your favorite programming language. Don't worry, we have additional quickstart materials and code samples available on GitHub. Use the table to find the right sample for your combination of programming language and platform/OS.
| Language | Code samples |
|---|---|
| C# | .NET Framework, .NET Core, UWP, Unity |
| C++ | Windows, Linux, macOS |
| Java | Android, JRE |
| JavaScript | Browser, Node.js |
| Objective-C | iOS, macOS |
| Python | Windows, Linux, macOS |
| Swift | iOS, macOS |