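Exercise - Create a continuous recognition speech to text application
In this exercise, you create an application that uses continuous recognition to transcribe the sample audio file that you downloaded in the previous exercise.
Modify the code for your speech to text application
In the Cloud Shell on the right, open the Program.cs file.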
code Program.cs
Update the try/catch block with the following code so that the application uses continuous recognition instead of single-shot recognition:
try
{
    FileInfo fileInfo = new FileInfo(waveFile);
    if (fileInfo.Exists)
    {
        var speechConfig = SpeechConfig.FromSubscription(azureKey, azureLocation);
        using var audioConfig = AudioConfig.FromWavFileInput(fileInfo.FullName);
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        // Completed by the Canceled or SessionStopped handler to signal that recognition is over.
        var stopRecognition = new TaskCompletionSource<int>();

        FileStream fileStream = File.OpenWrite(textFile);
        StreamWriter streamWriter = new StreamWriter(fileStream, Encoding.UTF8);

        // Raised once for each recognized utterance.
        speechRecognizer.Recognized += (s, e) =>
        {
            switch (e.Result.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    streamWriter.WriteLine(e.Result.Text);
                    break;
                case ResultReason.NoMatch:
                    Console.WriteLine("Speech could not be recognized.");
                    break;
            }
        };

        // Raised when recognition is canceled, including when the end of the file is reached.
        speechRecognizer.Canceled += (s, e) =>
        {
            if (e.Reason != CancellationReason.EndOfStream)
            {
                Console.WriteLine("Speech recognition canceled.");
            }
            stopRecognition.TrySetResult(0);
            streamWriter.Close();
        };

        // Raised when the recognition session ends.
        speechRecognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("Speech recognition stopped.");
            stopRecognition.TrySetResult(0);
            streamWriter.Close();
        };

        Console.WriteLine("Speech recognition started.");
        await speechRecognizer.StartContinuousRecognitionAsync();
        // Wait until one of the handlers above signals completion.
        Task.WaitAny(new[] { stopRecognition.Task });
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
When you finish modifying the code, your file should resemble the following example:
using System.Text;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

string azureKey = "ENTER YOUR KEY FROM THE FIRST EXERCISE";
string azureLocation = "ENTER YOUR LOCATION FROM THE FIRST EXERCISE";
string textFile = "Shakespeare.txt";
string waveFile = "Shakespeare.wav";

try
{
    FileInfo fileInfo = new FileInfo(waveFile);
    if (fileInfo.Exists)
    {
        var speechConfig = SpeechConfig.FromSubscription(azureKey, azureLocation);
        using var audioConfig = AudioConfig.FromWavFileInput(fileInfo.FullName);
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        // Completed by the Canceled or SessionStopped handler to signal that recognition is over.
        var stopRecognition = new TaskCompletionSource<int>();

        FileStream fileStream = File.OpenWrite(textFile);
        StreamWriter streamWriter = new StreamWriter(fileStream, Encoding.UTF8);

        // Raised once for each recognized utterance.
        speechRecognizer.Recognized += (s, e) =>
        {
            switch (e.Result.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    streamWriter.WriteLine(e.Result.Text);
                    break;
                case ResultReason.NoMatch:
                    Console.WriteLine("Speech could not be recognized.");
                    break;
            }
        };

        // Raised when recognition is canceled, including when the end of the file is reached.
        speechRecognizer.Canceled += (s, e) =>
        {
            if (e.Reason != CancellationReason.EndOfStream)
            {
                Console.WriteLine("Speech recognition canceled.");
            }
            stopRecognition.TrySetResult(0);
            streamWriter.Close();
        };

        // Raised when the recognition session ends.
        speechRecognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("Speech recognition stopped.");
            stopRecognition.TrySetResult(0);
            streamWriter.Close();
        };

        Console.WriteLine("Speech recognition started.");
        await speechRecognizer.StartContinuousRecognitionAsync();
        // Wait until one of the handlers above signals completion.
        Task.WaitAny(new[] { stopRecognition.Task });
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
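As in the previous exercise, make sure that you update the values of the azureKey and azureLocation variables with your key and location from the first exercise.
To save your changes, press Ctrl+S to save the file, and then press Ctrl+Q to exit the editor.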
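The code above waits for recognition to finish by completing a TaskCompletionSource from inside the Canceled and SessionStopped event handlers. The following minimal sketch isolates that stop-signal pattern so it can run without the Speech SDK or an Azure key. The recognized and sessionStopped delegates are hypothetical stand-ins for the recognizer's events, and the background task only simulates a recognizer. The sketch also awaits the task directly rather than calling Task.WaitAny, which is a substitution, not what the exercise code does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins for speechRecognizer.Recognized and
// speechRecognizer.SessionStopped; they are not part of the Speech SDK.
Action<string> recognized = _ => { };
Action sessionStopped = () => { };

var transcript = new List<string>();
var stopRecognition = new TaskCompletionSource<int>();

// Collect each recognized phrase, as the exercise does with streamWriter.WriteLine.
recognized += text => transcript.Add(text);

// Signal completion when the simulated session ends.
sessionStopped += () => stopRecognition.TrySetResult(0);

// Simulate a recognizer that raises its events from a background thread.
_ = Task.Run(async () =>
{
    foreach (string phrase in new[] { "first phrase", "second phrase" })
    {
        await Task.Delay(50);
        recognized(phrase);
    }
    sessionStopped();
});

// Asynchronously wait for the stop signal instead of blocking the thread.
await stopRecognition.Task;
Console.WriteLine($"Received {transcript.Count} phrases.");
```

TrySetResult is used rather than SetResult because more than one handler can fire for the same session, and only the first call should complete the task.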
Run your application
To run your application, use the following command in the Cloud Shell on the right:
dotnet run
If you don't see any errors, your application ran successfully, and you should see the following response:
Speech recognition started.
Speech recognition stopped.
Run the following command to list the files in the directory:
ls -l
You should see a response like the following example, with Shakespeare.txt in the list of files:
drwxr-xr-x 3 user user   4096 Oct 1 11:11 bin
drwxr-xr-x 3 user user   4096 Oct 1 11:11 obj
-rw-r--r-- 1 user user   2926 Oct 1 11:11 Program.cs
-rw-r--r-- 1 user user    412 Oct 1 11:11 Shakespeare.txt
-rwxr-xr-x 1 user user 978242 Oct 1 11:11 Shakespeare.wav
-rw-r--r-- 1 user user    348 Oct 1 11:11 speech to text.csproj
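Notice that the text file is larger than the result from the previous exercise. The file is larger because continuous recognition transcribed more of the audio file.
To view the contents of the Shakespeare.txt file, use the following command: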
cat Shakespeare.txt
You should see a response like the following example:
The following quotes are from Act 2, scene seven of William Shakespeare's play as you like it. Though CS we are not all alone unhappy. This wide and universal theater presents more woeful pageants than the scene wherein we play in. All the world's a stage and all the men and women merely players. They have their exits and their entrances, and one man in his time plays many parts, his act being seven ages.
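If you listened to the sample WAVE file, you'll notice that this text now covers the entire audio. Because the application uses the StartContinuousRecognitionAsync() method of SpeechRecognizer, speech to text recognition continues even when the speaker pauses.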
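Improve your application's recognition results
In the previous section, the second sentence of the transcript ("Though CS we are not all alone unhappy.") wasn't recognized correctly. This recognition error is caused by the archaic English vocabulary in William Shakespeare's play. The situation is similar to the specialized vocabulary that medical customers use in their notes and dictation.
With Azure AI Speech, you can help improve recognition results by specifying a list of phrases that the speech recognition engine might not be familiar with.
To see an example of this improvement in action, use the following steps.
In the Cloud Shell on the right, open the Program.cs file: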
code Program.cs
Locate the following two lines of code:
        FileStream fileStream = File.OpenWrite(textFile);
        StreamWriter streamWriter = new StreamWriter(fileStream, Encoding.UTF8);
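Add the following lines of code directly after those two lines. Make sure that the new lines match the indentation of the preceding lines: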
        var phraseList = PhraseListGrammar.FromRecognizer(speechRecognizer);
        phraseList.AddPhrase("thou seest");
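These lines add the archaic phrase "thou seest" to the recognizer's phrase list, which helps the speech recognition engine detect it in Shakespeare's play.
To save your changes, press Ctrl+S to save the file, and then press Ctrl+Q to exit the editor.
Rerun your application by using the following command: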
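The exercise adds a single phrase, but a phrase list can hold several. As a rough sketch of one way to manage a longer list, the following code reads phrases from a text file, skips blank lines and duplicates, and passes each phrase to a delegate. The file name, its contents, and the addPhrase delegate are all assumptions made for illustration. In the exercise, the delegate would be phraseList.AddPhrase; here it only collects the phrases so that the sketch runs without the Speech SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical phrase file with one phrase per line. It is created here
// only so that the sketch is self-contained.
string phraseFile = Path.Combine(Path.GetTempPath(), "phrases-sketch.txt");
File.WriteAllLines(phraseFile, new[] { "thou seest", "", "  woeful pageants  ", "Thou seest" });

// Stand-in for phraseList.AddPhrase (assumption): collects phrases in a list.
var added = new List<string>();
Action<string> addPhrase = added.Add;

// Track phrases already added, ignoring case.
var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (string line in File.ReadLines(phraseFile))
{
    string phrase = line.Trim();
    if (phrase.Length > 0 && seen.Add(phrase))
    {
        addPhrase(phrase);
    }
}

Console.WriteLine(string.Join(" | ", added)); // prints "thou seest | woeful pageants"
File.Delete(phraseFile);
```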
dotnet run
When the application finishes, use the following command to view the contents of the Shakespeare.txt file:
cat Shakespeare.txt
You should see a response like the following example:
The following quotes are from Act 2, scene seven of William Shakespeare's play as you like it. Thou seest, we are not all alone unhappy. This wide and universal theater presents more woeful pageants than the scene wherein we play in. All the world's a stage and all the men and women merely players. They have their exits and their entrances, and one man in his time plays many parts, his act being seven ages.
Notice that the recognition error is now fixed in the results.
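One detail to keep in mind when rerunning the application: File.OpenWrite opens an existing file without truncating it, so if a later run writes less text than an earlier one, the tail of the old content stays in the file. That doesn't affect this exercise, because each run produces a longer transcript than the last. The following sketch, which uses a hypothetical temporary file and made-up text, shows the difference between File.OpenWrite and File.Create, which truncates the file first.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical scratch file used only for this demonstration.
string path = Path.Combine(Path.GetTempPath(), "openwrite-sketch.txt");

// Simulate an earlier run that wrote a longer transcript (24 characters).
File.WriteAllText(path, "a long first transcript\n", Encoding.ASCII);

// A later run writes less text through File.OpenWrite, as the exercise code does.
using (var writer = new StreamWriter(File.OpenWrite(path), Encoding.ASCII))
{
    writer.Write("short\n");
}
string afterOpenWrite = File.ReadAllText(path, Encoding.ASCII);

// File.Create truncates the existing file, so only the new text remains.
using (var writer = new StreamWriter(File.Create(path), Encoding.ASCII))
{
    writer.Write("short\n");
}
string afterCreate = File.ReadAllText(path, Encoding.ASCII);

Console.WriteLine($"OpenWrite left {afterOpenWrite.Length} characters."); // 24: old tail remains
Console.WriteLine($"Create left {afterCreate.Length} characters.");       // 6: only the new text
File.Delete(path);
```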