New Bing Speech Recognition Control and Updated Bing OCR and Translator Controls on Windows Azure Marketplace
At the BUILD conference in June, we announced three broad categories of capabilities the new Bing platform would deliver to developers: services to bring entities and the world’s knowledge to your applications, services to enable your applications to deliver more natural and intuitive user experiences, and services which bring an awareness of the physical world into your applications. Earlier this month we updated the Bing Maps SDKs for Windows Store Apps for Windows 8 and 8.1. Building on this momentum, today we are announcing the release of the new Bing Speech Recognition Control for Windows 8 and 8.1, and updates to the Bing Optical Character Recognition Control for Windows 8.1 and Bing Translator Control for Windows 8.1. These releases continue our effort to help developers build more knowledgeable, natural, and aware applications.
Read on for more details on the updates we’re announcing today, and then check out the Bing developer center for other useful resources, including code samples, for building smarter, more useful applications.
Hands-Free Experiences – Speech Recognition for Windows 8 and 8.1
Whether for accessibility, safety, or simple convenience, being able to use your voice to interact hands-free with your device is increasingly important. When devices can recognize speech, users can interact with them more naturally to dictate emails, search for the latest news, navigate their apps, and more. If you are a Windows Phone developer, you may already be familiar with the speech recognition inside Windows Phone: the user taps a microphone icon, speaks into the mic, and the text shows up on screen. Now, that same functionality is available on Windows 8, Windows 8.1, and Windows RT through the free Bing Speech Recognition Control.
In as little as ten lines of C# + XAML or JavaScript + HTML, you can put a SpeechRecognizerUx control in your application, along with a microphone button and a TextBlock, and the code to support them. When the user clicks or taps the mic, they will hear a blip, or "earcon", to signal that it's time to speak, and an audio meter will show their current volume level. While they speak, the detected words are shown in the control. When they stop speaking, or hit the Stop button on the speech control, they will see a brief animation and then their words will appear in the TextBlock.
Here's the XAML to create the UI elements:
<AppBarButton x:Name="SpeakButton" Icon="Microphone" Click="SpeakButton_Click"></AppBarButton>
<TextBlock x:Name="TextResults" />
<sp:SpeechRecognizerUx x:Name="SpeechControl" />
Or in HTML:
<div id="SpeakButton" onclick="speakButton_Click();"></div>
<div id="ResultText"></div>
<div id="SpeechControl" data-win-control="BingWinJS.SpeechRecognizerUx"></div>
In the code behind, all you have to do is create a SpeechRecognizer object, bind it to the SpeechRecognizerUx control, and create a click handler for the microphone button to start speech recognition and write the results to the TextBlock.
Here's the code in C#:
SpeechRecognizer SR;

void MainPage_Loaded(object sender, RoutedEventArgs e)
{
    // Apply credentials from the Windows Azure Data Marketplace.
    var credentials = new SpeechAuthorizationParameters();
    credentials.ClientId = "<YOUR CLIENT ID>";
    credentials.ClientSecret = "<YOUR CLIENT SECRET>";

    // Initialize the speech recognizer.
    SR = new SpeechRecognizer("en-US", credentials);

    // Bind it to the SpeechRecognizerUx control.
    SpeechControl.SpeechRecognizer = SR;
}

private async void SpeakButton_Click(object sender, RoutedEventArgs e)
{
    // Start recognition and write the recognized text to the TextBlock.
    var result = await SR.RecognizeSpeechToTextAsync();
    TextResults.Text = result.Text;
}
And here it is in JavaScript:
var SR;

function pageLoaded() {
    // Apply credentials from the Windows Azure Data Marketplace.
    var credentials = new Bing.Speech.SpeechAuthorizationParameters();
    credentials.clientId = "<YOUR CLIENT ID>";
    credentials.clientSecret = "<YOUR CLIENT SECRET>";

    // Initialize the speech recognizer.
    SR = new Bing.Speech.SpeechRecognizer("en-US", credentials);

    // Bind it to the SpeechRecognizerUx control.
    document.getElementById('SpeechControl').winControl.speechRecognizer = SR;
}

function speakButton_Click() {
    // Start recognition and write the recognized text to the results div.
    SR.recognizeSpeechToTextAsync()
        .then(
            function (result) {
                document.getElementById('ResultText').innerHTML = result.text;
            });
}
To get this code to work, you will have to put your own values in credentials: your ClientId and ClientSecret. You get these from the Windows Azure Data Marketplace when you sign up for your free subscription to the control. The control depends on a web service to function, and you can send up to 500,000 queries per month at the free level. A more detailed version of this code is available in the SpeechRecognizerUx class description on MSDN.
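Because the control depends on a web service, a recognition call can fail, and the click handler above has no error path. Here is a minimal sketch of one way to add it, assuming the object returned by recognizeSpeechToTextAsync accepts a second, error callback in then(), as WinJS promises do. The names recognizeInto and describeFailure are hypothetical helpers for illustration, not part of the control.

```javascript
// Hypothetical helper: turn whatever the failure callback receives
// into a short message for the user.
function describeFailure(error) {
    var detail = (error && error.message) ? error.message : String(error);
    return "Speech recognition failed: " + detail;
}

// Hypothetical helper: run one recognition pass and write either the
// recognized text or a failure message to `output`.
// `recognizer` is anything exposing recognizeSpeechToTextAsync(), such as SR;
// `output` is anything with a textContent property, such as the ResultText div.
function recognizeInto(recognizer, output) {
    return recognizer.recognizeSpeechToTextAsync().then(
        function (result) {
            output.textContent = result.text;
        },
        function (error) {
            output.textContent = describeFailure(error);
        });
}
```

With this in place, the body of speakButton_Click could become a single call: recognizeInto(SR, document.getElementById('ResultText')).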
The next capability you would probably want to add is to show alternate results for recognized speech. While the user is speaking, the SpeechRecognizer makes multiple hypotheses about what is being said based on the sounds received so far, and may make multiple hypotheses at the end as well. These alternate guesses are available as a list through the SpeechRecognitionResult.GetAlternates(int) method, or individually as they are created through the SpeechRecognizer.RecognizerResultReceived event.
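As a rough sketch of reading the alternates list in JavaScript, the function below collects the text of each alternate. It assumes that GetAlternates(int) is projected to JavaScript as getAlternates, that the returned list is array-like, and that each alternate has a text property like the main result. The name alternateTexts is a hypothetical helper.

```javascript
// Hypothetical helper: return the text of up to `max` alternates.
// Assumes result.getAlternates(max) returns an array-like list whose
// items each expose a `text` property.
function alternateTexts(result, max) {
    var alternates = result.getAlternates(max);
    var texts = [];
    for (var i = 0; i < alternates.length; i++) {
        texts.push(alternates[i].text);
    }
    return texts;
}
```

You could call this from the success callback of recognizeSpeechToTextAsync and render the returned strings as a list of suggestions the user can tap.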
You can also bypass the SpeechRecognizerUx control and create your own custom speech recognition UI, using the SpeechRecognizer.AudioCaptureStateChanged event to trigger the different UI states for startup, listening, processing, and complete. This process is described here, and there is a complete code example on the SpeechRecognizer MSDN page. For a detailed explanation and sample of working with the alternates list, see handling speech recognition data.
Like all of the Bing controls and APIs, the Speech Recognition Control works better in combination with the others. Using it to enter queries for Bing Maps or the Search API is an obvious choice, but you can also combine the control with Speech Synthesis for Windows 8.1 to enable two-way conversations with your users. Add the Translator API into the mix and you could have real-time audio translation, just like in the sci-fi shows.
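One way to organize a custom UI is a small lookup table from capture state to what the UI should show. The sketch below uses the four UI states named above as plain string labels. These labels, the UI_STATES table, and uiForState are all hypothetical; the control's actual state values may be named differently, so an AudioCaptureStateChanged handler would first need to translate the state it receives into one of these labels.

```javascript
// Hypothetical table: what a custom UI shows in each of the four states.
var UI_STATES = {
    startup:    { message: "Starting...",   showMeter: false },
    listening:  { message: "Listening...",  showMeter: true  },
    processing: { message: "Processing...", showMeter: false },
    complete:   { message: "",              showMeter: false }
};

// Hypothetical helper: look up the UI description for a state label,
// falling back to an idle UI for labels the table does not know.
function uiForState(stateName) {
    if (Object.prototype.hasOwnProperty.call(UI_STATES, stateName)) {
        return UI_STATES[stateName];
    }
    return { message: "", showMeter: false };
}
```

Keeping the mapping in one table means the event handler stays a one-liner, and adding a state such as a cancel state only touches the table.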
For more examples of what you can do with the Speech Recognition Control, go to the Speech page on the Bing Developer Center and check the links in the Samples section.
The control currently recognizes speech in US English.
Give Your Machine the Gift of Sight – Bing Optical Character Recognition in Six Additional Languages
From recognizing text in documents to the identification of email addresses, phone numbers, and URLs to the extraction of coupon codes, adding the power of sight to your Windows 8.1 Store app opens a host of new scenarios developers can enable to enhance their applications. Coming out of Customer Technical Preview, the Bing Optical Character Recognition Control for Windows 8.1 now supports six new languages: Russian, Swedish, Norwegian, Danish, Finnish and Chinese Simplified in addition to the existing language support for English, German, Spanish, Italian, French, and Portuguese.
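Once you have recognized text as a string, picking out items such as email addresses and URLs is ordinary string processing in your app. The sketch below is one simple way to do that with regular expressions. It is not part of the OCR control, the function name extractContacts is hypothetical, and the patterns are deliberately loose: a URL followed directly by a period, for example, will keep the period.

```javascript
// Hypothetical helper: find email addresses and http/https URLs in a
// string of recognized text. The patterns are simplified on purpose.
function extractContacts(text) {
    var emailPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
    var urlPattern = /https?:\/\/[^\s]+/g;
    return {
        emails: text.match(emailPattern) || [],
        urls: text.match(urlPattern) || []
    };
}
```

For production use you would likely tighten the patterns and add one for phone numbers in the formats your users expect.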
Rise Above Language Barriers – Bing Translator Control for Windows 8.1
Access robust, cloud-based, automatic translation easily between more than 40 languages with the Bing Translator Control, which moves from Customer Technical Preview to general availability today. The Translator Control gives you access to machine translation services built on over a decade of natural language research from Microsoft Research. After download and one-time authentication, you can simply place the Translator Control in your app, feed it a string to translate, and receive the translation.
The Bing Developer Team