Speech Grammars in F#
People say that Vim keys are a grammar for talking to your editor, and that's exactly what they are. One weekend some time back I had fun making VimSpeak to see how well mapping English words to Vim keys would work. It turned out quite nicely, and some pieces of how it was built (in particular the grammar description format) might be useful to others, so here's how it works. First, a demo of VimSpeak in action:
[View:https://www.youtube.com/watch?v=qy84TYvXJbk]
If you want to peruse the code you can actually learn quite a bit about the grammar of Vim itself. You'll notice that it's a very declarative set of definitions. The API given by System.Speech.Recognition is very imperative and somewhat ugly, so I made this little helper:
open System
open System.Speech.Recognition
type GrammarAST<'a> =
    | Word       of string * 'a option
    | Optional   of GrammarAST<'a>
    | Repeatable of GrammarAST<'a>
    | Sequence   of GrammarAST<'a> list
    | Choice     of GrammarAST<'a> list
    | Dictation
let rec speechGrammar = function
    | Word (say, Some value) ->
        // A spoken phrase tagged with a semantic value (stored as a string)
        let g = new GrammarBuilder(say)
        g.Append(new SemanticResultValue(value.ToString()))
        g
    | Word (say, None) -> new GrammarBuilder(say)
    // Zero or one repetition
    | Optional g -> new GrammarBuilder(speechGrammar g, 0, 1)
    // One or more repetitions
    | Repeatable g -> new GrammarBuilder(speechGrammar g, 1, Int32.MaxValue)
    | Sequence gs ->
        let builder = new GrammarBuilder()
        List.iter (fun g -> builder.Append(speechGrammar g)) gs
        builder
    | Choice cs -> new GrammarBuilder(new Choices(List.map speechGrammar cs |> Array.ofList))
    | Dictation ->
        // Accept either free dictation or letter-by-letter spelling
        let dict = new GrammarBuilder()
        dict.AppendDictation()
        let spelling = new GrammarBuilder()
        spelling.AppendDictation("spelling")
        new GrammarBuilder(new Choices(dict, spelling))
This lets you construct nice looking, declarative grammars from the discriminated union and then run them through the speechGrammar function to get GrammarBuilders used by System.Speech.Recognition.
You can have simple words and optionally associate them with some meaningful value. Restricted grammars are recognized much more accurately than free dictation and spelling, but the Dictation case lets you fall back to those too. You can have optional bits of grammar, sequences of things that must be said in a particular order, choices from a set of options, etc.
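To get a feel for the union before any speech engine is involved, here is a minimal sketch that renders a grammar as a regex-like pattern string. The describe function is a hypothetical helper of mine, not part of VimSpeak or System.Speech, and the bracket notation is just an assumed convention for illustration.

```fsharp
// Repeated from above so this snippet stands alone.
type GrammarAST<'a> =
    | Word       of string * 'a option
    | Optional   of GrammarAST<'a>
    | Repeatable of GrammarAST<'a>
    | Sequence   of GrammarAST<'a> list
    | Choice     of GrammarAST<'a> list
    | Dictation

// Hypothetical helper: renders a grammar as a readable pattern.
// [x] = optional, (x)+ = one or more, (a | b) = choice.
let rec describe (grammar : GrammarAST<'a>) : string =
    match grammar with
    | Word (say, _) -> say
    | Optional g    -> sprintf "[%s]" (describe g)
    | Repeatable g  -> sprintf "(%s)+" (describe g)
    | Sequence gs   -> gs |> List.map describe |> String.concat " "
    | Choice cs     -> cs |> List.map describe |> String.concat " | " |> sprintf "(%s)"
    | Dictation     -> "<dictation>"

// A small sample grammar, purely for illustration.
let sample : GrammarAST<string> =
    Sequence [
        Choice [
            Word ("My name is", None)
            Word ("I'm", None)]
        Dictation]

printfn "%s" (describe sample)  // prints: (My name is | I'm) <dictation>
```

Because the grammar is plain data, helpers like this are just another recursive function next to speechGrammar.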
A demo should make it clear enough. Let's start by letting someone introduce themselves. We could have a grammar listing choices of possible names, but here we'll just let them dictate their name. The phrase preceding the name, however, is restricted to the grammar:
let name = Dictation
let intro =
    Sequence [
        Choice [
            Word ("My name is", None)
            Word ("I'm", None)]
        name]
This lets you say, "My name is Ashley" or "I'm Fred", etc. Let's let them say various greetings and goodbye phrases as well:
let greeting =
    Sequence [
        Choice [
            Word ("Hello", Some "greeting")
            Word ("Howdy", Some "greeting")
            Word ("Hi",    Some "greeting")]
        Optional name]
let goodbye =
    Sequence [
        Choice [
            Word ("Goodbye", Some "goodbye")
            Word ("See ya",  Some "goodbye")
            Word ("Ciao",    Some "goodbye")]
        Optional name]
Now we can say "Hello Joe", "Howdy", "See ya Mr. Bean", "Ciao", ... Notice now we're attaching a semantic value indicating whether it's a "greeting" or a "goodbye". This makes it easy (without parsing) to pull this information out of recognized phrases later.
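One way to sanity-check a grammar like these without a microphone is to list the phrases it can produce together with the semantic values they carry. The sketch below does that under a few loud assumptions: expand is a hypothetical helper (not part of VimSpeak), Dictation is shown as the placeholder "<dictation>", and a Repeatable is expanded exactly once.

```fsharp
// Repeated from above so this snippet stands alone.
type GrammarAST<'a> =
    | Word       of string * 'a option
    | Optional   of GrammarAST<'a>
    | Repeatable of GrammarAST<'a>
    | Sequence   of GrammarAST<'a> list
    | Choice     of GrammarAST<'a> list
    | Dictation

// Hypothetical helper: lists (phrase, semantic values) pairs for a grammar.
// Assumptions: Dictation -> "<dictation>", Repeatable -> a single repetition.
let rec expand (grammar : GrammarAST<'a>) : (string * 'a list) list =
    match grammar with
    | Word (say, value) -> [ (say, Option.toList value) ]
    | Optional g        -> ("", []) :: expand g
    | Repeatable g      -> expand g
    | Choice cs         -> List.collect expand cs
    | Dictation         -> [ ("<dictation>", []) ]
    | Sequence gs ->
        // Combine every alternative of each part with every alternative of the rest.
        List.foldBack
            (fun g rest ->
                [ for (phrase, values) in expand g do
                    for (restPhrase, restValues) in rest do
                        yield ((phrase + " " + restPhrase).Trim(), values @ restValues) ])
            gs
            [ ("", []) ]

// Same shape as the greeting grammar, with Dictation inlined.
let greeting =
    Sequence [
        Choice [
            Word ("Hello", Some "greeting")
            Word ("Howdy", Some "greeting")
            Word ("Hi",    Some "greeting")]
        Optional Dictation]

for (phrase, values) in expand greeting do
    printfn "%s -> %A" phrase values
```

Every phrase in the listing carries the "greeting" value whether or not a name follows, which is what lets the recognition handler skip parsing the text.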
We can create and initialize the speech reco engine:
let reco = new SpeechRecognitionEngine()
try reco.SetInputToDefaultAudioDevice()
with _ -> failwith "No default audio device! Plug in a microphone, man."
reco.LoadGrammar(new Grammar(speechGrammar greeting))
reco.LoadGrammar(new Grammar(speechGrammar intro))
reco.LoadGrammar(new Grammar(speechGrammar goodbye))
And for the heck of it, let's throw in some speech synthesis while we're at it:
open System.Speech.Synthesis
let synth = new SpeechSynthesizer()
synth.SelectVoiceByHints(VoiceGender.Female)
let speak (text : string) =
    reco.RecognizeAsyncStop()
    synth.Speak text |> ignore
    reco.RecognizeAsync(RecognizeMode.Multiple)
Funnily enough, it is possible for the machine to talk to itself: the recognizer can pick up the synthesizer's output. This is why the speak function temporarily stops recognition.
Finally, we can use all this for a simple demo:
reco.SpeechRecognized.Add(fun a ->
    let res = a.Result
    if res <> null then
        printfn "%s (%f)" res.Text res.Confidence
        let sem = res.Semantics.Value
        if sem <> null then
            match sem.ToString() with
            | "greeting" -> speak "Hello there!"
            | "goodbye"  -> speak "See you later!"
            | _ -> ()) // ignore any other semantic value
reco.RecognizeAsync(RecognizeMode.Multiple)
Console.ReadLine() |> ignore // keep the process alive until Enter is pressed
Here we just echo back what we think we heard and also speak back depending on the semantic value of what was said.
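If you want to try the response logic without a microphone, one option is to pull the decision out of the event handler into a pure function. This is only a sketch: replyFor is a hypothetical name, and it assumes the semantic value arrives as a possibly-null object, as res.Semantics.Value does in the handler.

```fsharp
// Hypothetical helper: maps a semantic value (possibly null) to an optional reply.
let replyFor (semantic : obj) : string option =
    match semantic with
    | null -> None                       // phrase carried no semantic value
    | value ->
        match value.ToString() with
        | "greeting" -> Some "Hello there!"
        | "goodbye"  -> Some "See you later!"
        | _          -> None             // unknown value: stay silent

printfn "%A" (replyFor (box "greeting"))
printfn "%A" (replyFor null)
```

Inside the handler this would be used as replyFor res.Semantics.Value |> Option.iter speak, keeping the side effect in one place.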
Take this and have some fun with it!
Comments
Anonymous
November 27, 2014
This is awesome!! But not sure how easy it is to port into Linux OS.
Anonymous
December 29, 2014
@Raaz, I know... having switched to Ubuntu myself now, the speech APIs aren't there in Mono... :-/