Speech Macro of the Day: Speech Dictionary
I get feedback from people, from time to time, that they'd like a more efficient way to add items to their speech dictionary. Although there is already a facility in Windows Speech Recognition to do this, it works one word at a time, and it only allows you to record the pronunciation, not specify it yourself.
So ... What do I do when I have a request like this? I make a new macro of the day. Thus, today's speech macro of the day: Speech Dictionary.wsrMac
First, if we're going to be messing around with the speech dictionary, it might be nice to see what's already in it... To do that, I made a command where I can say "Export the speech dictionary", and it'll export all the words/phrases that have been customized into a text file, and then launch that text file for me to take a look at. Here's the command:
<command>
<listenFor>Export ?the speech dictionary</listenFor>
<script language="VBScript">
<![CDATA[
fileName = "dictionary.txt"
Set lexToken = CreateObject("SAPI.SpObjectToken")
lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")
Set lex = lexToken.CreateInstance()
Set words = lex.GetWords(1)
Set fso = CreateObject("Scripting.FileSystemObject")
Set file = fso.CreateTextFile(fileName, 1)
For Each word in words
If (word.LangId = 1033) Then
Set prons = word.Pronunciations
If prons.Count = 0 Then
file.Write word.Word & vbCrLf
Else
For Each pron in prons
file.Write word.Word & "/"
If pron.PartOfSpeech = 61440 Then
file.Write "BLOCKED" & vbCrLf
Else
file.Write pron.Symbolic & vbCrLf
End If
Next
End If
End If
Next
file.Close
Application.Run(fileName)
]]>
</script>
</command>
As you can see, it uses a bunch of speech APIs that are already in Vista, and that any application can take advantage of. The first 7 lines open up the speech dictionary (aka User Lexicon), and also open a file (dictionary.txt) to stick the words in, in a more human-readable format.
Then, for each word that it finds, it checks the language, and if it's 1033 (which means US English), it'll output the word into the file. But to do that, it needs to see how many pronunciations are available for each word. If there are zero, it'll just output the word. If there are one or more, it'll output one line per pronunciation.
There's also a special case, where if the part of speech is "61440", it outputs the word "BLOCKED". "61440" is a special kind of part of speech that the underlying speech platform uses to tell the underlying speech engine, this word should be blocked and not recognized at any time. The "BLOCKED" convention is just one I made up for this macro.
After looping thru all the words, it'll close the file, and launch the text file that was created.
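If it helps to see those output rules in isolation, here's a minimal sketch that can be run with cscript.exe outside the macro. It assumes nothing about the speech APIs: `FormatEntry` and `POS_BLOCKED` are hypothetical names made up for this illustration, and the pronunciations are passed in as two plain parallel arrays rather than read from the lexicon.

```vbscript
Const POS_BLOCKED = 61440   ' part-of-speech value the macro treats as "blocked"

' Hypothetical helper (not part of the macro): returns the text the export
' would write for one word. partsOfSpeech and symbolics are parallel arrays
' with one entry per pronunciation.
Function FormatEntry(word, partsOfSpeech, symbolics)
    Dim i, result
    If UBound(partsOfSpeech) < 0 Then
        ' No pronunciations: the word appears alone on its line
        FormatEntry = word & vbCrLf
        Exit Function
    End If
    result = ""
    For i = 0 To UBound(partsOfSpeech)
        If partsOfSpeech(i) = POS_BLOCKED Then
            result = result & word & "/BLOCKED" & vbCrLf
        Else
            result = result & word & "/" & symbolics(i) & vbCrLf
        End If
    Next
    FormatEntry = result
End Function

WScript.StdOut.Write FormatEntry("Rob Chambers", Array(), Array())
WScript.StdOut.Write FormatEntry("Ima", Array(POS_BLOCKED), Array(""))
WScript.StdOut.Write FormatEntry("Itamar", Array(0, 0), Array("ih d ae m aa r", "ih t ae m aa r"))
```

A word with two pronunciations produces two lines, which is why a name can show up more than once in the exported file.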
Here's a sample of what my speech dictionary looks like when I say "Export the speech dictionary":
Ima/BLOCKED
im/BLOCKED
Visual Studio/v ih zh uw l s t uw d iy ow
antidisestablishmentarianism/ae n t ay d ih s ih s t ae b l ih sh m ax n t eh r iy ax n ih z ax m
Rob Chambers
Itamar/ih d ae m aa r
Itamar/ih t ae m aa r
Zac Chambers
Nic Chambers
Bec Chambers
Jac Chambers
The first few I've pasted here are words that the system has learned thru adaptation that I don't actually want it to use. Sometimes when I say "I am a GPM at Microsoft", it thinks I'm saying Ima or im. Thus... The first time I saw that happen, I selected Ima, and blocked it from my dictionary. More on that in a second...
Then, you can see for Visual Studio, I've actually got a full pronunciation listed. I probably don't have to, but it's here to show you that you can either have pronunciations listed, or not. Having it added as a single unit ensures that when I say it, it'll always be cased properly.
The next word, "antidisestablishmentarianism", is one of the longest words in the English language, but it's not included in the speech dictionary by default. My son, Zac, loves this word, so of course I have to have it in my speech dictionary.
Next, you can see my name is listed with no pronunciation. Since both my first and last name are also common words, I've added my name here as a single unit, so when I say "Rob Chambers", I again get the proper casing.
Next up, is Itamar, pronounced two different ways. One with a "d" sound and one with a "t" sound. This way, no matter how I end up saying it in a hurry, I can speak Itamar's name properly in email communication with Itamar. BTW ... If you don't know Itamar, you should check out his ms-speech forum on Yahoo! Groups. It's a great place to learn more about Microsoft Speech and speech recognition in general.
Next up are the names of my kids. I have 4 boys, and they have short forms of more traditional names for their first names, so that their initials actually are the same as their first names. I played a bit too many video games as a kid, and RLC wasn't that cool for initials, compared with other kids in my neighborhood. My kids' initials are the same as their names. It throws their teachers for a loop at first, but ... Well ... What can I say... I like it. So do they.
OK, now that I've described the lines, let's talk about the format of the pronunciations for a minute. These pronunciations are an attempt at human readable form, but using the exact same form as the underlying speech platform. That brings me to the next command, "Show phonemes":
<command>
<listenFor>Show phonemes</listenFor>
<run command="https://msdn.microsoft.com/en-us/library/ms717239(VS.85).aspx"/>
</command>
Say it, and it'll take you to the page on MSDN that describes what the American English Phoneme Representation is for the Speech API.
OK, now we're getting to the fun part. Now, let's say you wanted to add a new word. The command I'm about to show you will let you say "Add that to the speech dictionary", and it'll copy whatever word is selected in your document (using the Windows clipboard), and add it to the speech dictionary with no specific pronunciation.
When I originally wrote this set of commands, I had 4 different commands. One for adding words, one for removing words, one for blocking them, and one for unblocking them. I quickly saw that they were all nearly identical, so I made one command that can do any one of those 4 operations. Here's what it looks like with its helper listenForList:
<command>
<listenFor>[operationPhrase] ?for that ?from ?to the speech dictionary</listenFor>
<setTextFeedback>Speech Dictionary: {[operationPhrase]}</setTextFeedback>
<script language="VBScript">
<![CDATA[
' Get the "that" text from the current application...
Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
that = Application.clipboardData.GetData("text")
' Determine if we're adding prons, adding phrases, removing phrases, or blocking phrases
operation = "{[operation]}"
' If we're adding a pron, we'll need to use the recognizer, otherwise we'll just need the lexicon
If operation = "addpron" Then
Set recognizer = CreateObject("SAPI.SpSharedRecognizer")
Else
Set lexToken = CreateObject("SAPI.SpObjectToken")
lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")
Set lex = lexToken.CreateInstance()
End If
' Keep track of how many words/phrases we added, and loop thru the lines in the "that"...
cWords = 0
lineStartPos = 1
Do
' Find the next line break
lineSeperatorPos = InStr(lineStartPos, that, Chr(10))
if (lineSeperatorPos = 0) Then lineSeperatorPos = Len(that)
' Find the text for that line
thisLine = Mid(that, lineStartPos, lineSeperatorPos - lineStartPos + 1)
lineStartPos = lineStartPos + Len(thisLine)
' Trim off the CR/LF
if (Right(thisLine, 1) = Chr(10)) Then thisLine = Left(thisLine, Len(thisLine) - 1)
if (Right(thisLine, 1) = Chr(13)) Then thisLine = Left(thisLine, Len(thisLine) - 1)
' If we have something to operate on
If (Len(Trim(thisLine)) > 0) Then
' Determine if there's a pronunciation included
pronSeperatorPos = InStr(thisLine, "/")
If (pronSeperatorPos = 0) Then
' Perform the operation with no pronunciation
If operation = "addpron" Then Call recognizer.DisplayUI(65552, thisLine, "AddRemoveWord", thisLine)
If operation = "add" Then Call lex.AddPronunciation(thisLine, 1033, 0)
If operation = "remove" Then Call lex.RemovePronunciation(thisLine, 1033, 0)
If operation = "block" Then Call lex.AddPronunciation(thisLine, 1033, 61440)
Else
' Find the pronunciation and collapse it
word = Left(thisLine, pronSeperatorPos - 1)
pron = Right(thisLine, Len(thisLine) - pronSeperatorPos)
pron = CollapsePron(pron)
' Special case the "BLOCKED" pronunciation
partOfSpeech = 0
If pron="BLOCKED" Then
partOfSpeech = 61440
pron = ""
End If
' Perform the operation with the pronunciation (and just continue if there's an error)
On Error Resume Next
If operation = "addpron" Then Call recognizer.DisplayUI(65552, word, "AddRemoveWord", word)
If operation = "add" Then Call lex.AddPronunciation(word, 1033, partOfSpeech, pron)
If operation = "remove" Then Call lex.RemovePronunciation(word, 1033, partOfSpeech, pron)
If operation = "block" Then Call lex.AddPronunciation(word, 1033, 61440, pron)
On Error Goto 0
End If
cWords = cWords + 1
End If
Loop while lineStartPos <= Len(that)
' Tell the user what we did...
If (cWords = 1) Then
If operation = "addpron" Then Call Application.Alert("Added pronunciation for " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
If operation = "add" Then Call Application.Alert("Added " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
If operation = "remove" Then Call Application.Alert("Removed " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
If operation = "block" Then Call Application.Alert("Blocked " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
Else
If operation = "addpron" Then Call Application.Alert("Added pronunciations for " & cWords & " words/phrases!", "Speech Dictionary", 1)
If operation = "add" Then Call Application.Alert("Added " & cWords & " words/phrases!", "Speech Dictionary", 1)
If operation = "remove" Then Call Application.Alert("Removed " & cWords & " words/phrases!", "Speech Dictionary", 1)
If operation = "block" Then Call Application.Alert("Blocked " & cWords & " words/phrases!", "Speech Dictionary", 1)
End If
Function CollapsePron(pron)
ret = ""
insideBrackets = vbFalse
For i = 1 to Len(pron)
If (Not insideBrackets) Then
If (Mid(pron, i, 1) = "[") Then
insideBrackets = vbTrue
ElseIf (Mid(pron, i, 1) <> "/") Then
ret = ret & Mid(pron, i, 1)
End If
ElseIf (Mid(pron, i, 1) = "]") Then
insideBrackets = vbFalse
End If
Next
CollapsePron = ret
End Function
]]>
</script>
</command>
and:
<listenForList name="operationPhrase" propname="operation">
<item propval="addpron">Add ?a pronunciation</item>
<item propval="addpron">Add ?a pron</item>
<item propval="add">Add</item>
<item propval="remove">Remove</item>
<item propval="block">Block</item>
<item propval="remove">Unblock</item>
</listenForList>
I'll leave the details of the specifics as an exercise for the reader. As a user of the macro, though, you can now say things like:
"Add a pronunciation for that from the speech dictionary",
"Add that to the speech dictionary",
"Remove that from the speech dictionary",
"Block that from the speech dictionary", and
"Unblock that from the speech dictionary"
Your selection will have to be a single word/phrase, or multiple words/phrases separated by line breaks. The word/phrases can also have a trailing pronunciation, similar in form to what you see in the output from "Export the speech dictionary".
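To make that selection format concrete, here's a rough sketch of how one line breaks down into a word, a pronunciation, and a part of speech. It runs under cscript.exe and doesn't touch the lexicon. `ParseEntry` and `POS_BLOCKED` are hypothetical names for this illustration only, and the sketch uses `Split` to walk the lines where the macro itself uses an `InStr` loop. The macro also runs the pronunciation through `CollapsePron`, which is left out here.

```vbscript
Const POS_BLOCKED = 61440   ' part-of-speech value the macro uses for blocked words

' Hypothetical helper (not part of the macro): splits "word/pron" at the
' first "/" and returns Array(word, pron, partOfSpeech).
Function ParseEntry(entryLine)
    Dim sepPos, word, pron, partOfSpeech
    sepPos = InStr(entryLine, "/")
    partOfSpeech = 0
    If sepPos = 0 Then
        ' No "/" means no pronunciation was supplied
        word = entryLine
        pron = ""
    Else
        word = Left(entryLine, sepPos - 1)
        pron = Mid(entryLine, sepPos + 1)
        ' "BLOCKED" is the macro's own convention, not a real pronunciation
        If pron = "BLOCKED" Then
            partOfSpeech = POS_BLOCKED
            pron = ""
        End If
    End If
    ParseEntry = Array(word, pron, partOfSpeech)
End Function

Dim selection, entryLine, entry
selection = "Rob Chambers" & vbCrLf & "Ima/BLOCKED" & vbCrLf & "Itamar/ih t ae m aa r"
For Each entryLine In Split(Replace(selection, vbCr, ""), vbLf)
    If Len(Trim(entryLine)) > 0 Then
        entry = ParseEntry(entryLine)
        WScript.Echo entry(0) & " | " & entry(1) & " | " & entry(2)
    End If
Next
```

Because the format matches what "Export the speech dictionary" writes, lines from the exported file can be selected and fed straight back to the add, remove, or block commands.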
"OK, but how can I generate those pronunciations myself?" Good question!
Use this command:
<command>
<listenFor>Sounds like [...]</listenFor>
<listenFor>Insert sounds like [...]</listenFor>
<script language="VBScript">
<![CDATA[
Application.SetTextFeedback("Sounds like...")
Set pc = CreateObject("SAPI.SpPhoneConverter")
pc.LanguageId = 1033
pron = "/"
firstElement = Result.PhraseInfo.Properties.Item(0).FirstElement
numberOfElements = Result.PhraseInfo.Properties.Item(0).NumberOfElements
For i = 1 To numberOfElements
Set elem = Result.PhraseInfo.Elements.Item(firstElement + i - 1)
pron = pron & "[" & elem.LexicalForm & "]" & "/" & pc.IdToPhone(elem.Pronunciation) & " "
Next
Application.Wait(0.25)
Application.SetTextFeedback("Sounds like: " & pron)
Application.InsertText(pron)
]]>
</script>
</command>
This will use dictation to allow you to say "Sounds like Visual Studio", and it'll output /[visual]/v ih zh uw l [studio]/s t uw d iy ow. So, if you have a word that you're trying to add, you can use the built-in pronunciations of other words that WSR already knows about, to cut and paste together your own pronunciation.
Another way to do it would be to select the word or phrase you want to build a pronunciation for, and say "What's that sound like?", which is the final command we'll put into this macro:
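The bracketed words in that output are only there for readability. When a line like this is fed back into the add command, the macro's CollapsePron function strips them out again. Here's a small standalone sketch of that step that can be run with cscript.exe. It follows the same logic as the macro's function, lightly restructured, and the input string is just an example in the bracketed form.

```vbscript
' Same idea as the macro's CollapsePron: drop every "[...]" group and
' every "/" separator, keeping only the phonemes and spaces.
Function CollapsePron(pron)
    Dim ret, insideBrackets, i, ch
    ret = ""
    insideBrackets = False
    For i = 1 To Len(pron)
        ch = Mid(pron, i, 1)
        If insideBrackets Then
            ' Skip everything until the closing bracket
            If ch = "]" Then insideBrackets = False
        ElseIf ch = "[" Then
            insideBrackets = True
        ElseIf ch <> "/" Then
            ret = ret & ch
        End If
    Next
    CollapsePron = ret
End Function

WScript.Echo CollapsePron("/[visual]/v ih zh uw l [studio]/s t uw d iy ow")
' Prints: v ih zh uw l s t uw d iy ow
```

So a line such as Visual Studio//[visual]/v ih zh uw l [studio]/s t uw d iy ow ends up with the same pronunciation as the plain Visual Studio/v ih zh uw l s t uw d iy ow form.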
<command>
<listenFor>What's that sound like</listenFor>
<listenFor>What does that sound like</listenFor>
<script language="VBScript">
<![CDATA[
Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
that = Application.clipboardData.GetData("text")
Application.EmulateRecognition("Go after that")
Application.EmulateRecognition("Insert sounds like " & that)
]]>
</script>
</command>
That will copy the selection, move right after it, and then pretend you actually said it. For many words/phrases, this will work even if Windows Speech Recognition doesn't really know how to pronounce the word/phrase, because the system will make its best guess on how to pronounce it, just like it would if you were trying to click on that word on a web page with your voice.
OK ... Now here's another command that will make your phrases a little shorter if you're actually using the commands inside Notepad.exe with the dictionary.txt file open:
<command>
<appIsInForeground processName="notepad.exe" windowTitleContains="dictionary.txt"/>
<listenFor>[operationPhrase] ?for that</listenFor>
<emulateRecognition>{[operationPhrase]} that the speech dictionary</emulateRecognition>
</command>
This only works when Notepad is in focus and it's editing dictionary.txt (as it would be when you've just said "Export the speech dictionary"). This will enable you to say simpler commands like:
"Add a pronunciation for that",
"Add that",
"Remove that",
"Block that", and
"Unblock that"
Here's the macro in complete form:
<speechMacros>
<!--
NOTE #1: The magic number 1033 represents en-us
NOTE #2: The magic number 65552 is a special hack to represent the desktop window handle (Validated on XP, and Vista)
NOTE #3: The magic number 61440 means that this "word/phrase" should be blocked
-->
<command>
<listenFor>[operationPhrase] ?for that ?from ?to the speech dictionary</listenFor>
<setTextFeedback>Speech Dictionary: {[operationPhrase]}</setTextFeedback>
<script language="VBScript">
<![CDATA[
' Get the "that" text from the current application...
Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
that = Application.clipboardData.GetData("text")
' Determine if we're adding prons, adding phrases, removing phrases, or blocking phrases
operation = "{[operation]}"
' If we're adding a pron, we'll need to use the recognizer, otherwise we'll just need the lexicon
If operation = "addpron" Then
Set recognizer = CreateObject("SAPI.SpSharedRecognizer")
Else
Set lexToken = CreateObject("SAPI.SpObjectToken")
lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")
Set lex = lexToken.CreateInstance()
End If
' Keep track of how many words/phrases we added, and loop thru the lines in the "that"...
cWords = 0
lineStartPos = 1
Do
' Find the next line break
lineSeperatorPos = InStr(lineStartPos, that, Chr(10))
if (lineSeperatorPos = 0) Then lineSeperatorPos = Len(that)
' Find the text for that line
thisLine = Mid(that, lineStartPos, lineSeperatorPos - lineStartPos + 1)
lineStartPos = lineStartPos + Len(thisLine)
' Trim off the CR/LF
if (Right(thisLine, 1) = Chr(10)) Then thisLine = Left(thisLine, Len(thisLine) - 1)
if (Right(thisLine, 1) = Chr(13)) Then thisLine = Left(thisLine, Len(thisLine) - 1)
' If we have something to operate on
If (Len(Trim(thisLine)) > 0) Then
' Determine if there's a pronunciation included
pronSeperatorPos = InStr(thisLine, "/")
If (pronSeperatorPos = 0) Then
' Perform the operation with no pronunciation
If operation = "addpron" Then Call recognizer.DisplayUI(65552, thisLine, "AddRemoveWord", thisLine)
If operation = "add" Then Call lex.AddPronunciation(thisLine, 1033, 0)
If operation = "remove" Then Call lex.RemovePronunciation(thisLine, 1033, 0)
If operation = "block" Then Call lex.AddPronunciation(thisLine, 1033, 61440)
Else
' Find the pronunciation and collapse it
word = Left(thisLine, pronSeperatorPos - 1)
pron = Right(thisLine, Len(thisLine) - pronSeperatorPos)
pron = CollapsePron(pron)
' Special case the "BLOCKED" pronunciation
partOfSpeech = 0
If pron="BLOCKED" Then
partOfSpeech = 61440
pron = ""
End If
' Perform the operation with the pronunciation (and just continue if there's an error)
On Error Resume Next
If operation = "addpron" Then Call recognizer.DisplayUI(65552, word, "AddRemoveWord", word)
If operation = "add" Then Call lex.AddPronunciation(word, 1033, partOfSpeech, pron)
If operation = "remove" Then Call lex.RemovePronunciation(word, 1033, partOfSpeech, pron)
If operation = "block" Then Call lex.AddPronunciation(word, 1033, 61440, pron)
On Error Goto 0
End If
cWords = cWords + 1
End If
Loop while lineStartPos <= Len(that)
' Tell the user what we did...
If (cWords = 1) Then
If operation = "addpron" Then Call Application.Alert("Added pronunciation for " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
If operation = "add" Then Call Application.Alert("Added " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
If operation = "remove" Then Call Application.Alert("Removed " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
If operation = "block" Then Call Application.Alert("Blocked " & Chr(34) & thisLine & Chr(34) & "!", "Speech Dictionary", 2)
Else
If operation = "addpron" Then Call Application.Alert("Added pronunciations for " & cWords & " words/phrases!", "Speech Dictionary", 1)
If operation = "add" Then Call Application.Alert("Added " & cWords & " words/phrases!", "Speech Dictionary", 1)
If operation = "remove" Then Call Application.Alert("Removed " & cWords & " words/phrases!", "Speech Dictionary", 1)
If operation = "block" Then Call Application.Alert("Blocked " & cWords & " words/phrases!", "Speech Dictionary", 1)
End If
Function CollapsePron(pron)
ret = ""
insideBrackets = vbFalse
For i = 1 to Len(pron)
If (Not insideBrackets) Then
If (Mid(pron, i, 1) = "[") Then
insideBrackets = vbTrue
ElseIf (Mid(pron, i, 1) <> "/") Then
ret = ret & Mid(pron, i, 1)
End If
ElseIf (Mid(pron, i, 1) = "]") Then
insideBrackets = vbFalse
End If
Next
CollapsePron = ret
End Function
]]>
</script>
</command>
<command>
<listenFor>Export ?the speech dictionary</listenFor>
<script language="VBScript">
<![CDATA[
fileName = "dictionary.txt"
Set lexToken = CreateObject("SAPI.SpObjectToken")
lexToken.SetId("HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\CurrentUserLexicon")
Set lex = lexToken.CreateInstance()
Set words = lex.GetWords(1)
Set fso = CreateObject("Scripting.FileSystemObject")
Set file = fso.CreateTextFile(fileName, 1)
For Each word in words
If (word.LangId = 1033) Then
Set prons = word.Pronunciations
If prons.Count = 0 Then
file.Write word.Word & vbCrLf
Else
For Each pron in prons
file.Write word.Word & "/"
If pron.PartOfSpeech = 61440 Then
file.Write "BLOCKED" & vbCrLf
Else
file.Write pron.Symbolic & vbCrLf
End If
Next
End If
End If
Next
file.Close
Application.Run(fileName)
]]>
</script>
</command>
<command>
<listenFor>Sounds like [...]</listenFor>
<listenFor>Insert sounds like [...]</listenFor>
<script language="VBScript">
<![CDATA[
Application.SetTextFeedback("Sounds like...")
Set pc = CreateObject("SAPI.SpPhoneConverter")
pc.LanguageId = 1033
pron = "/"
firstElement = Result.PhraseInfo.Properties.Item(0).FirstElement
numberOfElements = Result.PhraseInfo.Properties.Item(0).NumberOfElements
For i = 1 To numberOfElements
Set elem = Result.PhraseInfo.Elements.Item(firstElement + i - 1)
pron = pron & "[" & elem.LexicalForm & "]" & "/" & pc.IdToPhone(elem.Pronunciation) & " "
Next
Application.Wait(0.25)
Application.SetTextFeedback("Sounds like: " & pron)
Application.InsertText(pron)
]]>
</script>
</command>
<command>
<listenFor>What's that sound like</listenFor>
<listenFor>What does that sound like</listenFor>
<script language="VBScript">
<![CDATA[
Application.SendKeys("{250 WAIT}{{CTRL}}c{250 WAIT}")
that = Application.clipboardData.GetData("text")
Application.EmulateRecognition("Go after that")
Application.EmulateRecognition("Insert sounds like " & that)
]]>
</script>
</command>
<command>
<listenFor>Show phonemes</listenFor>
<run command="https://msdn.microsoft.com/en-us/library/ms717239(VS.85).aspx"/>
</command>
<command>
<appIsInForeground processName="notepad.exe" windowTitleContains="dictionary.txt"/>
<listenFor>[operationPhrase] ?for that</listenFor>
<emulateRecognition>{[operationPhrase]} that the speech dictionary</emulateRecognition>
</command>
<listenForList name="operationPhrase" propname="operation">
<item propval="addpron">Add ?a pronunciation</item>
<item propval="addpron">Add ?a pron</item>
<item propval="add">Add</item>
<item propval="remove">Remove</item>
<item propval="block">Block</item>
<item propval="remove">Unblock</item>
</listenForList>
</speechMacros>
That's it! I know this is a lot of script to digest, but if you don't really want to, don't! Just use the macro as is. Questions? Comments? Let us know!