Solution - Exprienced Challenge 6: How Do You Spell That Again?
This was a challenge a good 15 or 20 years in the making. That’s not because it took us that long to come up with the challenge; even we work faster than that. It’s just that one of the OfficePalooza organizers learned about the Soundex algorithm 15 or 20 years ago (yes, back when he was, uh, 5 or 6 years old) and has been waiting ever since to actually get a chance to use the Soundex algorithm. So does that mean OfficePalooza was created just so this organizer would finally have an excuse to use the Soundex algorithm? Well, don’t tell anyone this but: yes.
Hey, why else would you create something called OfficePalooza?
Having already waited 15 or 20 years, we can’t wait any longer. At long last, our Visual Basic for Applications subroutine that uses the Soundex algorithm to suggest replacements for a misspelled word:
Sub CheckSpelling()
For Each objWord In ActiveDocument.Words
If Len(objWord.Text) > 1 Then
strWord = UCase(Trim(objWord.Text))
Set objConnection = CreateObject("ADODB.Connection")
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" _
& ActiveDocument.Path & "\Soundex.accdb"
objRecordSet.Open "Select * From SoundexValues Where Word = '" & strWord & "'", _
objConnection
If objRecordSet.RecordCount = 0 Then
strPreviousLetter = ""
strNewWord = Left(strWord, 1)
For i = 2 To Len(strWord)
strLetter = Mid(strWord, i, 1)
If strLetter = "B" Or strLetter = "F" Or strLetter = "P" Or strLetter = "V" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "1"
End If
End If
If strLetter = "C" Or strLetter = "G" Or strLetter = "J" Or strLetter = "K" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "2"
End If
End If
If strLetter = "Q" Or strLetter = "S" Or strLetter = "X" Or strLetter = "Z" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "2"
End If
End If
If strLetter = "D" Or strLetter = "T" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "3"
End If
End If
If strLetter = "L" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "4"
End If
End If
If strLetter = "M" Or strLetter = "N" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "5"
End If
End If
If strLetter = "R" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "6"
End If
End If
strPreviousLetter = strLetter
Next
If Len(strNewWord) > 4 Then
strNewWord = Left(strNewWord, 4)
End If
Do Until Len(strNewWord) = 4
If Len(strNewWord) = 4 Then
Exit Do
Else
strNewWord = strNewWord & "0"
End If
Loop
objRecordSet.Close
objRecordSet.Open "Select * From SoundexValues Where Value = '" & strNewWord & "'", _
objConnection
Selection.EndKey Unit:=wdStory
ActiveDocument.ActiveWindow.Selection.TypeParagraph
Do Until objRecordSet.EOF
ActiveDocument.ActiveWindow.Selection.TypeText LCase(objRecordSet.Fields.Item("Word"))
ActiveDocument.ActiveWindow.Selection.TypeParagraph
objRecordSet.MoveNext
Loop
End If
objRecordSet.Close
objConnection.Close
End If
Next
End Sub
We don’t know about you, but it brings a tear to our eyes.
So what exactly did we do here? Well, we started off by setting up a For Each loop that loops through all the items in the active document’s Words collection; that’s what we do with this line of code:
For Each objWord In ActiveDocument.Words
As it turns out, there’s an interesting quirk with the words in the Words collection: you get not just the word, but the blank space that follows the word. (In fact, a blank space actually counts as a word, at least as far as the Words collection is concerned.) Because of that, we use this line of code (and the Len function) to count the number of characters in the word:
If Len(objWord.Text) > 1 Then
If the word has only one character we’re assuming it’s a blank space; therefore, we ignore it. If the word has more than one character then we use the following line of code to trim off any blank spaces (using the aptly-named Trim function) and then convert the word to all uppercase letters:
strWord = UCase(Trim(objWord.Text))
Now it’s time to determine whether or not the word in question is spelled correctly. To do that, we’ll simply query the database Soundex.accdb and see if the word can be found in the database; if it can, then it must be spelled correctly. To that end, the first thing we do is connect to the database:
Set objConnection = CreateObject("ADODB.Connection")
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" _
& ActiveDocument.Path & "\Soundex.accdb"
There’s nothing too special about these two lines of code. In line one we create an instance of the ADODB.Connection object, an object that lets us work with databases. In line two we use the Open method to open the database Soundex.accdb. The only tricky thing here? We need to specify the full path to the database. That means we need the folder path as well as the file name. Problem? Heck no: because the database is in the same folder as our Word document we can determine the complete path by referencing the Path property of the active document and then tacking on \Soundex.accdb.
Which, needless to say, is exactly what we did.
As soon as we’ve made our connection we then issue the following query, which returns a recordset containing all the words that happen to match our target word:
objRecordSet.Open "Select * From SoundexValues Where Word = '" & strWord & "'", _
objConnection
Our next step is to check the number of items in the recordset. If the RecordCount is equal to 0 that means that there are no items in the recordset; in turn, that means that the word is misspelled. (What if there is at least one item in the recordset? That means that the word is spelled correctly. That also means that our subroutine does nothing.)
Let’s assume, however, that our word is not spelled correctly. (A good assumption, seeing as how the word is not spelled correctly.) In that case, we need to calculate the Soundex value for the misspelled word. Before we do that, however, we set the value of the variable strPreviousLetter to an empty string; that’s what we do here:
strPreviousLetter = ""
Note. Why did we do that? We’ll explain that in a minute or two.
The Soundex algorithm we’re using always uses the first letter of the word as-is. Therefore, we use the Left function to grab the first letter of our misspelled word and store it in a variable named strNewWord:
strNewWord = Left(strWord, 1)
At the moment, that makes strNewWord equal to M.
That brings us to this line of code:
For i = 2 To Len(strWord)
What we’re doing here is setting up a For Next loop that runs from 2 to the number of characters in the misspelled word. Why do we start the loop at 2? You got it: because we’ve already snared the first letter in the word and stashed it in the variable strNewWord. Inside this loop, we use this line of code to grab the next letter in the misspelled word:
strLetter = Mid(strWord, i, 1)
Next up? A series of If-Then statements that replace the letter we just grabbed with its numeric equivalent. For example, this block of code checks to see if the letter in question is a B, an F, a P, or a V:
If strLetter = "B" Or strLetter = "F" Or strLetter = "P" Or strLetter = "V" Then
If strLetter <> strPreviousLetter Then
strNewWord = strNewWord & "1"
End If
End If
If we do have one of these letters we then check to see if the previous letter in the word matches the current letter. (In other words, are we dealing with back-to-back instances of a letter, such as the two m’s in command?) If we have duplicate letters we don’t do anything; that’s part of the algorithm. If we don’t have duplicate letters then we tack the value 1 into the end of strNewWord.
Etc., etc.
Note. As an alternative to these If-Then statements we could have used regular expressions to replace the letters in the word with their numeric equivalents; that’s actually something a couple of people did in their scripts. We thought about that, but because regular expressions can get a little complicated, we decided to use If-Then statements. We figured that, with that approach, the logic of what we were doing – and why we were doing it – would be easier to follow.
After we’ve run through all the possible replacements, we set the value of strPreviousLetter to the current letter, then go back to the top of the loop and repeat the process with the next character in the misspelled word.
When we finish with all our number-to-letter substitutions our next task is to make sure that our final Soundex value starts with a letter and is then followed by 3 numbers (and only 3 numbers). If our Soundex value has more than four characters that means we need to get rid of the “excess” values. That can be done by using the Left function to extract just the first four characters:
If Len(strNewWord) > 4 Then
strNewWord = Left(strNewWord, 4)
End If
And what if we have less than four characters? In that case, we need to add zeroes to the end of the Soundex value until we do have four characters. That’s what this block of code does:
Do Until Len(strNewWord) = 4
If Len(strNewWord) = 4 Then
Exit Do
Else
strNewWord = strNewWord & "0"
End If
Loop
When all is said and done, we should end up with a Soundex value equal to this: M214. And now that we have a Soundex value we can use this line of code to query the database and return a recordset consisting of all the words that have a Soundex value equal to M214:
objRecordSet.Open "Select * From SoundexValues Where Value = '" & strNewWord & "'", _
objConnection
After we return our recordset we use these two lines of code to move the cursor to the end of the document (that’s what the EndKey method and the wdStory constant are for) and then enter a blank line in the document:
Selection.EndKey Unit:=wdStory
ActiveDocument.ActiveWindow.Selection.TypeParagraph
All we have to do now is list all the possible corrections for our misspelled word. To do that we set up a Do Until loop that runs until we reach the end of the recordset. (That is, until the recordset’s EOF – end-of-file – property is true.) Inside that loop we use the TypeText method to enter the possible correction, then use the TypeParagraph method to move the cursor to the next line. After that we call the MoveNext method to move to the next record in the recordset.
The whole process looks like this:
Do Until objRecordSet.EOF
ActiveDocument.ActiveWindow.Selection.TypeText LCase(objRecordSet.Fields.Item("Word"))
ActiveDocument.ActiveWindow.Selection.TypeParagraph
objRecordSet.MoveNext
Loop
And, on-screen, you should see the following spelling suggestions:
machiavellian
make-believe
massively
misapplication
misapply
misplace
misspell
misspelling
misspelt
Note. Yes, some of these corrections might a bit … unexpected …. In fairness to the Soundex algorithm, however, we didn’t give you the entire algorithm, just a subset.
At that point we close our recordset and our database connection, and we – and Challenge number 6 – are done. Was it worth waiting 15 or 20 years for this moment? What do you think?