Jaa


Solution - Exprienced Challenge 6: How Do You Spell That Again?

This was a challenge a good 15 or 20 years in the making. That’s not because it took us that long to come up with the challenge; even we work faster than that. It’s just that one of the OfficePalooza organizers learned about the Soundex algorithm 15 or 20 years ago (yes, back when he was, uh, 5 or 6 years old) and has been waiting ever since to actually get a chance to use the Soundex algorithm. So does that mean OfficePalooza was created just so this organizer would finally have an excuse to use the Soundex algorithm? Well, don’t tell anyone this but: yes.

Hey, why else would you create something called OfficePalooza?

Having already waited 15 or 20 years, we can’t wait any longer. At long last, our Visual Basic for Applications subroutine that uses the Soundex algorithm to suggest replacements for a misspelled word:

Sub CheckSpelling()

    For Each objWord In ActiveDocument.Words

        If Len(objWord.Text) > 1 Then

            strWord = UCase(Trim(objWord.Text))

 

            Set objConnection = CreateObject("ADODB.Connection")

            objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" _

                & ActiveDocument.Path & "\Soundex.accdb"

 

            objRecordSet.Open "Select * From SoundexValues Where Word = '" & strWord & "'", _

                objConnection

   

            If objRecordSet.RecordCount = 0 Then

                strPreviousLetter = ""

                strNewWord = Left(strWord, 1)

 

                For i = 2 To Len(strWord)

                    strLetter = Mid(strWord, i, 1)

 

                    If strLetter = "B" Or strLetter = "F" Or strLetter = "P" Or strLetter = "V" Then

                        If strLetter <> strPreviousLetter Then

                            strNewWord = strNewWord & "1"

                        End If

                    End If

 

                    If strLetter = "C" Or strLetter = "G" Or strLetter = "J" Or strLetter = "K" Then

                        If strLetter <> strPreviousLetter Then

                            strNewWord = strNewWord & "2"

                        End If

                    End If

 

                    If strLetter = "Q" Or strLetter = "S" Or strLetter = "X" Or strLetter = "Z" Then

                        If strLetter <> strPreviousLetter Then

                            strNewWord = strNewWord & "2"

                        End If

                    End If

 

                    If strLetter = "D" Or strLetter = "T" Then

                        If strLetter <> strPreviousLetter Then

                            strNewWord = strNewWord & "3"

                        End If

                    End If

 

                    If strLetter = "L" Then

                        If strLetter <> strPreviousLetter Then

                            strNewWord = strNewWord & "4"

                        End If

                    End If

  

                    If strLetter = "M" Or strLetter = "N" Then

                        If strLetter <> strPreviousLetter Then

                            strNewWord = strNewWord & "5"

                        End If

                    End If

 

                    If strLetter = "R" Then

                        If strLetter <> strPreviousLetter Then

                            strNewWord = strNewWord & "6"

                        End If

                    End If

 

                    strPreviousLetter = strLetter

                Next

 

                If Len(strNewWord) > 4 Then

                    strNewWord = Left(strNewWord, 4)

                End If

 

                Do Until Len(strNewWord) = 4

                    If Len(strNewWord) = 4 Then

                        Exit Do

                    Else

                        strNewWord = strNewWord & "0"

                    End If

                Loop

 

                objRecordSet.Close

 

                objRecordSet.Open "Select * From SoundexValues Where Value = '" & strNewWord & "'", _

                    objConnection

 

                Selection.EndKey Unit:=wdStory

                ActiveDocument.ActiveWindow.Selection.TypeParagraph

   

                Do Until objRecordSet.EOF

                    ActiveDocument.ActiveWindow.Selection.TypeText LCase(objRecordSet.Fields.Item("Word"))

                    ActiveDocument.ActiveWindow.Selection.TypeParagraph

                    objRecordSet.MoveNext

                Loop

  

            End If

 

            objRecordSet.Close

            objConnection.Close

        End If

    Next

End Sub

 

We don’t know about you, but it brings a tear to our eyes.

So what exactly did we do here? Well, we started off by setting up a For Each loop that loops through all the items in the active document’s Words collection; that’s what we do with this line of code:

For Each objWord In ActiveDocument.Words

As it turns out, there’s an interesting quirk with the words in the Words collection: you get not just the word, but the blank space that follows the word. (In fact, a blank space actually counts as a word, at least as far as the Words collection is concerned.) Because of that, we use this line of code (and the Len function) to count the number of characters in the word:

If Len(objWord.Text) > 1 Then

If the word has only one character we’re assuming it’s a blank space; therefore, we ignore it. If the word has more than one character then we use the following line of code to trim off any blank spaces (using the aptly-named Trim function) and then convert the word to all uppercase letters:

strWord = UCase(Trim(objWord.Text))

Now it’s time to determine whether or not the word in question is spelled correctly. To do that, we’ll simply query the database Soundex.accdb and see if the word can be found in the database; if it can, then it must be spelled correctly. To that end, the first thing we do is connect to the database:

Set objConnection = CreateObject("ADODB.Connection")

objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" _

    & ActiveDocument.Path & "\Soundex.accdb"

 

There’s nothing too special about these two lines of code. In line one we create an instance of the ADODB.Connection object, an object that lets us work with databases. In line two we use the Open method to open the database Soundex.accdb. The only tricky thing here? We need to specify the full path to the database. That means we need the folder path as well as the file name. Problem? Heck no: because the database is in the same folder as our Word document we can determine the complete path by referencing the Path property of the active document and then tacking on \Soundex.accdb.

Which, needless to say, is exactly what we did.

As soon as we’ve made our connection we then issue the following query, which returns a recordset containing all the words that happen to match our target word:

objRecordSet.Open "Select * From SoundexValues Where Word = '" & strWord & "'", _

    objConnection

 

Our next step is to check the number of items in the recordset. If the RecordCount is equal to 0 that means that there are no items in the recordset; in turn, that means that the word is misspelled. (What if there is at least one item in the recordset? That means that the word is spelled correctly. That also means that our subroutine does nothing.)

Let’s assume, however, that our word is not spelled correctly. (A good assumption, seeing as how the word is not spelled correctly.) In that case, we need to calculate the Soundex value for the misspelled word. Before we do that, however, we set the value of the variable strPreviousLetter to an empty string; that’s what we do here:

strPreviousLetter = ""

 

Note. Why did we do that? We’ll explain that in a minute or two.

The Soundex algorithm we’re using always uses the first letter of the word as-is. Therefore, we use the Left function to grab the first letter of our misspelled word and store it in a variable named strNewWord:

strNewWord = Left(strWord, 1)

 

At the moment, that makes strNewWord equal to M.

That brings us to this line of code:

For i = 2 To Len(strWord)

What we’re doing here is setting up a For Next loop that runs from 2 to the number of characters in the misspelled word. Why do we start the loop at 2? You got it: because we’ve already snared the first letter in the word and stashed it in the variable strNewWord. Inside this loop, we use this line of code to grab the next letter in the misspelled word:

strLetter = Mid(strWord, i, 1)

Next up? A series of If-Then statements that replace the letter we just grabbed with its numeric equivalent. For example, this block of code checks to see if the letter in question is a B, an F, a P, or a V:

If strLetter = "B" Or strLetter = "F" Or strLetter = "P" Or strLetter = "V" Then

    If strLetter <> strPreviousLetter Then

        strNewWord = strNewWord & "1"

    End If

End If

 

If we do have one of these letters we then check to see if the previous letter in the word matches the current letter. (In other words, are we dealing with back-to-back instances of a letter, such as the two m’s in command?) If we have duplicate letters we don’t do anything; that’s part of the algorithm. If we don’t have duplicate letters then we tack the value 1 into the end of strNewWord.

Etc., etc.

Note. As an alternative to these If-Then statements we could have used regular expressions to replace the letters in the word with their numeric equivalents; that’s actually something a couple of people did in their scripts. We thought about that, but because regular expressions can get a little complicated, we decided to use If-Then statements. We figured that, with that approach, the logic of what we were doing – and why we were doing it – would be easier to follow.

After we’ve run through all the possible replacements, we set the value of strPreviousLetter to the current letter, then go back to the top of the loop and repeat the process with the next character in the misspelled word.

When we finish with all our number-to-letter substitutions our next task is to make sure that our final Soundex value starts with a letter and is then followed by 3 numbers (and only 3 numbers). If our Soundex value has more than four characters that means we need to get rid of the “excess” values. That can be done by using the Left function to extract just the first four characters:

If Len(strNewWord) > 4 Then

    strNewWord = Left(strNewWord, 4)

End If

 

And what if we have less than four characters? In that case, we need to add zeroes to the end of the Soundex value until we do have four characters. That’s what this block of code does:

Do Until Len(strNewWord) = 4

    If Len(strNewWord) = 4 Then

        Exit Do

    Else

        strNewWord = strNewWord & "0"

    End If

Loop

When all is said and done, we should end up with a Soundex value equal to this: M214. And now that we have a Soundex value we can use this line of code to query the database and return a recordset consisting of all the words that have a Soundex value equal to M214:

objRecordSet.Open "Select * From SoundexValues Where Value = '" & strNewWord & "'", _

    objConnection

 

After we return our recordset we use these two lines of code to move the cursor to the end of the document (that’s what the EndKey method and the wdStory constant are for) and then enter a blank line in the document:

Selection.EndKey Unit:=wdStory

ActiveDocument.ActiveWindow.Selection.TypeParagraph

 

All we have to do now is list all the possible corrections for our misspelled word. To do that we set up a Do Until loop that runs until we reach the end of the recordset. (That is, until the recordset’s EOF – end-of-file – property is true.) Inside that loop we use the TypeText method to enter the possible correction, then use the TypeParagraph method to move the cursor to the next line. After that we call the MoveNext method to move to the next record in the recordset.

The whole process looks like this:

Do Until objRecordSet.EOF

    ActiveDocument.ActiveWindow.Selection.TypeText LCase(objRecordSet.Fields.Item("Word"))

    ActiveDocument.ActiveWindow.Selection.TypeParagraph

    objRecordSet.MoveNext

Loop

 

And, on-screen, you should see the following spelling suggestions:

machiavellian

make-believe

massively

misapplication

misapply

misplace

misspell

misspelling

misspelt

 

Note. Yes, some of these corrections might a bit … unexpected …. In fairness to the Soundex algorithm, however, we didn’t give you the entire algorithm, just a subset.

At that point we close our recordset and our database connection, and we – and Challenge number 6 – are done. Was it worth waiting 15 or 20 years for this moment? What do you think?