Phone Number Word Games part 2 (by CalvinH)

As I said before in this blog, I have an optimized version of a program to solve a programming challenge

To recap, the challenge was to enumerate the possible phrases that can be obtained from a phone number, like 625-329-4741 is MakeAWish1

I understood the original challenge to be to solve the problem, without regard to efficiency. So, in a couple hours, I came up with a solution that wasn’t very efficient that I posted last week. Subsequently, I polished the optimized version below.

You can try this algorithm out by visiting this site and entering any phone number or string of letters. (Sorry, this site is internal to Microsoft only: I don’t want to bog down my tired old server, described in this blog)

It takes up to 15 seconds to calculate phrases based on a phone keypad and look them up in a 50,000 word dictionary.

For the original phone number of “642-394-6369” there were 20101 results calculated in about 10 seconds.

&& There are 3 parts to this algorithm.

&& First (GetDigSeqs) all the possible unique digit subsequences are enumerated into DigSeq.

&& For 123, this would be (123), (23), (3). For N digits, it would be maximum N sequences.

&& Part 2: (EnumNum) for each DigSeq, enumerate all possible letter words of length from 1 to len(DigSeq),

&& look them up in the dictionary, and place unique ones in SubWords

&& Part 3: (FormPhrase) For each digseq and, subword, assemble an N length phrase

&& EnumNum This part uses 2 parallel strings: the number to permute and a pattern

&& cPattern governs the pattern of letters to use for each digit's place

&& a "0" for each digits place to indicate which letter of the digit it is (0,1,2,3). 0 indicates use the raw digit

&& At the end, it'll be "1" (for 0,1), "3" (for 2,3,4,5,6,8), or "4" for "7,9", like "3334433143"

CLEAR

cNumber=INPUTBOX("Phone #?","Phone #","642-394-6369") && number of patterns is 4^8*5^2 = 1638400

dtStart=SECONDS()

GetPhrases(STRTRAN(cNumber,"-","")) && remove dashes

dtEnd=SECONDS()

?"#secs= ",dtEnd-dtStart,"num=",RECCOUNT("phrases")

#define MAXDIGITLEN 2 && max seq length of digits in final result

PROCEDURE GetPhrases(cNumber as String) && Given a digit sequence("6423946369") get all phrases

            PUBLIC NUMDIGS,ox as dictionary.dict

            NUMDIGS = LEN(cNumber)

            ox=CREATEOBJECT('dictionary.dict') && Instantiate the dictionary COM object

            ox.DictNum=2 && 1 = 171000 word dictionary, 2= 53869 word

            CREATE CURSOR DigSeqs (DigSeq char(NUMDIGS+1)) && all Digit subsequences

            INDEX ON DigSeq TAG DigSeq

            CREATE CURSOR phrases (phrase char(30)) && a table into which we can put all resulting phrases

            CREATE CURSOR SubWords (SubDigits char(NUMDIGS+1),SubWord char(NUMDIGS+1)) && All subwords found in dictionary

            INDEX ON SubDigits TAG SubDigits

            INDEX ON SubWord TAG SubWord

            SET ORDER TO 1

            SELECT SubWords

            GetDigSeqs(cNumber)

            FormPhrase(cNumber,"")

PROCEDURE GetDigSeqs(cNumber as String) && cNumber is all digis. Find the subsequences.

            LOCAL i,fShorterFound

            IF !SEEK(cNumber+' ',"DigSeqs") && if the number is not in the table

                        fShorterFound= SEEK(LEFT(cNumber,LEN(cNumber)-1)+' ') && Chop off 1 digit at the end and see if that's found

                        INSERT INTO DigSeqs VALUES (cNumber) && add the number into the table

                        IF !fShorterFound && if the shorter was not found,

                                    EnumNum(cNumber,REPLICATE("0",LEN(cNumber)),1) && enumerate all words, look in dictionary

                        ENDIF

            ENDIF

            IF LEN(cNumber) >1 && If we need to recur

                        GetDigSeqs(SUBSTR(cNumber,2)) && Recur with the string without the first digit

            ENDIF

            RETURN

PROCEDURE EnumNum(cNumber as String,cPattern as String, nDigNum as Integer) && cNumber is raw string of digits with no separators

            LOCAL i,cDigit,cPat,cStartLet,cLastPat,cNewLet

            cDigit=SUBSTR(cNumber,nDigNum,1) && the digit we're working on

            cPat=SUBSTR(cPattern,nDigNum,1) && where are we on this digit?

            cStartLet=SUBSTR(" adgjmptw",ASC(cDigit)-47,1) && Starting letter for this digit: 0 1 2abc, 3def, 4ghi, 5jkl, 6mno, 7pqrs, 8tuv, 9wxyz

            cLastpat =SUBSTR("0033333434",ASC(cDigit)-47,1) && # permutations -1 for this digit: 0 0 4, 4, 4, 4, 4, 5, 4, 5

            DO WHILE .t.

                        IF !"0"$LEFT(cPattern,nDigNum)

                                    IF !SEEK(LEFT(cNumber,nDigNum)+' ',"SubWords",2) and ox.isword(LEFT(cNumber,nDigNum))

                                                INSERT INTO SubWords (SubDigits,SubWord) VALUES (LEFT(DigSeqs.DigSeq,nDigNum),LEFT(cNumber,nDigNum))

                                    ENDIF

                                    IF nDigNum < LEN(cNumber) && we haven't enumerated all digits yet

                                                EnumNum(cNumber,cPattern,nDigNum+1) && recur with next digit

                                    ENDIF

                        ENDIF

                        IF cPat=cLastPat && this digit has reached the end (for '7', we've done "7","p","q","r","s")

                                    EXIT

                        ENDIF

                        cPat = CHR(ASC(cPat)+1) && increment the pattern

                        cPattern=LEFT(cPattern,nDigNum-1) + cPat + SUBSTR(cPattern, nDigNum+1) && insert the new letter into the pattern

                        cNewlet = CHR(ASC(cStartLet)+ASC(cPat)-49) && from 'a' to 'b' or from 'n' to 'o'

                        cNumber=LEFT(cNumber,nDigNum-1) + cNewLet + SUBSTR(cNumber, nDigNum+1) && insert the new letter into the number

            ENDDO

            RETURN

PROCEDURE FormPhrase(cNumber as String,cPartResult as String) && cNumber is digit only sequence

            LOCAL i,nLen,nRec,cPartDigit

            nLen = LEN(cNumber)

            IF nLen = 0

                        INSERT INTO phrases VALUES (SUBSTR(cPartResult,2)) && if we've used all the digits, then we have a phrase. Remove the initial '-'

            ELSE

                        FOR i = 1 TO nLen

                                    cPartDigit=LEFT(cNumber,i) && the first i digits of the number

                                    IF i <=MAXDIGITLEN AND LEN(cPartResult)>0 AND ISALPHA(RIGHT(cPartResult,1)) && don't insert 2 digit sequences adjacent

                                                FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit) && insert the digits as a part result

                                    ENDIF

                                    IF LEFT(cPartDigit,1)$"01" AND i <= MAXDIGITLEN && if it starts with '0' or '1', it's ok

                                                FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit)

                                    ELSE

                                                SEEK cPartDigit+' ' && now find any subword associated with this digit sequence

                                                SCAN WHILE SubDigits=cPartDigit+' ' && for each subwords found (364 can be "dog" or "fog")

                                                            nRec=RECNO() && save/restore the current record

                                                            FormPhrase(SUBSTR(cNumber,i+1),cPartResult+'-'+ TRIM(SubWords.SubWord)) && recur with those subwords as partial results

                                                            GO nRec

                                                ENDSCAN

                                    ENDIF

                        ENDFOR

            ENDIF

RETURN

End of program

Comments

  • Anonymous
    August 03, 2004
    The link for me to try this out is broken. Would you tell me what it is?
    mlprobka@imfpubs.com
  • Anonymous
    August 03, 2004
    I just posted a new blog entry with a new link
    http://weblogs.asp.net/calvin_hsia/archive/2004/08/03/207286.aspx
  • Anonymous
    September 27, 2007
    Since July 4 th is nearing, I thought it would be appropriate to start my independent blog. In this first