Writing a Regular Expression parser in Haskell: Part 4

With the previous two modules in place we are now set up to use a DFA to match against a string.  In my implementation I support either a greedy match or an short match.  In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match. 

 

To do the matching I have a general function which will create a list of all matches.  Then the difference between short and greedy matching is which of the candidate solutions does it choose.

This is the method:

    1: doMatch func machine st [] = doAccept  machine st []
    2: doMatch func machine st string =  func $ map (\f -> doMatch' st f []) (tails string)
    3:     where
    4:       doMatch' state [] soFar = doAccept machine st soFar
    5:       doMatch' state (s:str) soFar = 
    6:           case findTransition machine s state of
    7:             Nothing -> doAccept machine state soFar
    8:             Just (from, to, val) -> case doMatch' to str (soFar ++ [s]) of
    9:                                       (False,_) -> case canAccept machine to of
   10:                                                     True -> (True, soFar ++ [s])
   11:                                                     False -> doMatch' to str (soFar ++ [s])
   12:                                       (True,res) -> (True,res)

 

This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.

For short or long matches I pass in one of these two functions:

    1: -- Get the shortest match
    2: shortest matches = case  filter (\s->fst s) (sort matches) of
    3:                      [] -> (False,"")
    4:                      ms -> head ms
    5:  
    6: -- Get the longest match
    7: longest matches = last.sort $ matches

 

I created aliases for the functions to make it more handy:

    1: (=~) = greedyMatch
    2: (=~?) = shortMatch

 

And then the final result:

    1: *SimpleRegex> "hiphiphiphorray" =~? "hip(hip)*"
    2: (True,"hip")
    3:  
    4: *SimpleRegex> "hiphiphiphorray" =~ "hip(hip)*"
    5: (True,"hiphiphip")

 

 

I attached a zip of all the files for this project.

Enjoy!

SimpleRegex.zip

Comments

  • Anonymous
    June 21, 2008
    The third module in the simple regular expression parser is called: NFAtoDFA. Which as you might have