Writing a Regular Expression parser in Haskell: Part 4
With the previous two modules in place we are now set up to use a DFA to match against a string. In my implementation I support either a greedy match or an short match. In a full featured regular expression engine this ability to choose greedy or not would be per operator but for simplicity I have it for the overall match.
To do the matching I have a general function which will create a list of all matches. Then the difference between short and greedy matching is which of the candidate solutions does it choose.
This is the method:
1: doMatch func machine st [] = doAccept machine st []
2: doMatch func machine st string = func $ map (\f -> doMatch' st f []) (tails string)
3: where
4: doMatch' state [] soFar = doAccept machine st soFar
5: doMatch' state (s:str) soFar =
6: case findTransition machine s state of
7: Nothing -> doAccept machine state soFar
8: Just (from, to, val) -> case doMatch' to str (soFar ++ [s]) of
9: (False,_) -> case canAccept machine to of
10: True -> (True, soFar ++ [s])
11: False -> doMatch' to str (soFar ++ [s])
12: (True,res) -> (True,res)
This creates the list of matches and uses the passed in function to determine how to filter to either the shortest or longest match.
For short or long matches I pass in one of these two functions:
1: -- Get the shortest match
2: shortest matches = case filter (\s->fst s) (sort matches) of
3: [] -> (False,"")
4: ms -> head ms
5:
6: -- Get the longest match
7: longest matches = last.sort $ matches
I created aliases for the functions to make it more handy:
1: (=~) = greedyMatch
2: (=~?) = shortMatch
And then the final result:
1: *SimpleRegex> "hiphiphiphorray" =~? "hip(hip)*"
2: (True,"hip")
3:
4: *SimpleRegex> "hiphiphiphorray" =~ "hip(hip)*"
5: (True,"hiphiphip")
I attached a zip of all the files for this project.
Enjoy!
Comments
- Anonymous
June 21, 2008
The third module in the simple regular expression parser is called: NFAtoDFA. Which as you might have