Regex 101 Exercise S3 - Validate a zip+4 zip code - Discussion
Exercise S3 - Validate a zip+4 zip code.
The US has a 5 digit zip code with an optional 4 digit suffix. Write a regex to validate that the input is in the proper format:
Sample strings
98008
98008-4893
****
This one is fairly similar to what we've done in the past. The most obvious approach is to start by matching the first chunk of digits ("chunk" is a regex "term of art" that refers to a section of characters that you want to match (not really...)). We can do that with:
\d{5}
And we can easily match the second version with:
\d{5}-\d{4}
I got an email recently in which the writer said, "I've heard that it sometimes makes more sense to use two regexes rather than a single more complex one." Though crafting a single regex that covers all the cases can be an interesting intellectual exercise (a good idea if you want to avoid the heartbreak of flabby neurons), it sometimes makes more sense to cut your losses and simply use several regexes in sequence, and get out of work before happy hour is over.
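As a rough sketch of that "several regexes in sequence" approach, here is how it might look in Python. The choice of Python and the helper name `is_zip_two_step` are assumptions made for illustration; the same idea should carry over to any regex engine. `re.fullmatch` is used so that each pattern has to cover the whole input.

```python
import re

# Two simple patterns, tried one after the other.
ZIP5 = re.compile(r"\d{5}")              # standard zip format
ZIP_PLUS4 = re.compile(r"\d{5}-\d{4}")   # zip + 4 format


def is_zip_two_step(text):
    """Return True if text is a 5-digit zip or a zip+4 (hypothetical helper)."""
    # fullmatch requires the entire string to match, so no ^ or $ is needed.
    return bool(ZIP5.fullmatch(text) or ZIP_PLUS4.fullmatch(text))


print(is_zip_two_step("98008"))        # True
print(is_zip_two_step("98008-4893"))   # True
print(is_zip_two_step("98008-48"))     # False
```

Each pattern stays trivial to read, at the cost of a little extra code around them.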
Which is a long-winded way to say that I could just declare victory at this point, but that wouldn't be very educational (I desperately hoped to link to a .wav file of Daryl Hannah saying "edu cational" from Splash, but alas, repeated web searches proved fruitless). So, onward.
Regex provides an "or" option where you can match one of several things. To do that in this case, we would write:
^
(
\d{5}-\d{4} # zip + 4 format
| # or
\d{5} # standard zip format
)
$
which would match either of these. This is a reasonable way to write this match.
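To try the "or" version out, here is a small sketch in Python. It assumes Python's `re.VERBOSE` flag, which permits the whitespace and `#` comments used in the pattern above; the name `ZIP_ALTERNATION` is made up for the example.

```python
import re

# The alternation pattern, laid out as in the text.
ZIP_ALTERNATION = re.compile(r"""
    ^
    (
    \d{5}-\d{4}   # zip + 4 format
    |             # or
    \d{5}         # standard zip format
    )
    $
    """, re.VERBOSE)

# Print each candidate alongside whether the pattern accepts it.
for candidate in ["98008", "98008-4893", "9800", "98008-489"]:
    print(candidate, bool(ZIP_ALTERNATION.match(candidate)))
```

The first two candidates are accepted and the last two are rejected, since neither alternative can stretch from `^` to `$` over them.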
The final way is to use one of the quantifiers I discussed before. If we use the "?" quantifier, we can write:
^
\d{5} # 5 character zip code
(-\d{4})? # optional "+4" suffix
$
I think this would be my preferred solution.
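A minimal sketch of this version in Python follows, again assuming `re.VERBOSE` for the multi-line layout. The names `ZIP_OPTIONAL` and `is_zip` are hypothetical.

```python
import re

# The "?" version: five digits, then an optional hyphen-plus-four-digits group.
ZIP_OPTIONAL = re.compile(r"""
    ^
    \d{5}        # 5 character zip code
    (-\d{4})?    # optional "+4" suffix
    $
    """, re.VERBOSE)


def is_zip(text):
    """Return True if text looks like a zip or zip+4 (hypothetical helper)."""
    return ZIP_OPTIONAL.match(text) is not None


print(is_zip("98008"))             # True
print(is_zip("98008-4893"))        # True
print(is_zip("98008-"))            # False: the hyphen needs its four digits
print(is_zip("99999-999999999"))   # False: $ rejects the extra digits
```

Because the "?" applies to the whole group, the hyphen and the four digits are accepted or skipped together, never one without the other.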
Though we used parentheses for grouping, they actually have other uses as well in regex. Tune in next week, where the word for the day will be "capture". Or, perhaps, "spongiferous".
What do you think of the series so far? If you've used regex before, it should seem simple to you. What would you change? What would you leave the same?
Comments
- Anonymous
November 11, 2005
To validate an input, I was using this pattern:
------------------------
^\d{5}(?:-\d{4})?$
------------------------
It is very similar to yours, but makes input like this:
99999-999999999 incorrect
- Anonymous
November 11, 2005
Vladimit,
I don't understand your comment. The only difference between your regex and mine is that you used the non-capture "?:" inside the parentheses.
Eric
- Anonymous
November 11, 2005
I like the series. I have already read up on regexes fairly well, but for some reason I just can't deal with them too well. In my opinion no discussion on regex is too simple.
- Anonymous
November 11, 2005
The comment has been removed
- Anonymous
November 12, 2005
-----------------v------
^\d{5}(?:-\d{4})?$
-----------------^------
I saw this ($) as the key element in Vladimit's regex.
Love the series. Can't wait for Search and Replace.
- Anonymous
November 12, 2005
Great series. This is a nice way to gradually learn and relearn concepts that I haven't taken the time to fully understand.
What is the history of some of these patterns and symbols, e.g. why ^ and $? Are there pattern matching differences between .NET and other implementations, or only in the way .NET classes provide for handling of searches, matches, etc.?
- Anonymous
November 14, 2005
I'm enjoying the series. I've never really had to use regex before, though I can see where it will be useful in the future.
- Anonymous
November 14, 2005
It's a great series. For those that don't have regular expression experience, it is very valuable.
I don't have experience of this kind myself, all I've done is read a book or two, and I am finding the classroom format (read, assignment, discussion) to be very helpful.