Regex 101 Discussion I6 - Remove font directives from HTML
Regex 101 Exercise I6 - Remove font directives from HTML
Remove all <font…> or </font> directives from an HTML string.
*****
I've decided to start linking my answers back to the original posts, since the answers given there are often as good as or better than the one I give.
The most obvious way to write this is:
<font.*>|</font>
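That's pretty straightforward: match either a <font...> or a </font>. But it's also wrong. The ".*" is greedy, so it consumes as much as it can, and the ">" in the first part ends up matching the last ">" in the string rather than the one that closes the font tag. We need the non-greedy quantifier: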
<font.*?>|</font>
That does what we want it to do, assuming we use the singleline option (so that "." also matches newlines inside a tag) and the ignorecase option (so that <FONT> is matched too).
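To see the difference between the two versions, here is a minimal sketch in Python. The choice of Python is an assumption, since the exercise doesn't name a language; re.DOTALL and re.IGNORECASE stand in for the singleline and ignorecase options, and the sample string is made up for illustration.

```python
import re

# Hypothetical sample input, invented for illustration.
html = '<FONT face="Arial">Hello</FONT> <b>world</b>'

# DOTALL lets "." match newlines (singleline); IGNORECASE also matches <FONT>.
flags = re.DOTALL | re.IGNORECASE

greedy = re.compile(r"<font.*>|</font>", flags)
lazy = re.compile(r"<font.*?>|</font>", flags)

# The greedy ".*" runs to the last ">" in the string, so everything is removed.
greedy_result = greedy.sub("", html)

# The lazy ".*?" stops at the first ">", so only the font tags are removed.
lazy_result = lazy.sub("", html)

print(repr(greedy_result))  # ''
print(repr(lazy_result))    # 'Hello <b>world</b>'
```

The greedy version swallows the trailing <b>world</b> along with the font tag, which is exactly the failure described above.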
Other ways of doing this showed up in the comments. Maurits suggested using 3 regexes, or a simple one:
</?font.*?>
I don't know whether I prefer that one over mine. It is shorter, though it's a bit harder for me to read the /? part.
Kbiel suggested a version without the non-greedy option:
</?font[^>]*>
which also works well, though I prefer the non-greedy version due to readability.
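One practical difference between the variants is worth checking: "[^>]" matches newlines on its own, so the last version does not depend on the singleline option. The following is a rough comparison in Python (again an assumed language, with a made-up input whose opening tag spans two lines); the dictionary labels and the strip_font helper are hypothetical names used only for this sketch.

```python
import re

# Hypothetical input: the opening font tag is split across two lines.
html = '<font\n size="2">Hi</font> there'

patterns = {
    "alternation": r"<font.*?>|</font>",
    "optional slash": r"</?font.*?>",
    "negated class": r"</?font[^>]*>",
}

def strip_font(pattern, text, flags):
    """Remove every match of pattern from text (illustrative helper)."""
    return re.sub(pattern, "", text, flags=flags)

# With the singleline equivalent (DOTALL), all three patterns agree.
with_singleline = {
    name: strip_font(p, html, re.DOTALL | re.IGNORECASE)
    for name, p in patterns.items()
}

# Without it, "." stops at the newline, so only the negated class
# still removes the multi-line opening tag.
without_singleline = {
    name: strip_font(p, html, re.IGNORECASE)
    for name, p in patterns.items()
}

for name in patterns:
    print(name, repr(with_singleline[name]), repr(without_singleline[name]))
```

With the singleline option set, all three produce 'Hi there'. Without it, only the negated-class version removes the opening tag; the other two leave it in place and strip just the </font>.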