Regex 101 Exercise I5 - Remove unapproved HTML tags from a string
When accepting HTML input from a user, allow the following tags:
<b>
</b>
<a href=…>
</a>
<i>
</i>
<u>
</u>
and remove any others.
Comments
- Anonymous
January 23, 2006
This one looks nice and challenging... - Anonymous
January 25, 2006
It seems this can do the trick:
<[^abiu>]+>
Sheva - Anonymous
January 25, 2006
But that lets through any tag that contains a, b, i, or u...
<script> (has i)
<p onload="..."> (has a) - Anonymous
January 26, 2006
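To see the objection concretely, here is a minimal Python sketch that applies Sheva's pattern with an empty replacement. Python's `re` module stands in for the .NET `Regex` class used elsewhere in this thread, and the name `strip_sheva` is hypothetical:

```python
import re

# Sheva's pattern: a tag whose inner text contains none of a, b, i, u, or ">".
SHEVA = re.compile(r"<[^abiu>]+>")

def strip_sheva(html: str) -> str:
    """Remove every tag the pattern matches; everything else is left alone."""
    return SHEVA.sub("", html)

print(strip_sheva("<p>plain</p>"))        # plain
print(strip_sheva("<script>x</script>"))  # <script>x</script>
```

`<b>` survives as intended, but so do `<script>` and `<p onload="...">`: the first `i` or `a` inside the tag stops the character class before it reaches `>`, so the tag never matches.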
(</?(?:u|i|b|a\s+href="[^">]*")>)|</?[^>]*>
Use:
Regex.Replace(InputString, @"(</?(?:u|i|b|a\s+href=""[^"">]*"")>)|</?[^>]*>", "$1") - Anonymous
January 26, 2006
kbiel, that's close, but as I read it, it strips </a> tags - Anonymous
January 26, 2006
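A minimal Python port of kbiel's pattern (again using `re` in place of .NET `Regex`; `strip_kbiel` is a hypothetical name) confirms the reading above:

```python
import re

# kbiel's pattern: group 1 captures a whitelisted tag; any other tag
# falls through to the second alternative with group 1 left unset.
KBIEL = re.compile(r'(</?(?:u|i|b|a\s+href="[^">]*")>)|</?[^>]*>')

def strip_kbiel(html: str) -> str:
    # r"\1" plays the role of .NET's "$1"; since Python 3.5 an unset
    # group expands to an empty string, so rejected tags vanish.
    return KBIEL.sub(r"\1", html)

print(strip_kbiel('<a href="x.html">link</a>'))  # <a href="x.html">link
```

For `</a>`, the `a\s+href` alternative needs whitespace after the `a`, so group 1 fails and the catch-all alternative deletes the closing tag.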
How about making the href part optional?
(</?(?:u|i|b|a(?:\s+href="[^">]*")?)>)|</?[^>]*> - Anonymous
January 26, 2006
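A quick Python check of the optional-href variant (a sketch using `re` as a stand-in for .NET `Regex`; `strip_optional_href` is a hypothetical name):

```python
import re

# Same structure as kbiel's pattern, but the href attribute on <a> is optional,
# so a bare "a" (and therefore the closing "</a>") is captured in group 1 too.
OPTIONAL_HREF = re.compile(r'(</?(?:u|i|b|a(?:\s+href="[^">]*")?)>)|</?[^>]*>')

def strip_optional_href(html: str) -> str:
    # Unset group 1 expands to "" (Python 3.5+), deleting rejected tags.
    return OPTIONAL_HREF.sub(r"\1", html)

print(strip_optional_href('<a href="x.html">link</a><img src="x.png">'))
# <a href="x.html">link</a>
```

`</a>` now survives. A side effect is that a bare `<a>` with no href is accepted as well.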
Hi, the correct pattern is:
</[^b]{1}[^>]> - Anonymous
January 26, 2006
Sorry, to keep <b>, <a>, <i> and <u>, the pattern is </?[^abiu/]{1}[^>]?>
:) - Anonymous
January 26, 2006
@Jeno
Character negation doesn't extend well.
What if you want to extend it to tags such as <table>, <td>, <tr>, etc.?
I think kbiel's expression is the best so far... - Anonymous
January 26, 2006
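The extensibility point can be illustrated with a short Python sketch that builds a kbiel-style whitelist from a list of tag names, so adding `<table>`, `<tr>` or `<td>` means adding a list entry rather than rewriting a character class. The helper `build_whitelist` is hypothetical, and for simplicity it only accepts bare tags without attributes:

```python
import re

def build_whitelist(tags):
    """Build a pattern whose group 1 captures bare allowed tags.

    Each name is escaped and the alternation must be followed directly
    by ">", so any other tag falls through to the catch-all branch.
    """
    names = "|".join(re.escape(t) for t in tags)
    return re.compile(rf"(</?(?:{names})>)|</?[^>]*>")

TABLE_TAGS = build_whitelist(["b", "i", "u", "table", "tr", "td"])
print(TABLE_TAGS.sub(r"\1", "<table><tr><td>x</td></tr></table><p>y</p>"))
# <table><tr><td>x</td></tr></table>y
```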
OK, the regex to keep <b>, <a>, <i>, <u>, <table>, <td> and <tr> is:
</?(?!a|b|i|u|table|tr|td|/)[^>]*>
This will keep all the attributes, such as href, src, width, etc. - Anonymous
January 27, 2006
Close, but tags like <img> or <applet> are not removed... - Anonymous
January 27, 2006
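Running the negative-lookahead pattern through Python's `re` (a stand-in for .NET `Regex`; `strip_lookahead` is a hypothetical name) shows both the intended behaviour and the gap just pointed out:

```python
import re

# Remove any tag whose name does NOT start with one of the listed prefixes.
# Listing "/" covers the case where "/?" backtracks to match nothing,
# so closing tags such as </b> are spared as well.
LOOKAHEAD = re.compile(r"</?(?!a|b|i|u|table|tr|td|/)[^>]*>")

def strip_lookahead(html: str) -> str:
    return LOOKAHEAD.sub("", html)

print(strip_lookahead("<table><tr><td><p>x</p></td></tr></table>"))
# <table><tr><td>x</td></tr></table>
print(strip_lookahead('<img src="x.png"><applet>'))
# <img src="x.png"><applet>
```

Because the lookahead only tests a prefix, `img` passes as "starts with i" and `applet` as "starts with a".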
The comment has been removed - Anonymous
January 27, 2006
Here's what I made:
Regex anyTag = new Regex(@"<[/]{0,1}\s*(?<tag>\w*)\s*(?<attr>.*?=['""].*?[""'])?\s*[/]{0,1}>");
Then I use a MatchEvaluator that checks each match against two string[] arrays holding the acceptable tags and attributes. - Anonymous
January 27, 2006
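The MatchEvaluator idea can be sketched in Python, where `re.sub` also accepts a callback. This is not the commenter's code: the whitelists, the simplified tag and attribute patterns, and the choice to drop unlisted attributes (rather than the whole tag) are all assumptions for illustration:

```python
import re

# Hypothetical whitelists standing in for the two string[] arrays.
ALLOWED_TAGS = {"a", "b", "i", "u"}
ALLOWED_ATTRS = {"href"}

# Simplified tag matcher (an assumption, not the anyTag pattern above):
# group 1 = optional "/", group 2 = tag name, group 3 = raw attribute text.
ANY_TAG = re.compile(r"<\s*(/?)\s*(\w+)([^>]*)>")
# Only double-quoted name="value" attributes are recognised in this sketch.
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def filter_tag(m) -> str:
    """Callback for re.sub, playing the role of a .NET MatchEvaluator."""
    closing, name, attrs = m.group(1), m.group(2).lower(), m.group(3)
    if name not in ALLOWED_TAGS:
        return ""  # drop tags that are not whitelisted
    if closing:
        return f"</{name}>"
    # Rebuild the opening tag, keeping only whitelisted attributes.
    kept = [f'{a.group(1).lower()}="{a.group(2)}"'
            for a in ATTR.finditer(attrs)
            if a.group(1).lower() in ALLOWED_ATTRS]
    return "<" + " ".join([name] + kept) + ">"

def sanitize(html: str) -> str:
    return ANY_TAG.sub(filter_tag, html)

print(sanitize('<a href="x.html" onclick="evil()">link</a><script>x</script>'))
# <a href="x.html">link</a>x
```

Keeping the whitelists in data rather than in the pattern means extending the allowed set never touches the regex.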
The comment has been removed - Anonymous
January 27, 2006
Sorry guys, I found a bug in my code; here are the fixed versions:
</?(((?!a|b|i|u|table|tr|td|/)[^>]*)|([abiu][^\s>]{1,}))>
and
</?(((?!a|b|i|u|table|tr|td|/)[^>]*)|((a|b|i|u|table|tr|td)[^\s>]{1,}))> - Anonymous
January 27, 2006
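A Python check of the first fixed pattern, with the stripped backslash restored as `\s` (the restoration and the name `strip_fixed` are assumptions; `re` stands in for .NET `Regex`):

```python
import re

# Branch 1: tag names that do not start with a whitelisted prefix.
# Branch 2: a name starting with a, b, i or u that continues with more
# non-space characters and is then followed directly by ">" (e.g. "img").
FIXED = re.compile(r"</?(((?!a|b|i|u|table|tr|td|/)[^>]*)|([abiu][^\s>]{1,}))>")

def strip_fixed(html: str) -> str:
    return FIXED.sub("", html)

print(strip_fixed("<img><applet><b>kept</b>"))  # <b>kept</b>
```

As reconstructed here, branch 2 only matches when no attributes follow the name, so a tag such as `<img src="x.png">` still passes through.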
Good catch, Maurits. This will do it:
(</?(?:u|i|b|a\s+href="[^">]*"|(?<=/)a)>)|</?[^>]*>
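A minimal Python port of this final pattern (using `re` as a stand-in for .NET `Regex`, whose lookbehind syntax is the same here; `strip_final` is a hypothetical name):

```python
import re

# The lookbehind (?<=/)a accepts "a" only when it is directly preceded by
# "/", i.e. a closing </a>; a bare <a> without href is still removed.
FINAL = re.compile(r'(</?(?:u|i|b|a\s+href="[^">]*"|(?<=/)a)>)|</?[^>]*>')

def strip_final(html: str) -> str:
    # Unset group 1 expands to "" (Python 3.5+), deleting rejected tags.
    return FINAL.sub(r"\1", html)

print(strip_final('<a href="x.html">link</a><p onload="go()">t</p>'))
# <a href="x.html">link</a>t
```

One consequence: a `</a>` is kept even when its opening `<a>` was removed, so `<a>bare</a>` becomes `bare</a>`.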