Regex 101 Exercise I5 - Remove unapproved HTML tags from a string

When accepting HTML input from a user, allow the following tags:

<b>

</b>

<a href=…>

</a>

<i>

</i>

<u>

</u>

and remove any others.

Comments

  • Anonymous
    January 23, 2006
    This one looks nice and challenging...
  • Anonymous
    January 25, 2006
    It seems this can do the trick:
    <[^abiu>]+>

    Sheva
  • Anonymous
    January 25, 2006
    But that lets through any tag that contains a, b, i, or u...

    <script> (has i)

    <p onload="..."> (has a)
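A quick way to check the objection above is to run the character-class pattern against a few tags. This is only a sketch: the class name `CharClassDemo` and the helper `Strip` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // The pattern proposed above: matches a tag only if nothing
    // between < and > is one of the letters a, b, i, u.
    public static string Strip(string html) =>
        Regex.Replace(html, "<[^abiu>]+>", "");

    public static void Main()
    {
        // "em" and "/em" contain none of a, b, i, u, so both tags are removed.
        Console.WriteLine(Strip("<em>x</em>"));              // x
        // "script" contains an i, so neither tag matches and both survive.
        Console.WriteLine(Strip("<script>x</script>"));      // <script>x</script>
        // The opening tag contains an a (in "onload"), so it survives; </p> is removed.
        Console.WriteLine(Strip("<p onload=\"f()\">x</p>")); // <p onload="f()">x
    }
}
```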
  • Anonymous
    January 26, 2006
    (</?(?:u|i|b|a\s+href="[^">]*")>)|</?[^>]*>

    Use:
    Regex.Replace(InputString, @"(</?(?:u|i|b|a\s+href=""[^"">]*"")>)|</?[^>]*>", "$1")
  • Anonymous
    January 26, 2006
    kbiel, that's close, but it strips </a> tags (as I read it).
  • Anonymous
    January 26, 2006
    How about making the href part optional?
    (</?(?:u|i|b|a(?:\s+href="[^">]*")?)>)|</?[^>]*>

  • Anonymous
    January 26, 2006
    Hi, the correct pattern is:
    </*[^b]{1}[^>]*>



  • Anonymous
    January 26, 2006
    Sorry, to keep the <b>, <a>, <i> and <u>, the pattern is </?[^abiu/]{1}[^>]*?>

    :)
  • Anonymous
    January 26, 2006
    @Jeno
    Character negation is less extensible.
    What if you want to extend it to tags such as <table>, <td>, <tr>, etc.?

    I think Kbiel's expression is the best so far...
  • Anonymous
    January 26, 2006
    OK, the regex to keep <b>, <a>, <i>, <u>, <table>, <td> and <tr> is the following:

    </?(?!a|b|i|u|table|tr|td|/)[^>]*>

    This will keep all the attributes, like href, src, width, etc.
  • Anonymous
    January 27, 2006
    Close, but tags like <img> or <applet> are not excluded...
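The point above can be reproduced with the lookahead pattern exactly as posted. The negative lookahead only inspects the first character(s) of the tag name, so any tag that merely starts with a, b, i or u slips through. A minimal sketch; `LookaheadDemo` and `Strip` are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // The lookahead pattern from the previous comment, unchanged.
    const string Pattern = @"</?(?!a|b|i|u|table|tr|td|/)[^>]*>";

    public static string Strip(string html) => Regex.Replace(html, Pattern, "");

    public static void Main()
    {
        // <div> is removed, <b> is kept, as intended.
        Console.WriteLine(Strip("<div><b>x</b></div>"));
        // prints: <b>x</b>

        // <img> starts with i and <applet> starts with a, so the lookahead
        // rejects the match and both tags are left in place.
        Console.WriteLine(Strip("<img src=\"y\"><applet>z</applet>"));
        // prints: <img src="y"><applet>z</applet>
    }
}
```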
  • Anonymous
    January 27, 2006
    Here's what I made:

    Regex anyTag = new Regex(@"<[/]{0,1}\s*(?<tag>\w*)\s*(?<attr>.*?=['""].*?[""'])?\s*[/]{0,1}>");

    Then I use a MatchEvaluator that uses two string[] containing the acceptable tags and attributes.
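The MatchEvaluator idea described above might look roughly like the following. This is a sketch under assumptions, not the commenter's code: the whitelist contents, the names `TagWhitelist`, `Sanitize` and `Rewrite`, and the simplified tag pattern (which captures at most one quoted attribute) are all invented for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagWhitelist
{
    // Hypothetical whitelists; the comment does not say what its arrays contain.
    static readonly string[] AllowedTags = { "a", "b", "i", "u" };
    static readonly string[] AllowedAttributes = { "href" };

    // Simplified stand-in pattern: optional slash, tag name, at most one
    // quoted attribute, then anything else up to the closing >.
    static readonly Regex AnyTag = new Regex(
        @"<(?<close>/?)\s*(?<tag>\w+)(?:\s+(?<name>\w+)\s*=\s*(?<value>""[^""]*""|'[^']*'))?[^>]*>");

    public static string Sanitize(string html)
    {
        return AnyTag.Replace(html, new MatchEvaluator(Rewrite));
    }

    // Called once per matched tag; returns the text that replaces it.
    static string Rewrite(Match m)
    {
        string tag = m.Groups["tag"].Value.ToLowerInvariant();
        if (!AllowedTags.Contains(tag)) return "";   // drop unapproved tags

        // Rebuild approved tags, keeping only a whitelisted attribute.
        string rebuilt = "<" + m.Groups["close"].Value + tag;
        string name = m.Groups["name"].Value.ToLowerInvariant();
        if (m.Groups["name"].Success && AllowedAttributes.Contains(name))
            rebuilt += " " + name + "=" + m.Groups["value"].Value;
        return rebuilt + ">";
    }

    public static void Main()
    {
        Console.WriteLine(Sanitize(
            "<p>Hi <b onclick=\"x()\">there</b> <a href=\"/home\">home</a><script>alert(1)</script></p>"));
        // prints: Hi <b>there</b> <a href="/home">home</a>alert(1)
    }
}
```

Rebuilding the tag from its captured parts, rather than returning the match unchanged, means unlisted attributes such as onclick are dropped even on approved tags.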
  • Anonymous
    January 27, 2006
    Sorry guys, I found a bug in my code. These are the fixed versions:
    </?(((?!a|b|i|u|table|tr|td|/)[^>]*)|([abiu][^\s>]{1,}))>
    and
    </?(((?!a|b|i|u|table|tr|td|/)[^>]*)|((a|b|i|u|table|tr|td)[^\s>]{1,}))>
  • Anonymous
    January 27, 2006
    Good catch, Maurits. This will do it:

    (</?(?:u|i|b|a\s+href="[^">]*"|(?<=/)a)>)|</?[^>]*>
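For anyone who wants to try this last pattern, here is a small harness. It assumes the pattern is meant to read with `\s+` and `*` quantifiers (`[^">]*` and `[^>]*`), and it uses the `"$1"` replacement from the earlier `Regex.Replace` comment. `FinalPatternDemo` and `Clean` are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FinalPatternDemo
{
    // Group 1 captures an approved tag; the second alternative matches any other tag.
    // The lookbehind (?<=/)a accepts a bare "a" only right after a slash, i.e. </a>.
    const string Pattern =
        @"(</?(?:u|i|b|a\s+href=""[^"">]*""|(?<=/)a)>)|</?[^>]*>";

    // Replacing with $1 keeps approved tags; for any other tag group 1
    // did not participate in the match, so $1 is empty and the tag disappears.
    public static string Clean(string html) => Regex.Replace(html, Pattern, "$1");

    public static void Main()
    {
        string input =
            "<p><a href=\"/x\">link</a> <img src=\"y\"><i>it</i><script>s</script></p>";
        Console.WriteLine(Clean(input));
        // prints: <a href="/x">link</a> <i>it</i>s
    }
}
```

Note that under this reading an `<a>` tag carrying anything besides a single double-quoted href (for example an extra onclick attribute) is removed entirely rather than trimmed.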