

&apos; is in XML, in HTML use &#39;

I just got hit by a very confusing "by design" behavior, and it took me a while to figure out what was going on.

Here is the line of code:

     text = System.Security.SecurityElement.Escape(text);

This method replaces the five characters that have special meaning in XML (<, >, &, " and ') with their XML entity equivalents (&lt;, &gt;, &amp;, &quot; and &apos;).
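To make the replacement concrete, here is a small C# sketch that escapes a made-up line of VB code ending in a comment. It assumes .NET 5 or later (for top-level statements); the sample line is invented for illustration. Note how the VB comment marker ' comes out as &apos;.

```csharp
using System;
using System.Security;

// A hypothetical line of VB code: everything after ' is a comment.
var vbLine = "Dim x = 1 ' a < b & \"c\"";

// SecurityElement.Escape replaces <, >, &, " and ' with XML entities.
var escaped = SecurityElement.Escape(vbLine);

Console.WriteLine(escaped);
// prints: Dim x = 1 &apos; a &lt; b &amp; &quot;c&quot;
```

The output is perfectly valid XML, which is exactly why the problem only shows up once the text is treated as HTML.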

The problem I had: after escaping some VB code with this method and pasting it into Windows Live Writer, the VB comment apostrophes (') showed up as &apos;.

Well, it turns out that XML supports &apos; to denote the apostrophe symbol '. However, HTML doesn't officially support &apos; (it is not among the HTML 4 character entities), so Live Writer "HTML-escaped" my already "XML-escaped" string.
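The double escaping can be reproduced in a few lines. This sketch assumes .NET 5 or later and uses WebUtility.HtmlEncode purely as a stand-in for the editor's re-escaping; it is not what Live Writer actually runs, and a blanket encoder like this one would also turn &#39; into &amp;#39;. What it shows is the mechanism: once the & of &apos; is itself escaped, a browser displays the literal text &apos; instead of an apostrophe.

```csharp
using System;
using System.Net;
using System.Security;

// Step 1: XML-escape a VB comment, as in the post.
var xmlEscaped = SecurityElement.Escape("' comment");

// Step 2: HTML-encode the already escaped text again.
// HtmlEncode turns every '&' into "&amp;", so the entity is broken.
var doubleEscaped = WebUtility.HtmlEncode(xmlEscaped);

Console.WriteLine(xmlEscaped);    // prints: &apos; comment
Console.WriteLine(doubleEscaped); // prints: &amp;apos; comment
```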

Solution:

    text = System.Security.SecurityElement.Escape(text);
    // HTML doesn't support XML's &apos;
    // so use the numeric character reference &#39; instead
    // https://www.w3.org/TR/html4/sgml/entities.html
    // https://lists.whatwg.org/pipermail/whatwg-whatwg.org/2005-October/004973.html
    // https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
    // https://fishbowl.pastiche.org/2003/07/01/the_curse_of_apos/
    // https://nedbatchelder.com/blog/200703/random_html_factoid_no_apos.html
    text = text.Replace("&apos;", "&#39;");

Comments

  • Anonymous
    March 18, 2010
    Did you consider using HttpUtility.HtmlEncode? http://msdn.microsoft.com/en-us/library/system.web.httputility.htmlencode.aspx

  • Anonymous
    March 19, 2010
    Wow, that's a good one, I didn't know about it! Thanks, Thijs!

  • Anonymous
    February 09, 2011
    thanks!

  • Anonymous
    March 29, 2012
    Cheers.

  • Anonymous
    October 04, 2015
    '

  • Anonymous
    May 24, 2017
    Been trying to set up my mailing list and this &apos; has been an issue... thanks a lot for this