Feeds and well-formed XML

Here in Windows, we’re working hard on Windows Vista Beta 2, and we've recently been doing some work on how we parse feeds.

Our years of experience with HTML in Internet Explorer have taught us the long-term pain that results from being too liberal in what you accept from others. Hence, we’ve adopted the following overriding principle for IE7 and the RSS platform in Windows Vista:

   We will only support feeds that are well-formed XML.

This principle allows us to build a more predictable feed parser. Because this is a platform, applications that use it to consume feeds need to be able to rely on it to deliver information the way the publisher intended; guessing what a publisher meant when a feed contains an error is tricky at best. We also spoke to several people in the RSS and developer communities at Gnomedex and at PDC, and they wholeheartedly supported this.

When you view a feed that isn’t well-formed XML, IE7 will flag it and highlight the error, just as we do today for malformed XML in general, so feed publishers can see what’s going on. When the platform downloads a feed with errors during a regular update, it will discard that update and try again at the next scheduled download, so feeds with temporary errors won't be permanently affected.
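
To make that behavior concrete, here is a minimal sketch in Python. It is not the Windows RSS platform's code or API; the function names `check_well_formed` and `apply_update` are made up for illustration, and the standard library's `xml.etree.ElementTree` parser stands in for whatever parser the platform uses. It shows the two ideas above: report where a feed stops being well-formed, and keep the last good copy when an update fails.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def check_well_formed(feed_bytes: bytes) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if the feed parses as XML, else (False, error text).

    Hypothetical helper: it only tests well-formedness, not whether the
    document is a valid RSS or Atom feed.
    """
    try:
        ET.fromstring(feed_bytes)
    except ET.ParseError as err:
        # The message includes the line and column of the first error,
        # which is what a publisher needs in order to fix the feed.
        return False, str(err)
    return True, None


def apply_update(cached: Optional[bytes], downloaded: bytes) -> Optional[bytes]:
    """Keep the previous copy when the new download is not well-formed.

    Hypothetical helper: a real scheduler would simply retry at the next
    scheduled download, so a temporary error does no lasting harm.
    """
    ok, _ = check_well_formed(downloaded)
    return downloaded if ok else cached
```

For example, a feed whose title contains an unescaped `&`, or whose `<title>` element is never closed, would be rejected by `check_well_formed`, and `apply_update` would hand back the cached copy unchanged.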

That said, we do recognize that there is a great deal of variance in the actual content of RSS feeds, so we’ll be more liberal when it comes to which elements are required in a feed. We'll describe exactly how we're handling different feeds in a future post.

- Sean

Comments

  • Anonymous
    November 04, 2005
    This is the right thing to do, and I'm glad you're doing it - thanks.
  • Anonymous
    November 04, 2005
    Definitely the right thing to do!
  • Anonymous
    November 04, 2005
    Yes, ... we are amazed! Probably you are right, however ... Good luck folks
  • Anonymous
    November 04, 2005
    Why not well formed and well defined and just stick to Atom
  • Anonymous
    November 04, 2005
    Are you going to have any policy on the content markup - XHTML or tag soup? Singly, doubly or trebly escaped? Silent data loss?

    Check:
    http://www.intertwingly.net/blog/2004/05/28/detente

    http://www.xml.com/lpt/a/2003/08/20/embedded.html
  • Anonymous
    November 04, 2005
    Yes, if only Microsoft had this philosophy with IE4, we wouldn't have all the problems with IE we have today.

    Because Netscape would still be the only browser anyone used...
  • Anonymous
    November 04, 2005

    Sam Ruby thus spake:
    http://www.intertwingly.net/blog/2005/11/04/On-Notice
  • Anonymous
    November 04, 2005
    > Because Netscape would still be the only browser anyone used ...

    Maybe, we'd all be better off with Netscape being the only browser anyone used?

    No, sorry, just kidding, but honestly, will your 'well formed' claim hold for the parsing methods of your folks' search engine as well? Probably not, 'cause then in 2009 Google'll still be the only search engine anyone will use.

    Obeying Postel's law is a question of having the appropriate marketshare, hence standardization power ... If you haven't enough you'd better obey.

    Anyway, I love standard 'compliantnes'. So, go ahead.
  • Anonymous
    November 07, 2005
    Does this mean that you will follow RFC 3023 (i.e. XML served over HTTP) to the letter as well?
  • Anonymous
    November 07, 2005
    The problem with this is that I can see M$ creating their own standard of RSS, and everything else being "wrong"
  • Anonymous
    November 07, 2005
    This is like Ford declaring that their cars will only run on roads that do not contain any flaws. Nice in theory, but totally unrealistic.

    For me, it would result in me moving immediately to something that is more focused on letting me (the customer) do my task rather than being "right".

    I think it is legitimate to flag feeds that are not well-formed, but it is completely user-UNfriendly to discard them.

    And ironically, this is coming from a company that plans to keep some long standing CSS bugs in IE7. If you plan to stick 100% to the spec, you need to be consistent and do it with EVERYTHING.

    The first step is to educate. Let folks know just how many broken feeds there are. Then... maybe... in Vista, you can consider an OPTION to ignore feeds that are not well-formed.
  • Anonymous
    November 07, 2005
    This isn't like Ford declaring it will run only on a perfect road. It's Ford declaring all cars are not ATVs.

    How many people actually write their own RSS feed?
    Most RSS feeds are generated, à la FeedBurner, so this should be a non-issue. If it is an issue, the content writer should be fixing it.
  • Anonymous
    November 07, 2005
    I find it hilarious, that you take this stance now. (it is almost the right stance)

    That said, if you are going to post articles like this, could you at least pretend that you know what you are talking about, by posting messages in a standard format?
    < A > (spaced to avoid deletion only) is not a valid XHTML tag. The < BR > tag is also wrong, and not self closing. Ditto for < IMG > tags, (PS I didn't find title attributes on them either) in fact, your whole RSS and ATOM Feeds for this darn blog, fail the guidelines you are trying to preach.

    OMG! What is this! (from your feed)
    "< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >< /FONT >"
  • Anonymous
    November 07, 2005
    How will you be dealing with instances where the spec is vague or inconclusive? For instance, feeds that have multiple enclosures per item?

  • Anonymous
    November 07, 2005
    And I suppose being too liberal in your support of other's work doesn't include png files?

    You've got a lot of work to do for IE7 if you want to continue to hold market share. RSS feeds are only a small part. A step in the right direction, granted. I'll be surprised if you can manage to pull it off. Tabbed browsing, close security issues, revamp the options to be easier to set, proper support for standards ... if you can get those, you'll be good. Just don't stop now.
  • Anonymous
    November 07, 2005
    If only MSFT understood the most important quote from all RFC's (793)

    The Internet Robustness Principle: "Be liberal in what you accept, and conservative in what you send."

    If only MSFT understood RFC's...

    If only..
  • Anonymous
    November 07, 2005
    I have heard a few bloggers complaining about this. I can only hope that Microsoft stays with W3C standards and continues IE/W3C compliance. Don't get me wrong - I love what Microsoft has done and is doing in the internet community; I just hope that the whole world will be able to benefit from this related work.
  • Anonymous
    November 07, 2005
    This is fantastic, guys, a great move on Microsoft's part. I hope the new IE7 will also follow other W3C standards as well; it will make life a whole lot easier for all of us, not to mention be a big boost in how we typically think of Microsoft. This is definitely a good thing. Keep it up, guys, I'm really excited to see a standards-compliant IE7.
  • Anonymous
    November 09, 2005
    As others have stated, I'm very concerned that Microsoft's definition (even though a w3c.org page was referenced) will differ from the rest of the world. It's been done too often in the past. Microsoft has never played nice with others. I sure hope you stick to your goals of standard compliance (at least in this one case)
  • Anonymous
    November 15, 2005
    Hurrah and well done. Postel's law is a very good law but has been totally bastardised by lazy coders.

    XML is useful because it works with standardised tools and in a predictable manner. If it doesn't do this then it's not useful.

    If there are x producers of XML in the world and y parsers, then each distinct error on a producer's part has to be worked around in every one of the y parsers: up to x times y fixes = a lot. If the producers mind their own shop then there are only x fixes required = a lot less = less time = less money/more features.

    Being free with what you accept is not a catch-all clause. Requiring well-formed XML is the least a parser should do. If not, then where does it end - should an XML parser also include image recognition software in case someone in marketing (http://neopoleon.com/blog/posts/434.aspx, "Excel as a database") is asked for an XML feed?
  • Anonymous
    November 17, 2005
    I just want to echo Nick Bradbury - why say more - he has all the cred anyone could ever need!
  • Anonymous
    November 22, 2005
    > We will only support feeds that are well-formed XML.

    Nice idea, product killer in practice. Your pain came from accommodating malformed input in the core code. Don't do that. Write a layer that consumes all manner of mangled gibberish and emits a well formed document. Have the product code see only this well formed document.
  • Anonymous
    December 01, 2005
    Hi