Word Blog HTML Quality

Every post you make as a blogger can be a huge learning experience. In Friday's post on the new Word 2007 blog post authoring feature, I made a fairly modest claim that the HTML emitted by the feature would be better than the standard HTML from previous versions of Word. Well, it is, but I should have looked at the HTML source myself before inviting everyone else to do so. There were a few problems:

  • Blog service vagaries

A number of issues were introduced by Community Server (the blog system on which blogs.msdn.com is built). The uppercase tags are the main example of that problem. Several other issues people pointed out come from the site template. The template is one of several standard CS templates and, of course, has nothing to do with the HTML emitted by the blog feature (ID attributes with strange values are an example of this).

  • My stupid HTML mistakes

As I mentioned, I hand-coded the image tags and made a couple of stupid HTML errors. Luckily I won't be shipping in the box with the feature, and our developers will make sure the feature outputs them correctly.

  • Real problems we need to address

Several people also made great suggestions for improvements that we want to look into. Examples include using <del> for strikethrough and ensuring proper tag content flow.

Goals

Most importantly, I'd like to lay out the goals for the HTML output.

  • We will hand off valid XHTML for each post

We can't be held responsible for what Blogger, Spaces or anyone else does to the XHTML after we give it to them, but we'll send it to them as valid XHTML.

  • Clean HTML is more important than visual fidelity

This is a huge change for Word. Our focus has always been ensuring (as much as possible) that the HTML we output would allow a full round trip of all the content and formatting in your document. The blog feature is all about representing what we can in a clean way, without any special action or decision on the part of the post author.
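To make the first goal a little more concrete: XHTML is, at minimum, well-formed XML with lowercase element names, which is exactly what the uppercase tags from Community Server violate. The Python sketch below illustrates only that minimum check. It is not Word's actual output pipeline, the function name `xhtml_problems` is hypothetical, and it does not perform full validation against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def xhtml_problems(fragment: str) -> list[str]:
    """Return a list of problems found in a post fragment (hypothetical helper).

    Rough check only: it tests XML well-formedness and lowercase element
    names. It does not validate against the XHTML DTD, and because no DTD
    is loaded, named HTML entities such as &nbsp; are reported as parse errors.
    """
    try:
        # Wrap the fragment so that several top-level elements still parse.
        root = ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    for element in root.iter():
        # XHTML element names are case-sensitive and must be lowercase.
        if element.tag != element.tag.lower():
            problems.append(f"uppercase tag: <{element.tag}>")
    return problems


if __name__ == "__main__":
    print(xhtml_problems("<p>Hello <del>old</del> new</p>"))  # []
    print(xhtml_problems("<P>Hello</P>"))  # ['uppercase tag: <P>']
    print(xhtml_problems("<p><b>bad</p></b>"))  # one 'not well-formed' entry
```

A check like this catches the uppercase-tag and misnested-tag kinds of problems, but a post can pass it and still be invalid XHTML (for example, a <div> inside a <p>), so real validation needs a DTD- or schema-aware validator.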

Suggestions welcome

With these two goals in mind, I would like to announce that we will post the details of our XHTML output for public comment. The manner in which we do this (blog, discussion list, wiki, or something else) will be announced early next week. We can't promise that we will respond to all suggestions, but we will seriously consider them.

Comments

  • Anonymous
    May 16, 2006
    Joe -

    In my opinion you should make this blogging work with SharePoint first. Sort out all issues there, and then publish a simple API for others to adopt.

    This would be a great opportunity to establish a SOAP-friendly Web service API that serves as a standard for blogs.

    Sahil
  • Anonymous
    May 16, 2006
    Gotta love the openness!  

    I don't know enough to have picked apart your code, as you invited, but I look forward to using the final product once you've got all kinds of excellent feedback from the community!
  • Anonymous
    May 16, 2006
    Community Server still messes around with the HTML even if you hand it over using the webservice APIs? I know that FreeTextBox mangles it until you don't recognize it anymore, but the webservices? That's weak.
  • Anonymous
    May 17, 2006
    PingBack from http://scobleizer.wordpress.com/2006/05/17/microsoft-word-generates-clean-html-for-blogs/
  • Anonymous
    May 17, 2006
    It may well be Community Server's wysiwyg editor that causes the problem. There is currently a bug in the Word blog editor that forces me to open up the post on the server and get the post time set correctly.
  • Anonymous
    May 17, 2006
    Joe Friend, the guy who started the Blogging from Word 2007 whirlwind, posts a follow-up...
  • Anonymous
    May 17, 2006
    In comments on your last post I added a suggestion to consider using styles to drive the HTML export. I take your point about simplicity over formatting fidelity, but with a good, predictable set of styles you can do a lot.

    http://ptsefton.com/blog/2006/05/13/beyond_blogging:_style-driven__html_export_from_2007._please.

    I'd love to know what you think of this idea. Are styles still there and usable in the new Word?
  • Anonymous
    May 18, 2006
    This is good news but which version(s) of XHTML will Word render - 1.0 transitional, 1.1 strict or 2.0? Creating Microformats will be a doddle now.  I could write a template that non-techies could populate to generate hcard, hreview etc.  

    I for one think this is the strongest reason to upgrade my version of Outlook and Word, so long as the MetaWeblog API and Atom publication support allow me to blog to a variety of blogging tools and not just Spaces.

    I wonder if Microsoft could go further and fully support CSS2 for template formatting in Word and JavaScript 1.5+ for macros.
  • Anonymous
    May 19, 2006
    I'd just like to say, I was worried (understatement) that Word would try to output any kind of HTML at all. After reading your blog Joe, I'm feeling much better about MS and the people working there.

    I wish there were more people like Joe working for MS. ;)

    If you can do what you say, then don't stop saying it.
  • Anonymous
    May 22, 2006
    PingBack from http://vitorm.webhs.org/blog/?p=2212
  • Anonymous
    May 24, 2006
    PingBack from http://www.newsbit.com.br/blog/?p=108
  • Anonymous
    May 24, 2006
    PingBack from http://www.newsbit.com.br/blog/?p=110
  • Anonymous
    May 25, 2006
    PingBack from http://djmurdock.wordpress.com/2006/05/25/17/
  • Anonymous
    June 06, 2006
    One of the features in Word 2007 Beta 2 is the ability to author blog posts. Joe Friend announced the...
  • Anonymous
    November 22, 2006
    Joe Friend, the guy who started the Blogging from Word 2007 whirlwind, posts a follow-up on technical