Word Blog HTML Quality
Every post you make as a blogger can be a huge learning experience. In Friday's post on the new Word 2007 blog post authoring feature I made a fairly modest claim that the HTML emitted by the feature would be better than the standard HTML from previous versions of Word. Well, it is, but I should have looked at the source code before I told everyone else to do so. There were a few problems:
- Blog service vagaries
Several issues were introduced by Community Server (the blog system on which blogs.msdn.com is built). The upper-case tags are the main example. A number of other reported issues come from the site template. The template is one of several standard CS templates and, of course, has nothing to do with the HTML emitted by the blog feature (ID attributes with strange values are an example of this).
- My stupid HTML mistakes
As I stated, I hand-coded the image tags, and I made a couple of stupid HTML coding errors. Luckily, I will not be shipped in the box with the feature, and our developers will output the markup correctly.
- Real problems we need to address
Also, several people made great suggestions for improvements that we want to look into. Examples include using <del> for strikethrough and ensuring proper tag content flow.
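To make the <del> suggestion concrete, here is a minimal sketch in Python. It is purely illustrative and is not how Word produces its output. The function name strike_to_del and the wrapping <div> are assumptions made for the example, and the sketch only handles fragments that are already well-formed XML.

```python
import xml.etree.ElementTree as ET


def strike_to_del(fragment: str) -> str:
    """Rename presentational <strike>/<s> elements to <del>.

    Hypothetical helper for illustration only. The fragment is wrapped
    in a <div> so the parser always sees a single root element.
    """
    root = ET.fromstring(f"<div>{fragment}</div>")
    for element in root.iter():
        if element.tag in ("strike", "s"):
            element.tag = "del"
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strike_to_del("<p>old <strike>text</strike></p>"))
```

Text and attributes are left untouched; only the element name changes.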
Goals
Most importantly I'd like to lay out the goals for the HTML output.
- We will hand off valid XHTML for each post
We can't be held responsible for what Blogger, Spaces or anyone else does to the XHTML after we give it to them, but we'll send it to them as valid XHTML.
- Clean HTML is more important than visual fidelity
This is a huge change for Word. Our focus has always been ensuring (as much as possible) that the HTML we output would result in full round trip of all the content and formatting in your document. The blog feature is all about representing what we can in a clean way without any special action/decision point on the part of the post author.
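As a rough illustration of the first goal, the sketch below checks two necessary conditions for valid XHTML: that a post fragment is well-formed XML, and that its element names are lower case (the upper-case tags mentioned above would fail this). It is not a full validator and is not Word's code. The function name check_fragment is an assumption, and named HTML entities such as &nbsp; are reported as errors because the parser has no DTD loaded.

```python
import xml.etree.ElementTree as ET


def check_fragment(fragment: str) -> list[str]:
    """Return a list of problems found in an XHTML fragment.

    Hypothetical helper: well-formedness and lower-case element names
    are necessary for valid XHTML, but they are not sufficient.
    """
    try:
        # Wrap in a <div> so the parser always sees a single root element.
        root = ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    for element in root.iter():
        if element.tag != element.tag.lower():
            problems.append(f"upper-case tag: {element.tag}")
    return problems


if __name__ == "__main__":
    print(check_fragment("<p>Hello <em>world</em></p>"))
    print(check_fragment("<P>Hello</P>"))
    print(check_fragment("<p>unclosed"))
```

An empty list means only that these two checks passed; real validation would also test the markup against the XHTML DTD.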
Suggestions welcome
With these two goals in mind, I would like to announce that we will post the details of our XHTML output for public comment. The manner in which we do this (blog, discussion list, wiki, or something else) will be announced early next week. We can't promise that we will respond to all suggestions, but we will seriously consider them.
Comments
- Anonymous
May 16, 2006
Joe -
In my opinion you should make this blogging work with Sharepoint first. Sort out all issues there, and then publish a simple API for others to adopt.
This would be a great opportunity to establish a SOAP friendly Web service API setup, that serves as a standard for blogs.
Sahil - Anonymous
May 16, 2006
Gotta love the openness!
I don't know enough to have picked apart your code, as you invited, but I look forward to using the final product once you've got all kinds of excellent feedback from the community! - Anonymous
May 16, 2006
Community Server still messes around with the HTML even if you hand it over using the webservice APIs? I know that FreeTextBox mangles it until you don't recognize it anymore, but the webservices? That's weak. - Anonymous
May 17, 2006
PingBack from http://scobleizer.wordpress.com/2006/05/17/microsoft-word-generates-clean-html-for-blogs/ - Anonymous
May 17, 2006
It may well be Community Server's wysiwyg editor that causes the problem. There is currently a bug in the Word blog editor that forces me to open up the post on the server and get the post time set correctly. - Anonymous
May 17, 2006
Joe Friend, the guy who started the Blogging from Word 2007 whirlwind, posts a follow-up... - Anonymous
May 17, 2006
In the comments on your last post I suggested using styles to drive the HTML export. I take your point about simplicity over formatting fidelity, but with a good, predictable set of styles you can do a lot.
http://ptsefton.com/blog/2006/05/13/beyond_blogging:_style-driven__html_export_from_2007._please.
I'd love to know what you think of this idea. Are styles still there/usable in the new Word? - Anonymous
May 18, 2006
This is good news but which version(s) of XHTML will Word render - 1.0 transitional, 1.1 strict or 2.0? Creating Microformats will be a doddle now. I could write a template that non-techies could populate to generate hcard, hreview etc.
I for one think this is the strongest reason to upgrade my version of Outlook and Word so long as the metaweblog API and Atom publication support allow me to blog to a variety of blogging tools and not just Spaces.
I wonder if Microsoft could go further and fully support CSS2 for template formatting in Word and JavaScript 1.5+ for macros. - Anonymous
May 19, 2006
I'd just like to say, I was worried (understatement) that Word would try to output any kind of HTML at all. After reading your blog Joe, I'm feeling much better about MS and the people working there.
I wish there were more people like Joe working for MS. ;)
If you can do what you say, then don't stop saying it. - Anonymous
May 22, 2006
PingBack from http://vitorm.webhs.org/blog/?p=2212 - Anonymous
May 24, 2006
PingBack from http://www.newsbit.com.br/blog/?p=108 - Anonymous
May 24, 2006
PingBack from http://www.newsbit.com.br/blog/?p=110 - Anonymous
May 25, 2006
PingBack from http://djmurdock.wordpress.com/2006/05/25/17/ - Anonymous
June 06, 2006
One of the features in Word 2007 Beta 2 is the ability to author blog posts. Joe Friend announced the... - Anonymous
November 22, 2006
Joe Friend, the guy who started the Blogging from Word 2007 whirlwind, posts a follow-up on technical