4000 pages of documentation

There has been a great overall reaction to the news last week of Ecma's first public draft for the Office Open XML formats. One thing that is now absolutely clear to everyone is that we are talking about an extremely rich and powerful set of file formats.

I think many folks didn't realize the amount of work we've had to take on, which explains why some had the false assumption that we could just use ODF. We were pretty clear in our response that it just wouldn't work for our customers, because at the end of the day an open format is useless if the majority of our customers won't use it. That's why we had to make our formats fully support all the existing Microsoft Office files out there. If the formats didn't support all those features, then the only people who would use them are those who fundamentally want an open format, and everyone else would have just stuck with the old binaries. We absolutely did not want that to happen; we wanted everyone using an open format. We've invested a ton of resources into XML file formats because we believe it's a good thing, and we need to make sure that our customers will be willing to use them.

Let me be clear on a couple key points:

  1. Rich format - Yes the format is extremely rich and deep and that's because it represents a very powerful set of applications that have evolved over many years and many documents. It would have been completely unacceptable for us to create a format that didn't fully represent the existing base of Microsoft Office functionality. If we had created some kind of subset format, many in the industry would have complained for very legitimate reasons. People would have complained that we were destroying fidelity with the key features they used, that we were hiding functionality, not enabling everyone to exploit the rich features, not encouraging the move to XML, etc. Bottom line – millions of organizations would have had a legitimate problem.
  2. Extremely detailed documentation - It's funny but I've actually seen people complaining that there is too much documentation. The documentation is essential, even if there are parts that are not used by everyone. I personally think we have to provide documentation on every aspect of our format, otherwise how do you know what something means? This is a lot of work, and I believe it's absolutely necessary. I can't imagine there being a benefit to anyone from not documenting something.
  3. Full implementation - I don't think it should come as a surprise that with the rich set of features in Office, it's going to be a lot of work to build an application that can support all of that functionality. In the past, people had said that the reason nobody could build an application that matched Office was that the formats were locked up. Well, the format information was available, but not for all the many purposes that we are enabling now. Now all those people should be happy because the format information is complete enough to enable a full understanding by everyone. It's up to those other applications though to decide what level of support they want to build. While I think interoperability is possible, the struggles that the applications supporting ODF are having show that it's really a lot of work even for a format that isn't as deep. This is often to be expected though because the different applications have different sets of features as well as different implementations of the same features. That is how things work.
  4. Partial implementation - Now, if you don't care about fully matching Office features, then anyone can choose to just support a subset of the format. You can implement as much or as little of the format as you want. You can build an application that does nothing more than adds comments to a WordprocessingML document; or that automatically generates a rich SpreadsheetML file based on business data. It's up to you. The information is all there to use in the way that best benefits your application.
  5. Room for innovation - Now that all the features we've stored in our formats are fully open and documented, people are free to build with them. In addition to the fact that you can implement as much or as little of the format as you want, you are also free to add more to the format. The formats are completely extensible. You can add your own extensions to the format, or you can even join Ecma and propose that those extensions get added to the official Ecma standard. The strong support for custom defined schema in Office gives you a lot more power than what a document format on its own would give you, through integration of your own parts.
  6. Microsoft does not own the standard - We no longer own these formats, Ecma does. I know there is still concern out there that these formats could change out from under you, but that's not something that Microsoft can do. Ecma fully controls it, and once it goes through ISO, it will be even more solid and locked down.
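To make the partial-implementation point (item 4) a little more concrete, here is a minimal Python sketch. It is not taken from the Ecma draft; it only assumes that a package is an ordinary ZIP archive of named parts, and it touches one part while ignoring everything else. The helper names and the tiny demo package are hypothetical, and the demo content is a stand-in rather than a valid WordprocessingML document.

```python
import io
import zipfile


def list_package_parts(package_bytes: bytes) -> list[str]:
    """Return the sorted names of all parts stored in a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(package.namelist())


def read_part(package_bytes: bytes, part_name: str) -> bytes:
    """Return the raw bytes of a single part, ignoring the rest of the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(part_name)


def build_demo_package() -> bytes:
    """Build a tiny stand-in package in memory (placeholder content only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document>Hello</document>")
    return buffer.getvalue()


demo = build_demo_package()
print(list_package_parts(demo))
print(read_part(demo, "word/document.xml").decode("utf-8"))
```

A tool built this way only needs to understand the parts it cares about; how much of the markup inside each part it interprets is a separate decision.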

I'd also like to reuse some information that I left as a comment in my post last week. Some people were a bit confused on how you could create a standard that was so rich and had all the backward compatibility with the existing base of Microsoft Office documents. It was even suggested that almost as a way of leveling the playing field we should choose just a subset of features that we think everyone can build applications for. This would be a great move for our competitors but a horrible move for our customers. Adam provided a lot of feedback and I really appreciated that he took the time to write all that up. Patrick and Biff had some really great replies that tried to explain why backward compatibility was so important. Here is the reply I left for Adam that I hope helps really clear up his questions around why we went the standardization route in the first place:

Hey Adam, thanks for taking the time to get all your thoughts down. It definitely has helped me understand where you are coming from.

It sounds like you understand that from our point of view, in order to use an XML format as the *default* format for Office it needs to be 100% compatible, right? I think your point is more that we should also have an optional format that is more basic and doesn't necessarily have 100% of the features covered. That smaller more basic format would then be the one that should be standardized. I think that's what you are saying.

Based on your description, the format you desire sounds a lot like HTML. HTML is a great format for basic interchange. It doesn't support everything that is present in an Office document, but as you said, that isn't always desirable. We've supported HTML for quite a while, although we took the approach of trying to have our cake and eat it too when we attempted to make our HTML output support 100% of our features. The result was an HTML format that had a ton of extra stuff in it that many of the people who just wanted HTML didn't really care about (and it just got in the way).

Our primary goal this release with the formats was not to try and re-implement HTML, but instead to move everyone over to using XML for all of their documents. Let's talk about the motivations for what we are doing with Open XML since that was the main point of your question:

  1. The reason we've spent the past 8 or so years moving our formats toward a default XML format is that we wanted to improve the value and significance of Office documents. We wanted Office documents to play an important role in business process where they couldn't before. We wanted to make it easier for developers to build solutions that produce and consume Office documents. There are other advantages too, but the main thing is that Office documents are much more valuable in just about every way when they are open and accessible.
  2. The reason we fully document them is the exact same. We need developers to understand how to program against them. Without the full documentation, then we don't achieve any of our goals I stated above. The only benefit would be that other Microsoft products could potentially interact with the documents better (like SQL or SharePoint), but that doesn't give us the broad exposure we want. That would be selling ourselves short. We want as many solutions/platforms/developers/products as possible to be able to work with our files.
  3. The reason we moved to the "Covenant not to sue" was that a number of people out there were concerned that our royalty free license approach wasn't compatible with open source licenses. Again, since the whole reason for opening the files was to broaden the scenarios and solutions where Office documents could play a role, we moved to the CNS so that we could integrate with that many more systems. Initially we'd thought the royalty free license just about covered it, but there was enough public concern out there that we decided we needed to make it even more basic and straightforward. We committed to not enforce any of our IP in the formats against anyone, as long as they didn't try to enforce IP against us in the same area. No license needed, no attribution, we just made a legal commitment.
  4. The reason we've taken the formats to Ecma for standardization is that it appeared that a number of potential solution builders were concerned that if we owned the formats and had full control, we could change them on a whim and break their solutions. We also had significant requests from governments who also wanted to make sure that the formats were standardized and no longer owned by Microsoft. Long term archive-ability was really important and they wanted to know that even if Microsoft went away, there would still be access to the formats. We were already planning on fully documenting them, but the Ecma standardization process gave us the advantage of going through a well established formal process for ensuring that the formats are fully interoperable and fully documented. It's drawn a lot more attention to the documentation as well so I'm sure we'll get much better input, even from folks who aren't participating directly in the process.

I hope that helps to clear it up a bit. It really is just as simple as that. Any application is free to implement as little or as much of the format as they wish. If you really want every application operating on a more limited set of features, that isn't as much of a format thing as an application thing. You would need to get every application to agree that it will not add any new features or functionality, and will disable any existing functionality that the other applications don't have. That wasn't our goal. Our goal was to open up all the existing documents out there, and then anyone who wants to build solutions around those formats is free to do so. In addition, anyone is free to innovate on top of the formats, as I believe there is still a lot of innovation to come. The formats are completely extensible, so if someone wants to use the formats (or parts of the formats) as a base and build on top of that, they can do so as well. They can even join Ecma if they want and propose to add those new extensions to the next version of the standard.

-Brian

Comments

  • Anonymous
    May 24, 2006
    I definitely concur. I have been waiting for beta 2 since I did not get in on beta 1.

    I am most interested in this next release of Office primarily because of the Open XML standard.

    I have been reading thru the ecma document and playing around with System.IO.Packaging and I am extremely impressed with the openness.

    I CAN NOW POPULATE CHART OBJECTS VIA A DATABASE EASILY!

    Oh, the Ecma doc is very very detailed. If you don't get it after reading it you're blind.

    Extremely cool!!!
  • Anonymous
    May 24, 2006

    I'd like to add some fact checking to your prose. You have said countless times that the new formats are all XML. It is simply not true. Example: it was brought up a couple of weeks ago on the openxmldeveloper.org website that if you password-protect your documents then Word/Excel/PowerPoint will actually create an OLE document, and encrypt it there.

    I'll repeat it: files will have the same extension as if they were zip-based openoffice xml packages, except that they are not.

    So much for xml.

    And, of course, since this behavior has nothing to do with System.IO.Packaging, it means that it is impossible to programmatically work with such documents.

    Any comment? Replying that "a password-protected document should not be accessed programmatically" does not count, since the document security should have nothing to do with the programmability of the document. Consider a scenario where you'd like to make a bulk update to those files before you share them across the enterprise.
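
For readers who want to check the container claim above on their own files, here is a minimal Python sketch, not part of the original comment. It relies only on two well-known file signatures: ZIP archives normally begin with the local file header bytes `PK\x03\x04`, and OLE compound files begin with `D0 CF 11 E0 A1 B1 1A E1`. The function name `sniff_container` is hypothetical, and the sketch says nothing about what is inside either container.

```python
ZIP_MAGIC = b"PK\x03\x04"                           # ZIP local file header signature
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")       # OLE compound file signature


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes as 'zip', 'ole', or 'unknown'."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(OLE_MAGIC):
        return "ole"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file on disk and classify its container."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))
```

Calling `sniff_file` on a document reports which container it actually uses, regardless of its file extension.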

  • Anonymous
    May 24, 2006
    Brian> "It's funny but I've actually seen people complaining that there is too much documentation. [...] I can't imagine there being a benefit to anyone from not documenting something."

    I think you have the wrong end of the stick there.

    I don't think people are complaining that there is too much documentation because they'd like things to be less well documented, or even not documented.

    Instead, I think people are complaining that there is too much documentation because they'd like the format to be less complex, such that less documentation would be required to describe all of it.
  • Anonymous
    May 24, 2006
    Adam, see my first point about why it has to be such a rich format.
  • Anonymous
    May 24, 2006
    Interesting that people complained that the format was too rich or too documented; where did you see those complaints?

    I think you're wrong saying ODF isn't as rich as OXML. ODF builds on a number of industry standards, so for example where OXML has to specify a load of packaging conventions, ODF just says "We use XLink". ODF's richness comes from building on existing industry standards.

    On your comment about Microsoft not owning the standard - uh, surely until it successfully goes through ISO it does? The mandate of Ecma is to create a format compatible with Microsoft Office.

    Or are you saying that the mandate of TC45 has changed?
  • Anonymous
    May 24, 2006
    Brian: I'm aware of why MS, and some of its customers, need the format to be that complex. (Mostly thanks to your patient explanations.)

    I was just trying to point out that a number of people (even, or maybe even especially, some technical ones) will be intimidated by that level of complexity, and that will be why they're asking for less documentation. Not because they want it under-documented, but because they'll want it less complex.

    But, hey, you've got enough customers that some of them will always want the exact opposite of what some of the others want, so you'll never be able to please all of them anyway. :)

    I really wasn't trying to start something there, I was just offering a possible explanation for people asking for less documentation, in the hope of reducing your bemusement levels.
  • Anonymous
    May 24, 2006
    <quote=Adam> "I don't think people are complaining that there is too much documentation because they'd like things to be less well documented, or even not documented.

    Instead, I think people are complaining that there is too much documentation because they'd like the format to be less complex, such that less documentation would be required to describe all of it. "</quote>


    In other words, Microsoft should eliminate (or at least deprecate) features so as to make it easier for less functional office suite vendors to compete.  LOL
  • Anonymous
    May 24, 2006
    Mike: how do you propose encrypting a document and keeping XML structure at the same time, with ROT-13?

    Brutus: bingo!

    Brian: in my opinion making a list of Open XML parts that are not defined in the ODF standard is a great idea. I'm sick and tired of people saying that you can do everything in ODF. No you cannot! And of what you can, at what price?
  • Anonymous
    May 24, 2006

    "how do you propose encrypting a document and keeping XML structure at the same time, with ROT-13? "

    I am the one supposed to answer this question? I thought we were supposed to buy a product which produces 100% XML-based file formats. Ask Brian, thanks.

    If Brian cannot come up with a positive answer, then all the claims "100% xml", "full xml", "went entirely to xml", ... made since last year are lies. I know, it's the kind of little details that you'd rather slip through something. But that's those little details that are important to users, a lot of which are as savvy as you can imagine. Remember, users may only use 10% of the features, but they don't use the same 10%.

    If I were to be consistent and avoid the XML sand castle, I would definitely encrypt the content in a XML CDATA section. Doing so, it would not still be programmable without a supplement API able to let me appropriately consume the content, i.e. the API would require passing a password in order to read/write the content in clear. That would be a first step.

  • Anonymous
    May 24, 2006
    Brian,

    OpenDocument is well known to support a variety of languages, and the Japanese ISO member pointed out a couple of problems with the spec. (mostly to do with international URIs). I think they would have noticed if numbering was a problem. The guys in the middle-east were looking at it too.

    You're absolutely right about formulas; OpenDocument does not specify a syntax, and that is something the TC is working on. There is a wider problem here, though: formula syntax is something users know directly. Should OpenDocument do something new, or just what Lotus 1-2-3/Excel did/do? OXML has the luxury of only caring about compatibility with Office file formats; OpenDocument is designed to be widely compatible with all.

    We can both reel off things which are missing in the other spec. What is sad is that Microsoft didn't participate in the OpenDocument process, where you could have added whatever features you thought were missing! Had you gone that route, you'd have been working with an ISO standard by now ;)

    I know you like to keep branding OpenDocument as "vague", "hobbyist", etc. I really think you ought to give it the respect it deserves: it's been through a thorough standardisation process twice, and was created in an open industry process which involved a lot of people with huge expertise with office documents.

    Granted, Microsoft don't want to use it, fine. But, I'm not sure the comments you make are fair. The comments about it on the Microsoft website certainly aren't, they're not even factually correct.

    Biff: OpenDocument manages to encrypt documents using the strong Blowfish algorithm, yet doesn't resort to embedding OLE. It's definitely possible.
  • Anonymous
    May 25, 2006
    Alex, you said, "What is sad is that Microsoft didn't participate in the OpenDocument process, where you could have added whatever features you thought were missing!"

    I think ODF supporters as well as the members of the OASIS ODF committee who intimate the above sentiment are being disingenuous.  They know darn well that there's no way that Microsoft could (or even should) adopt ODF as their default format even if they wanted to.

    Here's what http://xml.openoffice.org/ says regarding ODF and OpenOffice.org:
    -----------------------------
    OpenOffice.org XML file format: "The OpenOffice.org XML file format is the native file format of OpenOffice.org 1.0. It has been replaced by the OASIS OpenDocument file format in OpenOffice.org 2.0."

    OASIS OpenDocument file format: "The OASIS OpenDocument file format is the native file format of OpenOffice.org 2.0. It is developed by a Technical Committee (TC) at OASIS. The OpenDocument format is based on the OpenOffice.org XML file format."
    ------------------------------

    So, ODF is based on OpenOffice.org's previous XML format.  ODF is not "neutral" any more than OpenXML is.  ODF is simply the opened version of OO.o's previous XML format and OpenXML is the opened version of Microsoft's previous XML format.  ODF is not standing on any higher moral ground, contrary to the rhetoric of the ODF peanut gallery.  Indeed, it looks like OpenXML's ECMA standardization process has been much more rigorous than the OASIS ODF standardization process which was little more than tweaking OO.o's previous format and calling it good.  The ODF supporters even subtly acknowledge this with their claim that "Microsoft should've participated in ODF and added whatever they thought was missing", as that suggests that only minor tweaking would've been required/permitted.  ISO ratified the OASIS spec, but that ISO did this without there even being a standard syntax for spreadsheet formulas shows that ISO's standardization process was not rigorous in any way, shape, or form.  Make no mistake, ISO's rubberstamping of ODF as a standard doesn't change the fact that ODF is OO.o's format.  Not WordPerfect's, not Lotus's, not KOffice's, not AbiWord's, not Gnumeric's, and certainly not neutral.

    To ask that Microsoft participate in ODF discussions to add whatever they thought was needed means that Microsoft, who already had an XML format, should forcefeed their features into, not a neutral format, but into a competitor's format.  The problems with this should be apparent, but here are some:
    1. It's illogical to ask Vendor A to adopt Vendor B's file format when Vendor A's suite has produced orders of magnitude more documents in the world than Vendor B's has.

    2. Microsoft has the burden of supporting billions of documents that have already been created using its formats, and cannot afford to risk ODF members vetoing Microsoft features that aren't present in OO.o.  Indeed, certain ODF members have an incentive to dupe Microsoft into supporting a format that breaks old Microsoft Office documents.

    3. Looking at it from OO.o's perspective, would they really want up to 4000 pages worth of features that they may or may not support to be placed into their format?  There's no way that OO.o would've allowed Microsoft to come in and overhaul OO.o's own format to suit the purposes of Microsoft, nor would I expect them to.


    Oh, and regarding your "Should OpenDocument do something new, or just what Lotus 1-2-3/Excel did/do? OXML has the luxury of only caring about compatibility with Office file formats; OpenDocument is designed to be widely compatible with all" statement regarding how formulas should be saved, since ODF is based on OO.o's previous XML format, they should've just done what OO.o already did.  The problem with that is that OO.o's spreadsheet is too woeful to use as a basis for a formula format.  You say that "the TC is working on" a syntax for formulas.  I thought this was already an ISO standard.  Looks more like a standard in progress.  That ISO simply let that go through without blinking an eye shows that ISO had no rigorous standardization process whatsoever.
  • Anonymous
    May 25, 2006
    Don> "ODF is simply the opened version of OO.o's previous XML format and OpenXML is the opened version of Microsoft's previous XML format.  ODF is not standing on any higher moral ground, contrary to the rhetoric of the ODF peanut gallery."

    Are you saying that OO.o's format wasn't designed as an implementation-neutral format to begin with, and no-one other than Sun and OO.o developers had suggestions accepted into the format?

    http://www.oasis-open.org/archives/opendocument-users/200512/msg00008.html

    Or are you saying that MOOX has not been specifically designed to support MS Word and all its accumulated features, no matter how complex that makes it?


    Don> "Make no mistake, ISO's rubberstamping of ODF as a standard doesn't change the fact that ODF is OO.o's format.  Not WordPerfect's, not Lotus's, not KOffice's, not AbiWord's, not Gnumeric's, and certainly not neutral."

    Ahem:
    http://www.koffice.org/announcements/announce-1.4.php (KOffice has supported ODF since June 2005)
    http://www.koffice.org/announcements/announce-1.5.php (KOffice now uses ODF as its native format)


    I do generally agree with you though.

    MS probably couldn't have got every change they would need to get an iron-clad 100% upgrade guarantee for their customers through the ODF process.

    MS also can't accept anything less than this because of the number of customers they have that insist on 100% document upgrades.

    On top of that, the ODF format wouldn't have been any better, and those looking to build fully interoperable independent office applications wouldn't have been helped in the slightest if MS had got all the changes through that they'd have needed to.

    Based on that, I'm coming to the conclusion that MS attending OASIS to try to get all their changes into ODF (as half measures wouldn't be good enough), would have been a waste of everyone's time, and it's probably for the best that they didn't.


    As someone who works in a heterogenous environment though, I know which one I'll be looking to use in the future. MOOX just isn't right for me. I'll have to hope that the ODF plugins that 3rd parties are working on for Word will be good enough for anyone who needs to share documents with me.
  • Anonymous
    May 25, 2006
    Actually Adam, it is true that the ODF format is largely based on the XML format from OpenOffice, you can't really argue with that. There are differences, but for the most part the structures look to be almost identical.

    I know that there are multiple applications that "support" it, but from what I've seen there isn't any true implementation of ODF out there. Apparently even OpenOffice doesn't properly support it yet: http://permalink.gmane.org/gmane.comp.openoffice.devel.xml/2238

    In addition to that (as I mentioned above) there are a number of places where OpenOffice has already extended ODF (formulas and international numbering for example). If those extensions aren't in the ODF spec, then that would imply the spec is not yet complete as Don pointed out. It also leads you to wonder what happens when/if the spec is updated to support those extensions. Will Sun make sure that the spec matches the OpenOffice extensions? Or will OpenOffice have to change its format again to match the spec? Or will they use extensibility mechanisms to always output both representations?

    Adam, I'd really like it if you gave OpenXML more of a look. As I said before, we are investing a lot into providing good tools to help developers work with the formats.

    -Brian
  • Anonymous
    May 25, 2006
    I've also heard others make the same statement that Alex did about the OpenDocument specification being designed to be widely compatible with all. But when I look in the appendix of the spec entitled "E.1. Changes from 'Open Office Specification 1.0 Committee Draft 1'", there are a handful of changes, mainly corrections to documentation, namespace changes, and a couple of extra sections added (9.5; 9.8; 13; and 11.2). That gives me the impression that this was pretty strongly focused on matching the original Open Office format. Even then though, there seem to be things like formulas that exist in Open Office but weren't even included in the draft. Maybe I'm missing something?

    Alex said that OpenDocument didn't do formula syntax because it was hard to choose between the different available options. Well that's what creating a standard is all about isn't it. You need to specify how everything should work so that people can use it. You also need to account for future innovation, and make sure it's extensible, but when you already have a feature that exists, it should be represented. Especially something as crucial as formulas.

    -Brian
  • Anonymous
    May 25, 2006
    <quote=Adam>
    Don> "Make no mistake, ISO's rubberstamping of ODF as a standard doesn't change the fact that ODF is OO.o's format.  Not WordPerfect's, not Lotus's, not KOffice's, not AbiWord's, not Gnumeric's, and certainly not neutral."

    Ahem:
    http://www.koffice.org/announcements/announce-1.4.php (KOffice has supported ODF since June 2005)
    http://www.koffice.org/announcements/announce-1.5.php (KOffice now uses ODF as its native format) </quote>

    You'll have to forgive me for indulging in rhetorical flair. :-)  Yes, I know that KOffice made ODF its default format and others will do the same, but my point is that anyone that adopts ODF is essentially adopting OO.o's format rather than some "ideal" created from scratch format.

    And yes, I know that OpenXML is a descendent of previous Microsoft XML formats and that part of its raison d’être is to be an open XML format for Microsoft's documents rather than to be an "ideal" created from scratch.  It's right there in the charter.  Microsoft never claimed otherwise.
  • Anonymous
    May 25, 2006
    I can't comment on the section you're talking about, but the OASIS FAQ on the subject says over 100 non-editorial changes, and even the "couple of extra sections" you mention are actually huge features. XForms alone is a massive spec., and is a seriously powerful feature set that no suite, OpenOffice.org or even MS Office, really harnesses right now without resorting to macro programming.

    The formulas thing wasn't about avoiding setting a standard. For one thing, it's not even clear that it should be part of this standard: the route OXML has taken, of specifying the syntax in the standard, means that you need two separate validation tools if you want to check a document, because it's not (and shouldn't be) an XML syntax. It's a valid spec. choice, but not necessarily the only choice.

    But, you're right, it should be specified somewhere - the question is how. There's a good argument to say that OpenDocument should import the formula syntax from OXML, at least as an option, and I think you'd find many ODF users would support that. That's more or less the way it works right now in any event.
  • Anonymous
    May 25, 2006

    I recently mentioned on this blog that the Ecma TC45 committee had released Working Draft 1.3 of the...
  • Anonymous
    May 26, 2006
    Ecma has published an updated draft of the spec for the Office Open XML Formats Standard. Here's a link...
  • Anonymous
    June 01, 2006
    I've had a lot of folks ask me to provide more information on what features are missing from ODF and...
  • Anonymous
    June 05, 2006
    As we move forward with the standardization of the Office Open XML formats, it's interesting to look...