Draft 1.3 of the Ecma Office Open XML formats standard

Wow, we finally have an updated draft of the Ecma Office Open XML formats standard! https://www.ecma-international.org/news/TC45_current_work/TC45-2006-50.htm I've been waiting for a long time to be able to share all the great work that's been going on in Ecma TC45, and it's so awesome that we have a new public draft. I can't wait to hear what everyone thinks. If you go to that site, you'll see three different downloads:

  1. Draft 1.3 of the spec - The big download is the spec itself in PDF form. It's about 25 megabytes and is around 4000 pages.
  2. Draft 1.3 of the spec in the Open XML format - Alternatively, you can download the .docx version of the spec. Once Beta 2 comes out, you can open it that way (although opening 4000 pages of content with beta software may be slightly problematic <g/>)
  3. Schemas - The schema files are also available for download. They come in a ZIP file that also contains an index.htm file describing each XSD.

We've been working really hard over the past 5 months bringing this standard along. There is still a lot of work to do, but you'll see pretty clearly that we've made a ton of progress over the initial submission from last year. We have weekly 2 hour phone conferences (they are actually at 6am my time which is not ideal <g/> ), as well as 3 day face to face meetings about every 2 months. The contributions from everyone have just been outstanding. It's so awesome to work with such a diverse group of people. While the initial submission was made by Microsoft, it's now completely in Ecma's control and we've had a lot of help from Apple, Barclays Capital, BP, The British Library, Essilor, Intel, Microsoft, NextPage, Novell, Statoil, and Toshiba.

***Note*** Remember that this is just a draft. Some sections of the spec are much further along than others, so keep that in mind while you are looking through the spec. If you are in an area that looks like there isn't much information, odds are we just haven't gotten to that yet.

While I'm sure we'll be able to spend the next several months talking about all this, some of the big things I wanted to point out are:

  1. Public feedback - While the Ecma organization is completely open and anyone can join, I understand that some people just aren't able to make that commitment. That's why I was really excited that we have a mechanism set up now so that anyone can give feedback on the spec: ecmatc45feedback@ecma-international.org
  2. Technical discussion - If you are looking for technical discussions around the formats, you can also go to the openxmldeveloper.org site where there is a forum for a wide range of technical issues for developers who want to implement the formats.
  3. Navigating the PDF - The PDF file was actually generated using Word 2007. Bring up the Bookmark pane and you can easily navigate through the document structure (it's over 4000 pages, so that helps a lot!). You will also notice that in the reference sections, you can easily navigate through element and type reference just by clicking on the section number next to the element or type's name.
  4. Spreadsheet Formulas - Check out 15.5 (starts on page 247). There are about 160 pages of content describing the formula syntax and about 360 different functions. You'll notice that there is still a ways to go, but this is already a huge amount of really useful information.
  5. Depth of documentation - I know we've said this a million times, but this is a huge project. Migrating all the existing Office documents into an Open XML format and then providing full documentation is a ton of work. Many people don't realize how large these applications are, and how much there really is to cover. If you want an example, download the spec and look at the documentation for the simple type "ST_Border" which starts on page 1617 (it's in the WordprocessingML reference section under simple types). That shows a list of almost 200 legacy border patterns that you can apply to objects in a Word document. Tristan Davis, the Word representative on the Technical Committee, had to work on every single one of those and provide images so anyone else could reproduce them. He created almost 200 documents, took screenshots of each one, and then provided the description and image representation in the spec. This format is 100% compatible with the existing base of Microsoft Office documents, so nobody will need to worry about losing features, even if it's the "Maple Muffins" border style (page 1643) :-)
    1. Want some more depth? - Check out section 14.5 starting on page 135

I'm so excited right now, I'm really rushing just to get this blog post out. I can't wait to hear from people about what kinds of questions they have, or what they hope to do with the formats. We're going to have a lot of fun over the coming months (especially once Beta 2 is out the door and everyone can start to experiment with the files). More information to come, but that's it for now.

-Brian

Comments

  • Anonymous
    May 18, 2006


    The Open XML document standard is progressing through [ECMA]. I think
    Microsoft using [ECMA] for...

  • Anonymous
    May 18, 2006
    Ummm.....what version numbering are you guys using here? How come this isn't version 0.x?

    When you get the first version of the spec finished, what version do you expect that to be?

    /me is confused.

  • Anonymous
    May 18, 2006
    Version 1.3 of the Ecma spec has been published as of today. It's a huge document, with lots of...

  • Anonymous
    May 18, 2006
    Actually Adam, it's officially "WD 1.3", meaning it is the 1.3 version of the working draft.
    Once it is finished and approved by Ecma, it would be version 1.0 of the Ecma standard.
    From there it would go to ISO, and if there are any changes suggested it would be version 1.x of the Ecma standard and 1.0 of the ISO standard.

    -Brian

  • Anonymous
    May 18, 2006
    It would be better if the page numbers matched up. Page 1 doesn't occur until ~93 pages in due to the contents, and it makes using the contents to look stuff up pretty hard. People surely aren't going to use this in dead tree format....

  • Anonymous
    May 18, 2006
    The comment has been removed

  • Anonymous
    May 18, 2006
     Tuesday's blog discussed Office 2007 performance, but I noticed that section properties for Word are still output at the end of a section according to Draft 1.3 of the ECMA docs.  A SAX parser would normally be great for performance because you can parse tags immediately as they are encountered, but the location of the section properties forces a consumer to read the entire section's content before knowing how to display the first page correctly.  This eliminates the benefit gained by the SAX parser since it delays the initial display of a document (especially for huge files that are composed of only one section).  Is there a reason why it was decided to put the section properties at the end of a section?  It seems like it would make more sense to start the section off with the section properties.
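
The buffering concern raised in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names `p` and `sectPr` and the `pageWidth` attribute are simplified stand-ins rather than the markup defined in the draft, and the `SectionBuffer` class and `layout` function are hypothetical names. The point is that a streaming (SAX) consumer has to hold every paragraph until the trailing section properties arrive.

```python
import xml.sax


class SectionBuffer(xml.sax.ContentHandler):
    """Holds paragraphs until the section's properties have been seen."""

    def __init__(self):
        super().__init__()
        self.pending = []    # paragraph text seen before the section properties
        self.laid_out = []   # (page_width, text) pairs, available only afterwards
        self._text = None    # collects character data while inside a paragraph

    def startElement(self, name, attrs):
        if name == "p":
            self._text = []
        elif name == "sectPr":
            # Only at this point does the consumer know the page geometry,
            # so everything buffered so far can finally be laid out.
            width = attrs.get("pageWidth")
            self.laid_out.extend((width, text) for text in self.pending)
            self.pending.clear()

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "p":
            self.pending.append("".join(self._text))
            self._text = None


def layout(xml_text):
    """Parse a toy document and return the paragraphs with their page width."""
    handler = SectionBuffer()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.laid_out


# Toy input: the section properties trail the content they apply to.
DOC = "<body><p>First</p><p>Second</p><sectPr pageWidth='12240'/></body>"
print(layout(DOC))
```

If the properties came first, the handler could emit each paragraph as soon as its end tag arrived and the `pending` list would not be needed.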

  • Anonymous
    May 18, 2006
    Adam, sorry about the confusion :-)
    Within the Ecma TC45, we have produced 3 versions of the working draft so far. We agreed last month that it would be great if we made the next draft public so that we could let people outside the committee see what we'd been up to. There is no requirement within Ecma for the committee to publish working drafts, but the members of TC45 decided that it would be worthwhile.
    Once Ecma approves it as a standard, then the working draft numbering won't really be relevant.

    JayV, that's actually a great question. There are a number of reasons for the location of the section properties. You are right that there would be advantages to putting them at the beginning of the section, but those benefits actually started to go away as you deal with the fact that a section break can be located in just about any location. I think a blog post specifically focused on sections would be worthwhile. I'll add it to the list (a list that seems to just keep getting bigger) :-)

    -Brian

  • Anonymous
    May 18, 2006
    Wow, 4000 pages and only 25 MB? Not too shabby.

    Concerning "Maple Muffins" and other border styles:

    Why are you "hard-coding" 200 specific enumerations by name, instead of simply providing a method to store a small image in the document that could then be programmatically tiled to encircle the page?

    Sure, for some types, you'd need a "top" image separate from a "corner" or "left", but that's still more flexible than hard-coding these types.

    You'd save about 60 pages of spec document!

    The tradeoff, of course, is that documents containing these types of borders would be slightly larger than in the current spec, but the flexibility advantage is significant.

    I guess the real question is: WHO USES THESE BORDERS?? And when was the last time someone used them? 1998?

    (I know, full fidelity, etc...)

  • Anonymous
    May 18, 2006
    You have me bustin' a gut, Brad :D

    I do wonder the same thing, but I have also come to discover that in any given specification, application, or implementation of any "tion" whatsoever, there are areas that you and I may find to be the dated fashion of folks who don't even understand what the word fashion means, much less how to implement it properly. To them, that is simply not a concern, and neither are our opinions.  They want to make sure their borders from 1998 look the same in 2098 and beyond.

    I agree... the choices of style can defy reasonable justification and make you wonder what on earth they were thinking... They worry more about consistency, and focus more on what they feel to be most important, which in most cases is content, but in some cases can, in fact, be the borders themselves that they just can't live without.

    Such is the wonderful ways of the world we call home :)

    That said, what my opinions might be, and what the reality actually is, are more than likely at odds with one another.  Sometimes I get lucky, but luck and I have a love/hate relationship, so when I do get lucky, I tend to bunker down for whatever storm is brewin' that's about to strike wherever it is I happen to be when it happens.

    As luck would have it... that's usually the way it is :)

  • Anonymous
    May 18, 2006
    :-) I hear you Brad. That's why I pointed it out as an example of how much work it is to document these things. There are so many features and we need to represent them all in XML, and document it all.

    We could have done as you suggest and just use resource files for them, but that would have actually been more work in the applications. I don't really think it was worth an investment to make the borders more extensible. We don't really have any customers asking us for that. If at some point there does appear to be a need for it though, I'm sure that TC45 could look at adding it into a future version of the standard.

    -Brian

  • Anonymous
    May 18, 2006
    Great news!  I've been waiting for this too, especially for checking out my favorite bits on the packaging.  So, now you know the kind of technogeek I am, I care more about the packaging concept and its reuse than all of those hairy details that content involves.  Heh.

    It would have been cool to Zip the PDF as was done with the initial Working Draft.  It makes for a nice comparison with the DOCX, though.

  • Anonymous
    May 18, 2006
    Hi Brian,

    first off, thanks for the nice error detection in Word! I couldn't resist and tried opening the .docx in Word 2007 B1TR.
    Just flipping through the spec, I finally understand why other companies never fully managed to duplicate the Office binary formats: they are just way too complicated to reverse engineer.
    I agree with the other poster that you should fix the page numbering so that page 1 is the very first page of the document, even if the current numbering is more correct for publishing purposes. It's annoying not to be able to just go to the table of contents, find what I want, and type that page number into Adobe.

    Patrick Schmid

  • Anonymous
    May 18, 2006
    Funny.  The *.docx is delivered via the ECMA web server as a Zip file type, or that is at least what IE6 wants to save it as.  I played along, looked at it with WinZip, then renamed it to .docx.

    I must admit, this is exciting.  The expansion of the schemas looks great too, so I guess I can't stay that rusty with XSD that much longer.
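
Since the .docx here turned out to open in a ZIP tool, the same inspection can be done from code. The following is a minimal sketch using Python's standard `zipfile` module. The helper names `list_parts` and `make_toy_package` are hypothetical, and the part names in the toy package are illustrative placeholders, not a claim about what the draft's package actually contains.

```python
import io
import zipfile


def list_parts(package_bytes):
    """Return the names of the entries inside a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return zf.namelist()


def make_toy_package():
    """Build a tiny in-memory ZIP so the example runs without a real file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Placeholder entries; a real package will contain different parts.
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


for name in list_parts(make_toy_package()):
    print(name)
```

To inspect a downloaded file instead, read its bytes with `open(path, "rb").read()` and pass them to `list_parts`.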

  • Anonymous
    May 18, 2006
    I'm with Brad. I've heard discussion, and M.'s comment about using MOOX in 2098 implies I'm not the only one, that some people are looking to use either MOOX or ODF as an archival format that will still be widely supported, at least for reading, for 100 years or so.

    You want to make everyone who supports reading MOOX implement those 200 specific borders for the next century? Is there even a single independent implementation that does that at the moment? AFAIK, the best alternative reader for Word documents at the moment is OpenOffice.org. How does that cope with those borders so far?

    M. said "To them, they want to make sure there borders from 1998 look the same in 2098 and beyond.".
    You said "We could have done as you suggest and just use resource files for them, but that would have actually been more work in the applications."

    I call baloney on M.'s statement: as no-one uses MOOX at the moment, there are no legacy borders to support in it. And I call baloney on your statement as well, as for most applications written between now and then, adding specific support for these 200 borders would, IMO, be more work for less gain than adding support for a more generic border definition in the document. (As Brad suggested, allowing the document to specify images for "top left", "top right", etc..., and repeating images for "top", "left", ... in either SVG or raster (BMP, GIF, PNG) formats could well allow reproducibility with scope for custom borders in the future)

    Yes, people have legacy documents, but none of them are in MOOX yet. The only legacy document types with "Maple Muffins" borders are those that happen to have been created with some previous version of Word or other. So, yes, Word 2007 needs to be able to read them. And, yes, Word 2007 probably needs to be able to save those borders somehow in MOOX, if only for MS's Marketing/PR purposes.

    But only Word 2007 (+?) needs to be able to do that conversion from legacy Word documents to MOOX with 100% fidelity. Having a more generic spec will be more useful and less work to all the other bits of office software written over the next 100 years by anyone who isn't Microsoft.

    For a format that's going to last a century, I don't think that copying Word '95's implementation of borders in a generic, standardised file format is necessarily a wise thing.


    In a previous post, you mentioned in answer to a question about who else had input into the spec process, that you had "partners and MVPs" having input into the spec. IMO, this is the kind of thing where you need other groups who are producing independent implementations to have input. Yes, having partners and MVPs who are making sure they can read the actual text out of MOOX, and manipulate it with XSLT, and put it onto a website, etc..., is really useful. But how many groups of people have tried to re-implement this particular bit of spec? Have the OpenOffice.org folks, or anyone other than MS who are writing Office software, had a chance to say "This is a stupid thing to put into an application-neutral document format"?

  • Anonymous
    May 19, 2006
    The comment has been removed

  • Anonymous
    May 19, 2006
    The comment has been removed

  • Anonymous
    May 19, 2006
    Hey Patrick, I'll talk with the editor about the page number thing. You're right that there is a lot of content. That's why we really want to make sure that the spec is constructed in such a way that people can choose to just use parts of it if they want and not the whole thing. It's really up to the implementer to decide what level of support they want to build. They may choose to ignore things (like the borders for instance :-).

    Dennis, it looks like the ecma web server doesn't have the proper MIME type associated with the .docx extension. So as a result, the server just returns this:

    HTTP/1.1 200 OK
    Proxy-Connection: Keep-Alive
    Connection: Keep-Alive
    Content-Length: 6976940
    Via: 1.1 RED-PRXY-03
    Date: Thu, 18 May 2006 22:36:18 GMT
    Content-Type: text/plain; charset=ISO-8859-1
    ETag: "d708af-6a75ac-dafe3380"
    Server: Apache
    Last-Modified: Thu, 18 May 2006 13:17:18 GMT
    Accept-Ranges: bytes
    Keep-Alive: timeout=15, max=100
    Content-Language: en

    See how the content type is just "text/plain"? So your browser probably sniffs it to figure out what it is and determines that it's a ZIP :-)

    -Brian
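
The sniffing behaviour described above can be illustrated with a small Python sketch. It is an assumption-laden simplification: real browsers apply far more elaborate rules, and `sniff_type` is a hypothetical helper, not any browser's actual algorithm. It relies only on two well-known file signatures: ZIP archives begin with the bytes `PK\x03\x04` and PDF files begin with `%PDF-`.

```python
def sniff_type(first_bytes, declared):
    """Guess a content type from leading bytes, else trust the declared one."""
    if first_bytes.startswith(b"PK\x03\x04"):
        # Local file header signature that opens a ZIP archive.
        return "application/zip"
    if first_bytes.startswith(b"%PDF-"):
        return "application/pdf"
    # Nothing recognised: fall back to whatever the server said.
    return declared


# A ZIP-based .docx served as text/plain would still be recognised as a ZIP.
print(sniff_type(b"PK\x03\x04\x14\x00", "text/plain"))
```

A server that sends an accurate Content-Type for the extension removes the need for the client to guess at all.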

  • Anonymous
    May 19, 2006
    "That shows a list of almost 200 legacy border patterns that you can apply to objects in a Word document. "

    "This format is 100% compatible with the existing base of Microsoft Office documents, so nobody will need to worry about losing features, even if it's the "Maple Muffins" border style (page 1643) :-)"

    As a format for Microsoft Office 2007 it's probably fine and dandy to include all this legacy information. But seriously, didn't you ever consider that this would make your open format overly tedious and complex? Did you ever consider that as a standard, simplicity might be preferable, compared to 100% backwards compatibility?

    "3. In November we moved to a covenant not to sue, which essentially said that you can do whatever you want with our formats, as long as you don't try to sue us for what's in the format."

    To what extent does this include opensource use?

    I am not trying to be hostile, but I am honestly doubting the usefulness of this format as an open standard.

  • Anonymous
    May 19, 2006
    Hey Lars, no need to worry about sounding hostile. Those are great questions.

    I love XML, and I would have loved to have been able to start from scratch to build a new clean XML format without having to worry about backwards compatibility. That's the sacrifice you have to make though when you have an existing customer base. The reality is that there is no point in building a format if people aren't going to use it (I'm talking about end users, not developers). Look at SpreadsheetML from Office XP, it was pretty clean and verbose, but there was no way anyone would have used it as their default format. It didn't support all their features.
    If we didn't build a format that supported all the existing documents out there, then the majority of our customers would not want to use it. This would have made all the work of moving to XML pointless. We need to migrate all those existing documents and existing users into the new world, and we need to make it as easy as possible.
    What we are doing is really tough. It's been years and years of development and has stretched out over multiple releases. There were a few folks early on who believed we could get to this point, and now we're almost there. Those of us who've gone through it all are finally starting to see the light at the end of the tunnel (and please no jokes about it being a train) :-)

    The covenant not to sue is non-discriminatory. The covenant currently applies to the Office 2003 XML schemas and we've also committed publicly to providing the same thing for the Ecma schemas. It says:

    "Microsoft irrevocably covenants that it will not seek to enforce any of its patent claims necessary to conform to the technical specifications for the Microsoft Office 2003 XML Reference Schemas posted at http://msdn.microsoft.com/office/understanding/xmloffice/default.aspx (the "Specifications") against those conforming parts of software products. This covenant shall not apply with respect to any person or entity that asserts, threatens or seeks at any time to enforce a patent right or rights against Microsoft or any of its affiliates relating to any conforming implementation of the Specifications.

    This statement is not an assurance either (i) that any of Microsoft's issued patent claims cover a conforming implementation of the Specifications or are enforceable, or (ii) that such an implementation would not infringe patents or other intellectual property rights of any third party.

    No other rights except those expressly stated in this covenant shall be deemed granted, waived or received by implication, or estoppel, or otherwise. In particular, no rights in the Microsoft Office product, including its features and capabilities, are hereby granted except as expressly set forth in the Specifications."

    There are already open source solutions out there today that support our XML formats. Go look for yourself. I know there has been a huge FUD campaign around our formats from some of the folks that have invested significantly in ODF, but hopefully you'll see that it's really just a bunch of misinformation. There have been a number of folks in the open source community that have looked at the CNS and been extremely positive. OpenOffice today already supports opening the Word 2003 XML format. Gnumeric has a prototype implementation of reading and writing SpreadsheetML.

    -Brian

  • Anonymous
    May 19, 2006
    The comment has been removed

  • Anonymous
    May 19, 2006
    Hey Adam, thanks for taking the time to get all your thoughts down. It definitely has helped me understand where you are coming from.

    It sounds like you understand that from our point of view, in order to use an XML format as the default format for Office it needs to be 100% compatible, right? I think your point is more that we should also have an optional format that is more basic and doesn't necessarily have 100% of the features covered. That smaller, more basic format would then be the one that should be standardized. I think that's what you are saying.

    Based on your description, the format you desire sounds a lot like HTML. HTML is a great format for basic interchange. It doesn't support everything that is present in an Office document, but as you said, that isn't always desirable. We've supported HTML for quite a while, although we took the approach of trying to have our cake and eat it too when we attempted to make our HTML output support 100% of our features. The result was an HTML format that had a ton of extra stuff in it that many of the people who just wanted HTML didn't really care about (and it just got in the way).

    Our primary goal this release with the formats was not to try and re-implement HTML, but instead to move everyone over to using XML for all of their documents. Let's talk about the motivations for what we are doing with Open XML since that was the main point of your question:

    1. The reason we've spent the past 8 or so years moving our formats toward a default XML format is that we wanted to improve the value and significance of Office documents. We wanted Office documents to play an important role in business process where they couldn't before. We wanted to make it easier for developers to build solutions that produce and consume Office documents. There are other advantages too, but the main thing is that Office documents are much more valuable in just about every way when they are open and accessible.

    2. The reason we fully document them is the exact same. We need developers to understand how to program against them. Without the full documentation, then we don't achieve any of our goals I stated above. The only benefit would be that other Microsoft products could potentially interact with the documents better (like SQL or SharePoint), but that doesn't give us the broad exposure we want. That would be selling ourselves short. We want as many solutions/platforms/developers/products as possible to be able to work with our files.

    3. The reason we moved to the "Covenant not to sue" was that a number of people out there were concerned that our royalty free license approach wasn't compatible with open source licenses. Again, since the whole reason for opening the files was to broaden the scenarios and solutions where Office documents could play a role, we moved to the CNS so that we could integrate with that many more systems. Initially we'd thought the royalty free license just about covered it, but there was enough public concern out there that we decided we needed to make it even more basic and straightforward. We committed to not enforce any of our IP in the formats against anyone, as long as they didn't try to enforce IP against us in the same area. No license needed, no attribution, we just made a legal commitment.

    4. The reason we've taken the formats to Ecma for standardization is that it appeared that a number of potential solution builders were concerned that if we owned the formats and had full control, we could change them on a whim and break their solutions. We also had significant requests from governments who also wanted to make sure that the formats were standardized and no longer owned by Microsoft.  Long term archive-ability was really important and they wanted to know that even if Microsoft went away, there would still be access to the formats. We were already planning on fully documenting them, but the Ecma standardization process gave us the advantage of going through a well established formal process for ensuring that the formats are fully interoperable and fully documented. It's drawn a lot more attention to the documentation as well so I'm sure we'll get much better input, even from folks who aren't participating directly in the process.

    I hope that helps to clear it up a bit. It really is just as simple as that. Any application is free to implement as little or as much of the format as they wish. If you really want every application operating on a more limited set of features, that isn't as much of a format thing as an application thing. You would need to get every application to agree that it will not add any new features or functionality, and will disable any existing functionality that the other applications don't have. That wasn't our goal. Our goal was to open up all the existing documents out there, and then anyone who wants to build solutions around those formats is free to do so. In addition, anyone is free to innovate on top of the formats, as I believe there is still a lot of innovation to come. The formats are completely extensible, so if someone wants to use the formats (or parts of the formats) as a base and build on top of that, they can do so as well. They can even join Ecma if they want and propose to add those new extensions to the next version of the standard.

    -Brian

  • Anonymous
    May 19, 2006
    Adam, I'll lend you a hand and maybe the next time around you will not miss the boat by a mile ;) Here: the Office 2007 file format must persist all Office 2003 and earlier format features, but none of the other suites or applications have to do the same. That means the standard has to be complex, yet there is no need to support all of it, not for read, not for write; a reasonable subset will do.

    Ok, that was my best try and I'll let Brian correct it as he sees fit.

  • Anonymous
    May 20, 2006
    The comment has been removed

  • Anonymous
    May 20, 2006
    Why is MS sending it to ECMA? I would wager as a) to harness the power of unpaid (open-source) programmers and b) legitimation.

    a) Windows already contains a good deal of open-source code. When others do the work, it saves MS development time and money. If somebody invents a new use/feature for Open XML, MS will be free to bundle it into Office.

    b) Lots of people are jumping ship in favor of OpenOffice, often because of the purported superiority of "open" standards. (Unfortunately, they are confusing ideology with technology.) This is bad for MS Office users: the advantage they enjoyed because of their file format is evaporating (now I am in the minority with MS Word and not being able to read OO files.)

    In the end, I fear the new standards will be impossible to implement. Look at the much-simpler HTML: NO browser renders it 100% correctly. MS will get it right, however. For them, it's merely new file coding, not a feature set.

  • Anonymous
    May 20, 2006
    As nice as clean and lean spec would be, Microsoft Office would be out of business if it could not provide 100% upgrade fidelity with the new file formats. Imagine an organization deciding whether to upgrade their files to the new file format or not. If the people testing the upgrade (no serious administrator would take MS statement that the upgrade is flawless at face value without testing) discover just ONE single thing that didn't get upgraded correctly, that might be it for the new file format in that organization. After all, it is human nature to assume that if one thing doesn't work correctly, who is to say that something else, much more important, doesn't work either? Therefore Microsoft has to guarantee 100% fidelity.
    The standard might get bloated because of that, but as Brian pointed out, you don't have to implement it all. If you are implementing it, you can look at the standard and decide that going with 20 border styles you think are the most common is sufficient for your application. No one will force you to implement the other 180 (except maybe your user base).
    In the whole discussion about ODF and backwards compatibility for current binary formats from different Office programs, we should not be kidding ourselves about reality either. Whether we like it or not, the Office 97-2003 binary formats are the de facto standard worldwide for Office file formats. I don't have the numbers of Microsoft's competitors, but I am sure that the number of users who don't use MS Office pales in comparison to the 400 million MS Office users. I'd guess that probably 80-90% of all of the world's spreadsheets, presentations and documents are in the MS file formats. That the company which controls these file formats is offering a lossless upgrade path for all these documents to a free, fully documented and open standard is in my opinion an effort from which we all can only benefit.
    If you hate Microsoft, then think about the following: One of the biggest reasons that is always brought forth to explain the dominance of MS Office is the file format issue. MS managed to establish a sizable market and then managed to keep (and grow) it by making it quite hard for others to compete with them, because their documents were never fully interchangeable with MS Office. With these new XML formats, Microsoft is giving that competitive edge away. The competition between Office products is from now on no longer going to be about who best implements the file format that most of the existing documents are in, but rather who offers the better features and usability. I think as consumers of Office products, we can only benefit from that shift in the competition.
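
The "implement only a subset of border styles" idea mentioned in this comment could look roughly like the following Python sketch. Everything here is hypothetical: the style names in `SUPPORTED` are illustrative placeholders rather than values quoted from the draft's ST_Border list, and `resolve_border` is an invented helper showing one possible fallback policy.

```python
# Border styles this imaginary consumer knows how to draw (placeholder names).
SUPPORTED = {"single", "double", "dotted", "dashed"}


def resolve_border(name, fallback="single"):
    """Return the style to render, degrading unknown styles to a plain line."""
    return name if name in SUPPORTED else fallback


print(resolve_border("double"))        # drawn as requested
print(resolve_border("mapleMuffins"))  # not supported, drawn with the fallback
```

A consumer that also writes documents back out would need to keep the original style name alongside the rendered one, so that the unsupported border survives a round trip.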

  • Anonymous
    May 20, 2006
    The comment has been removed

  • Anonymous
    May 20, 2006
    Hi Adam,

    > Why is MS sending MOOX to Ecma/ISO?
    I think you already answered the question: The customers of MS are demanding it. Why are they demanding it?
    Simply put, by getting MOOX standardized, the file format won't be subject to the whim of a single company (Microsoft). By getting it standardized, the file format also won't be changed frequently (every 2 years with a new Office version), because standards bodies move very slowly. By asking MS to send MOOX to Ecma/ISO, MS's customers are asking for predictability and long-term security of their investments. By having a standard, businesses can calculate the return on investment they'll get from MOOX and hence can make a sound and stable business case why they should invest in it. What kind of investments are they going to make though? Let me first remind you that the main customer base for MS Office is businesses, not consumers.
    If you look at your typical company today, you'll see an assortment of IT systems that are to varying degrees integrated with each other. Payroll might be linked with accounts payable/receivable. An ERP system might connect to all other systems, etc. Also, the company will try to have its IT as integrated as possible with its suppliers and customers. Why? Integrating IT systems, whether internal ones or external ones, saves money (less manual work e.g.), reduces errors and speeds up processes. So information integration is good business for businesses.
    However in this landscape of integrated systems, you have an island of crucial systems that just don't want to integrate: spreadsheets, presentations, documents. Lots of crucial business information is contained and maintained there. I remember a case at a client's where Excel spreadsheets contained the detailed information about transactions, whereas the massive assortment of multi-million dollar back-end systems (SAP was part of them) only contained dummy records for them. At the end of each month, the company had to spend quite some money to manually correct the balance sheet and income statement with the data from those Excel sheets (If I remember correctly, the corrections were ironically done in another Excel sheet...). The company had no other option, because the Excel sheets were not accessible to the back-end systems.
    Now picture the situation with an open XML format. Suddenly, all those isolated islands of presentations, spreadsheets and documents can be integrated into the workflow processes of existing systems. Doing that will save a lot of money, and hence companies will want to do it, if they can be assured their investment won't be a waste of money. Hence an Ecma/ISO standard would go a long way for them (everyone still remembers that 97 had new binary file formats and the nightmare that created).
    So in my opinion, open file formats are good business for MS's main customers. And what is good business for them is definitely also good business for MS.

  • Anonymous
    May 20, 2006
    "the file format also won't be changed frequently (every 2 years with a new Office version)"

    Which format are you talking about? Take Excel - Office 97, 2000, XP and 2003 all used the same primary format. That is an 11-year window until Office 2007 comes out.

    I agree wholeheartedly with the rest of the comment.

  • Anonymous
    May 21, 2006
    Patrick: Thanks for the reply. Has certainly given me a couple of things to think about. Just a couple of points:

    "The standard might get bloated because of that, but as Brian pointed out, you don't have to implement it all. If you are implementing it, you can look at the standard and decide that going with 20 border styles you think are the most common is sufficient for your application."

    I don't think that's realistic. If you're looking to implement an Office application, people will be looking at how well it works with "the standard file format", and I don't think that "partial support" is going to cut it.

    If you've got an organisation producing, say, web sites and doing hosting, etc..., and you're looking to use MOOX as the document format for writing internal specs, authoring end-user documentation, passing round newsletters, etc., etc., etc.... you might want a pretty heterogeneous network. Your business & accounting people may need to use Windows, your PR, marketing & graphic design people may want to use Macs, and your server admins may want to use Linux on their desktops.

    A "standard" file format should allow everyone to do that, and to create and exchange documents on an equal footing.

    However, if OpenOffice & KWord only support 10% of the borders in the spec, because it's "optional", the suits could look at that and say "Only Word supports MOOX properly, therefore everyone must have Windows desktops." - which almost completely destroys the point of having a "standard" file format.


    "The standard might get bloated [...]"

    If creating a standardised file format, that people have said they want to be able to use for years to come, is not a good time to drop all the crufty corner-cases, when will it be?

    Come on. As an industry, we've got about 20 years of experience with creating file formats. Let's use it. Over the last 10 years of MS Office, I'm sure there have been some things that got into the file formats that either turned out to just be pointless, or were special cases of something that we could do a better job of with hindsight. Make them more generic.

    Do you really want to have to support every unfortunate decision made in the last 10 years for the next 50? Or, if we continue with never taking out the occasional misfeature, forever?

    Sometimes, you should break backwards compatibility in order to move forward more effectively. Why is now, during this change, when we're specifically looking forward to extensibility and interoperability, not a really good time to do so?

  • Anonymous
    May 21, 2006
    Hi Adam,

    from a technical point of view I totally agree. We really could afford breaking some backwards compatibility in order to get a better standard moving forward.
    From a user/business point of view, we can't afford it though. If, 5 years from now, I need to send a Word document written in 1995 to someone using MOOX, all I want to do is hit a convert button and attach the file to an email. I just wouldn't want to check whether the file was converted correctly. I really just would want it to work. Every single Office user has been taught over the years that no matter what, his or her documents will look fine in any new version of Office. Microsoft essentially made a promise to its users on that, and this is a promise that can never be broken.
    I read a great article a while ago that argued that Microsoft is a synonym for never-ending backwards compatibility and that the company will do everything it can to ensure that for decades to come, even to its own detriment. The article talked about this in relation to Windows Vista and how it still supports running programs written for the very first version of the first operating system MS ever produced and how this huge backwards-compatibility effort was affecting Vista's (forward) development all across the board.
    I agree with you that if you are OpenOffice & KWord, you'll implement 100% of the standard (and I expect them to do so within a year). But, if you are just a software company writing stuff to integrate Office into some process workflow, you really can choose to implement only 10%, namely the needed 10%. If you were to count all the software that implements and uses MOOX 2 years from now, I'd think that the handful of Office products (there aren't really more out there) would be far outnumbered by all the other implementations.

  • Anonymous
    May 21, 2006
    What Adam repeatedly misses is: for every wannabe Office in development there are thousands upon thousands of projects, many internal and of limited scope, which need to generate simple documents that Office can read; or load and modify libraries of existing documents; or simply load and tag a bunch of documents. In addition there are projects, take Gnumeric, that understand that they cannot provide all the functionality of their MSO counterpart and have different goals in mind. The Open XML format will benefit them immensely.

  • Anonymous
    May 21, 2006
    Check the May 19 video: http://www.tedpattison.net/videos/

  • Anonymous
    May 22, 2006
    Wow, that was weird. I had actually replied to one of your earlier questions, Adam, but the comment was blocked. I just now realized that it never made it up there, so I went into the blog admin site and unblocked it.

    Patrick and Biff both did a great job of helping to explain a number of the "why" questions you had, but you should also check out my response from Friday night: http://blogs.msdn.com/brian_jones/archive/2006/05/18/601150.aspx#602305

    There really is a strong business reason for opening our formats and it has nothing to do with the recent politics. We've been working on this for a long time.

    -Brian

  • Anonymous
    May 23, 2006
    This is shaping up to be a pretty cool month. Last week we finally had the first public draft of Ecma's...

  • Anonymous
    May 24, 2006
    There has been a great overall reaction to the news last week of Ecma's first public draft for the Office...

  • Anonymous
    May 25, 2006

    I recently mentioned on this blog that the Ecma TC45 committee had released Working Draft 1.3 of the...

  • Anonymous
    May 26, 2006
    Ecma has published an updated draft of the spec for the Office Open XML Formats Standard. Here's a link...

  • Anonymous
    July 11, 2006
    There were a lot of great comments from last week's announcement about the creation of an open source...
