Where is the documentation for Office's docx/xlsx/pptx formats? Part 2: Office 2010
This follows up on my post earlier this week about where the documentation for Office 2007's formats lives. I am powerless to make you do anything you don't want to, of course, but I'd strongly encourage you to read that one first as I'll be building on a lot of what I said there. I shall link it regularly during this article to instill a nagging feeling of guilt.
ECMA-376 and the Flavours of ISO/IEC 29500
Perhaps I'll use that as a band name. As I mentioned in my previous post, Office 2007 uses the file format specified in the first edition of the international standard ECMA-376 as its default file format - these are the xlsx/docx/pptx files that we know and love. Office 2010 uses another standard, ISO/IEC 29500, as its default file format. These also use the xlsx/docx/pptx extensions. But wait, you say! How the heck can I open files from Office 2010 in Office 2007 then?
ISO/IEC 29500 is a direct descendant of ECMA-376. It's so direct a descendant, in fact, that ECMA-376 2nd edition is identical to ISO/IEC 29500. This is useful information for implementers, particularly because Ecma distribute standards for free, so you can get a copy of ISO/IEC 29500 by downloading ECMA-376 2nd edition from the Ecma site.
ISO/IEC 29500 has two variants: "Strict" and "Transitional". Transitional ISO/IEC 29500 is almost identical to ECMA-376 first edition, and it is this that Office 2010 uses as its default file format. Because of the similarity between the standards, files created in Office 2010 can be opened without issues in Office 2007. Office 2010 is able to read the Strict variant of ISO/IEC 29500, but writes files using Transitional. We've announced that the next version of Office, version 15, will support the creation of Strict files, but that's a story for a future blog post.
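If you'd like to check which variant a given file uses, one rough approach is to look at the namespace of the root element of the main document part. The sketch below does this for Word files in Python. It assumes the main part lives at the conventional path word/document.xml (strictly speaking you should find it through the package relationships), and classify_docx and make_sample are names I've made up for illustration. Because Transitional shares its namespace with ECMA-376 first edition, this tells Strict and Transitional apart but cannot tell you which version of Office wrote the file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the root element in word/document.xml for each variant.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def classify_docx(data: bytes) -> str:
    """Return 'transitional', 'strict' or 'unknown' for a .docx given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        # Assumption: the main part sits at the conventional path.
        root = ET.fromstring(package.read("word/document.xml"))
    # ElementTree renders a qualified tag as '{namespace}localname'.
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    if namespace == STRICT_NS:
        return "strict"
    return "unknown"


def make_sample(namespace: str) -> bytes:
    """Build a tiny stand-in package for the demo (not a complete, valid .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{namespace}"><w:body/></w:document>',
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(classify_docx(make_sample(TRANSITIONAL_NS)))
    print(classify_docx(make_sample(STRICT_NS)))
```

To try it on a real file, pass the file's bytes, for example classify_docx(open("report.docx", "rb").read()), where report.docx is whatever document you have to hand.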
ISO/IEC 29500
I covered the organisation of ECMA-376 in my earlier post, so I'd like to give ISO/IEC 29500 the same treatment. The standard is split into four parts:
Part | Title | Contents |
1 | Fundamentals And Markup Language Reference | The detailed reference for everything a file may contain, and how to use it. This information pertains to both Strict and Transitional files. Roughly equivalent to ECMA-376 Parts 1, 3 and 4, but without the features that ended up in Transitional. |
2 | Open Packaging Conventions | How Office files are composed – how to break a file into its constituent parts, and how those parts can relate to one another. Equivalent to ECMA-376 Part 2. |
3 | Markup Compatibility and Extensibility | An inbuilt mechanism for adding arbitrary extensions to files. More on this shortly. Equivalent to ECMA-376 Part 5. |
4 | Transitional Migration Features | Read as an addendum to Part 1, this part defines Transitional files. By and large, Transitional features are added to Strict, so in order to fully understand Transitional files you must read Parts 1 and 4. Roughly equivalent to the portions of ECMA-376 Parts 1, 3 and 4 which didn't make it into ISO/IEC 29500 Part 1. |
If you've ever tried to print out this standard (as I did by mistake once) you will notice that it's very large. It's not the thirty thousand pages that some telecom standards stretch to, but it's a lot of reading. As I mentioned in my previous post about Office 2007 file formats, I'd advise using the toolsets of others as much as you can rather than writing your own file-reading code from scratch. But, hey, maybe you like a challenge.
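If you do fancy the challenge, the packaging layer from Part 2 is a gentle place to start, since a package is a ZIP file containing parts, a [Content_Types].xml file and relationship files. Here is a minimal Python sketch that reads the content type overrides and the package-level relationships, then follows the officeDocument relationship to find the main part. The function names and the hand-built sample package are my own inventions for illustration, and the sample is far smaller than anything Office would write.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the Open Packaging Conventions.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def describe_package(data: bytes) -> dict:
    """Read content type overrides and package-level relationships."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        types = ET.fromstring(package.read("[Content_Types].xml"))
        rels = ET.fromstring(package.read("_rels/.rels"))
    overrides = {
        o.get("PartName"): o.get("ContentType")
        for o in types.iter(CT_NS + "Override")
    }
    relationships = [
        (r.get("Id"), r.get("Type"), r.get("Target"))
        for r in rels.iter(REL_NS + "Relationship")
    ]
    return {"overrides": overrides, "relationships": relationships}


def main_document_part(data: bytes) -> str:
    """Follow the officeDocument relationship instead of guessing a path."""
    for _rel_id, rel_type, target in describe_package(data)["relationships"]:
        # Both the Transitional and Strict relationship types end this way.
        if rel_type.endswith("/officeDocument"):
            return "/" + target.lstrip("/")
    raise ValueError("no officeDocument relationship found")


def make_sample_package() -> bytes:
    """Hand-built stand-in package for the demo (hypothetical content)."""
    content_types = (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
        "</Types>"
    )
    root_rels = (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", content_types)
        package.writestr("_rels/.rels", root_rels)
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(main_document_part(make_sample_package()))
```

Going through the relationship rather than hard-coding a path is the point of the exercise: Part 2 lets a producer put the main part wherever it likes, as long as the relationships say where it is.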
Office 2010 Extensions
Like any software developer, Microsoft are busy adding new features to Office with each new release, and for some features we need to find a way to store them in the file format. One of the great things about ISO/IEC 29500 is its extensibility mechanisms - implementers can extend the file format while remaining 100% compliant with the standard. The bulk of the extension system is covered in Part 3, Markup Compatibility and Extensibility.
If you're not interested in Office 2010 features, you can use the mechanisms described in Part 3 to ignore them. If you do want to understand Office 2010's extensions, they're fully documented on MSDN. They're with all of Microsoft's other file format documents, at https://msdn.microsoft.com/en-us/library/cc313105(v=office.12).aspx. Depending on the type of files you're interested in reading, you'll want to read MS-DOCX, MS-XLSX, MS-PPTX or MS-ODRAWXML. Right now these are all up-to-date with Office 2010's content - when we eventually get done with the next version of Office, we will publish full documentation of those extensions at the same time as the product ships.
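To give a flavour of how ignoring an extension works, here is a small Python sketch of one Part 3 mechanism, mc:AlternateContent. A producer wraps new markup in an mc:Choice that names the namespaces it requires, and supplies an mc:Fallback for consumers that don't understand them. The XML fragment is one I made up for the demo rather than something Word necessarily writes, the function names are hypothetical, and the sketch deliberately skips the rest of Part 3 (mc:Ignorable, mc:ProcessContent, mc:MustUnderstand). It also assumes each prefix is declared only once in the document.

```python
import io
import xml.etree.ElementTree as ET

MC = "{http://schemas.openxmlformats.org/markup-compatibility/2006}"
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
W14_URI = "http://schemas.microsoft.com/office/word/2010/wordml"

# Made-up fragment for illustration only.
SAMPLE = b"""<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
     xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
     xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
  <mc:AlternateContent>
    <mc:Choice Requires="w14"><w:r><w:t>new</w:t></w:r></mc:Choice>
    <mc:Fallback><w:r><w:t>old</w:t></w:r></mc:Fallback>
  </mc:AlternateContent>
</w:p>"""


def parse_with_prefixes(xml: bytes):
    """Parse XML and also collect prefix -> namespace URI declarations."""
    prefixes, root = {}, None
    events = ET.iterparse(io.BytesIO(xml), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # the first 'start' event is the root element
    return root, prefixes


def pick_branch(alternate, prefixes, understood):
    """Return the first Choice whose required namespaces we understand."""
    for choice in alternate.findall(MC + "Choice"):
        required = choice.get("Requires", "").split()
        if required and all(prefixes.get(p) in understood for p in required):
            return choice
    return alternate.find(MC + "Fallback")  # may be None


def expand(element, prefixes, understood):
    """Return the elements that should stand in for `element`."""
    if element.tag == MC + "AlternateContent":
        branch = pick_branch(element, prefixes, understood)
        children = list(branch) if branch is not None else []
        return [e for c in children for e in expand(c, prefixes, understood)]
    kept = [e for c in list(element) for e in expand(c, prefixes, understood)]
    for child in list(element):
        element.remove(child)
    element.extend(kept)
    return [element]


def flatten(xml: bytes, understood: set) -> list:
    """Resolve AlternateContent and return the text of the surviving runs."""
    root, prefixes = parse_with_prefixes(xml)
    expand(root, prefixes, understood)
    return [t.text for t in root.iter(W + "t")]


if __name__ == "__main__":
    print(flatten(SAMPLE, set()))      # a consumer that ignores Office 2010 markup
    print(flatten(SAMPLE, {W14_URI}))  # a consumer that understands it
```

A reader that passes an empty set of understood namespaces ends up with only the fallback content, which is the behaviour you want if you're not interested in Office 2010 features.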
Implementer Notes
I mentioned Implementer Notes in my last post about Office 2007 - we've also published Implementer Notes for Office 2010 against the ISO/IEC 29500 standard. These live with the documents above on MSDN, and the exact file is MS-OI29500. I already gave a lot more information regarding how to read implementer notes in my previous post, so I won't go into detail again here.
Stuck?
If you get stuck here, Microsoft have two forums dedicated to you. The first (brand new) covers Office's implementation of IS 29500 and is something we're hoping to build up in the coming months. It's agnostic to the technology you're using to create or read the files. The second is applicable if you're using our own .NET Open XML SDK to build files - as I mentioned above I'd encourage you to use other toolsets wherever you can, and our Open XML SDK is currently the most popular approach on the Windows platform.
Summary
I realise the situation is more complicated here than it is for Office 2007, and it's much more complicated than it was for the old binary formats where all the documentation was in one place. With standards compliance comes a slightly more complex documentation situation, but the upside of using a standards-based file format maintained by ISO will, I believe, outweigh the extra effort in reading the documentation. If you remember nothing else from this article, I propose memorising the following table:
Document | When to refer to it? |
IS 29500 | Always (when looking at files Office creates, it's this PDF I have open 99% of the time) |
Office 2010 Implementer Notes | When investigating specific Office behaviour with regard to features covered in IS 29500 (quite honestly, I only look at these when something doesn't work the way I expected) |
Office 2010 Extensions | When trying to read or write a feature that is new in Office 2010 |
I've styled this post as referring to Office 2010, but in reality this approach should stand you in good stead for many versions of Office into the future, as you'll be using variants of the above three documents for a long time to come.
Comments
- Anonymous
November 11, 2010
I'm having a strange problem with Word 2010, I think it must be related to an update or something. I recently created a document in Word 2010 and saved it as a .docx document. When I close it and open it again, it creates a temporary document with a different title (Wd0000001) which is [Read Only] and [Compatibility Mode]. What is going on here? I checked the options to make sure that it is saving in the docx format. But now when I look at the document in the windows explorer it calls it a Word 2007 Document (.docx), when before it was just called a Word Document (.docx). I'm totally confused here, can you point me in the right direction? Thanks.
- Anonymous
August 05, 2013
If it's not standard, it doesn't worth the read.