Open XML Resources for Developers

Like many people, I thought we'd know the official outcome of the DIS 29500 process today. It now looks like we won't hear the official results until ISO has had a chance to run them by the national bodies that participated in the review of the specification, which according to Reuters will be Wednesday.

While we wait, I've been thinking about how much attention this process has been getting, especially in recent months. Back when Ecma submitted the ECMA-376 standard to ISO at the beginning of 2007 (451 days ago, if my math is right), a relatively small number of people were following the discussion around document format standards. That group has expanded significantly, and there are now many people following the story of Open XML and DIS 29500.

Since some of those people may be developers who didn't see all of the Open XML content that has been made available in the past, I decided to pull together a list of links to various resources for Open XML developers. The list is included below. I'm sure I've left out a few good resources, so please let me know in the comments if you know of a useful Open XML developer resource that I've not included here.

The Basics

A good place to start if you're brand-new to Open XML is the collection of Open XML videos on YouTube. You can see implementations in action on a range of platforms (Windows, Linux, iPhone, Treo, etc.), as well as interviews with Open XML developers and other information.

When you're ready to dig into the details, start with Frank Rice's introductory paper available on MSDN, which covers Open XML document architecture and also describes many common scenarios for Open XML development projects. Another great article on MSDN is Erika Ehrli's overview of the DOCX format, which goes into more detail on the most popular of the three document types.

If you're familiar with document formats in general and want to read about how Open XML compares to other formats, or are curious about how Ecma TC45 sees the Open XML formats, be sure to read Tom Ngo's whitepaper. Tom is the CTO of NextPage and a TC45 member, and he was a contributor to the conformance clause of the original Ecma spec and also participated in work on multi-part structure and conformance at the BRM last month.

The first book on Open XML development was Wouter Van Vugt's "Open XML Explained," which is available as a free download on MSDN. Toshiba's Yoko Girier has a Japanese book out for Open XML developers as well. Another useful reference for developers is the Open XML Developer Map poster, which provides an overview of the schemas and document types.

For a non-technical high-level overview of Open XML's role in the industry, Oliver Bell has written a paper entitled "Open for Business" that covers that perspective well.

Advanced Content

For more detailed coverage of the schemas, take a look at the videos of the Open XML Developer Workshop. These are videos from a 2-day class on Open XML, and you can also download the content of the workshop, including presentations, sample documents, and hands-on labs with code samples.

If you've seen me do an Open XML workshop, you know that there are a couple of concepts I really like to stress. One is the value of custom schema support for developers who want to create innovative solutions that merge the world of documents and the world of data. Custom schema support opens up a new world of possibilities for document-based business processes, and Open XML allows custom schemas to be used to tag document content, or for discrete "custom XML parts" within a document.
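
To make the "custom XML parts" idea concrete, here is a minimal Python sketch (standard library only) that builds a tiny in-memory package and then reads back a custom XML part attached to the main document part. The urn:example:invoice schema, the part names, and the helper functions are all hypothetical, and the package is stripped down for illustration (it omits [Content_Types].xml, so Word would not open it). The main part name is passed in explicitly to keep the example short; real code would discover it from the package-level relationships.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OPC relationship files, and the relationship type that
# ECMA-376 documents use to point at custom XML parts.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CUSTOM_XML = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/customXml")


def rels_entry_for(part_name):
    """Name of the ZIP entry that holds the relationships of part_name."""
    folder, filename = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", filename + ".rels")


def custom_xml_parts(package, source_part):
    """Return the custom XML parts that source_part has relationships to."""
    rels = ET.fromstring(package.read(rels_entry_for(source_part)))
    base = posixpath.dirname(source_part)
    found = []
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") != CUSTOM_XML:
            continue
        target = rel.get("Target")
        if target.startswith("/"):
            # Absolute targets are resolved from the package root.
            found.append(target.lstrip("/"))
        else:
            # Relative targets are resolved from the source part's folder.
            found.append(posixpath.normpath(posixpath.join(base, target)))
    return found


def build_sample_package():
    """Build a stripped-down package with one hypothetical custom XML part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr("word/document.xml", "<document/>")
        z.writestr(
            "word/_rels/document.xml.rels",
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{CUSTOM_XML}" '
            'Target="../customXml/item1.xml"/>'
            "</Relationships>",
        )
        z.writestr(
            "customXml/item1.xml",
            '<invoice xmlns="urn:example:invoice">'
            "<total>42.50</total></invoice>",
        )
    return buffer.getvalue()


package = zipfile.ZipFile(io.BytesIO(build_sample_package()))
parts = custom_xml_parts(package, "word/document.xml")
invoice = ET.fromstring(package.read(parts[0]))
total = invoice.findtext("{urn:example:invoice}total")
print(parts, total)
```

The point of the sketch is that the business data lives in its own part, in its own schema, so a program can read or replace it without touching the document markup at all.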

Another favorite topic of mine is how to work with OPC (the Open Packaging Conventions that form the structural basis of the Open XML formats). To ensure reliable interoperability, developers need to write code that properly navigates documents by their relationship structure rather than their physical structure, and this is an easy detail to overlook when you're getting started with Open XML development.
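
As a rough illustration of navigating by relationships, the Python sketch below (standard library only) finds the main document part by reading the package-level relationships instead of assuming a fixed path such as word/document.xml. The sample package is hypothetical and deliberately stores its main part at content/main.xml; it is not a complete, valid document, and the function names are made up for this example.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OPC relationship files, and the relationship type that
# ECMA-376 documents use to identify the main document part.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


def main_part_name(package):
    """Return the ZIP entry name of the main part via the root relationships."""
    root = ET.fromstring(package.read("_rels/.rels"))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Targets in the root .rels are resolved from the package root.
            return posixpath.normpath(rel.get("Target").lstrip("/"))
    raise ValueError("package has no officeDocument relationship")


def build_sample_package():
    """Build a stripped-down package whose main part is NOT word/document.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr(
            "_rels/.rels",
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" '
            'Target="content/main.xml"/>'
            "</Relationships>",
        )
        z.writestr("content/main.xml", "<document/>")
    return buffer.getvalue()


package = zipfile.ZipFile(io.BytesIO(build_sample_package()))
name = main_part_name(package)
print(name)
print("word/document.xml" in package.namelist())
```

Code that opened word/document.xml directly would fail on this package, while code that follows the relationship still finds the main part. The same pattern applies one level down: each part's own relationships, not its folder location, say where its images, styles, and other related parts live.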

I've blogged about these two topics in the past, so here are links to those posts for more information:

Open XML Portals

The following sites offer a rich set of Open XML content for developers, implementers, policy makers, and others:

  • OpenXmlDeveloper.org provides "how-to" information for developers working on many different platforms, and it has a Forums section where you can post questions about Open XML development topics.
  • The Office Open XML Formats Resource Center has links to many comprehensive how-to articles on Open XML development in the .NET environment, as well as whitepapers and other supporting information.
  • OpenXmlCommunity.org has information about Open XML implementations, case studies, IP information, and other non-technical content that may be useful or interesting to Open XML developers.

Developer Tools

Many of the articles I've linked to above and below cover developer tools, but here's a concise list of download links for the most popular tools for various environments:

  • Packaging API. If you're running Vista you already have the System.IO.Packaging API. If you're running XP, you'll need to install the .NET Framework 3.0 to get it.
  • Open XML SDK. The SDK for Open XML formats is a higher-level API for working with Open XML in a .NET environment. All the latest information about the SDK can be found in my recent blog post about the SDK roadmap.
  • Java developers should take a look at the open-source OpenXML4J API.
  • Another great alternative for Java developers is docx4j, an open-source library that creates an in-memory representation of the contents of a DOCX. Jason Harrop and others are building a variety of open-source tools for Open XML developers; see the dev.Plutext.org site for all the details.
  • For PHP developers, check out the PHPExcel API, which provides functionality for easily creating Open XML spreadsheets from PHP applications.

And here are two other tools that many Open XML developers find useful:

  • The Package Explorer is a handy tool for viewing, editing, and validating the contents of Open XML documents.
  • Altova's XMLSpy supports Open XML, and Altova CEO Alexander Falk's blog is a good place to learn more about it.

Developer-Oriented Blogs

There are many blogs about Open XML now, including several that provide useful developer content on a regular basis. Here are a few of my favorite Open XML development blogs:

  • Brian Jones covers a variety of Open XML topics, and is the best source of information on the thinking behind Open XML and how Microsoft sees the future of XML-based documents. Brian is a member of Ecma TC45 as well.
  • Wouter Van Vugt is an experienced .NET consultant/trainer who has led numerous Open XML workshops, created Package Explorer, and wrote the "Open XML Explained" book mentioned above. He often posts code samples as well, and is a member of the technical committee that evaluated Open XML for the Netherlands.
  • Jesper Lund Stocholm covers Open XML and ODF development, and is a very active member of the Danish technical committee that evaluated the Open XML spec.
  • Erika Ehrli is the driving force behind most of the Open XML content on MSDN, and she's also a regular blogger who posts code samples and links to other resources for developers.
  • Julien Chable, the creator of the OpenXML4J API, has a French-language blog with regular posts on Open XML for Java developers as well as .NET developers.
  • Eric White blogs about Open XML, LINQ to XML, and related topics. He focuses on the future of XML development for .NET developers, and posts code samples showing how the latest functional programming concepts can be applied to Open XML.
  • Maarten Balliauw is the creator of the PHPExcel API, and blogs on a variety of development topics including Open XML. He covers ASP.NET development as well as PHP development.
  • James Newton-King is another blogger who covers a wide range of development topics. His posts on Open XML and LINQ to XML are excellent.
  • Rick Jelliffe blogs about markup languages and related topics, and has been a major contributor to the debate around Open XML.
  • Dennis Hamilton has a lifetime of experience in technology standards, and often writes posts that look beyond the present into the world of possibilities that XML-based document formats present for developers.
  • Mauricio Ordonez blogs about Office development topics, including Open XML, and I'm looking forward to some interesting content he'll have soon for Open XML developers.
  • Finally, this one isn't really a blog, but the Open XML SDK forum on MSDN is another great place to read developers' conversations about Open XML development or get your questions answered.

Comments

  • Anonymous
    March 31, 2008
    Doug has a great post today that helps get us back to what really matters in this whole file format discussion

  • Anonymous
    March 31, 2008
    This is really great.  I was going to start rounding up this sort of thing and 'lo, here it is! Nice job.

  • Anonymous
    March 31, 2008
    Wow, thanks for these links to such truly helpful tools! Looking around, I see a crucial tool that is missing, however. For developers, could you please provide a link to a resource providing the full mapping from the legacy formats? You see, I want to implement this format fully and provide those who use my software with the specific added value for which OOXML was created. So I need that mapping. Where is it?

  • Anonymous
    April 01, 2008
    Kevin, check http://b2xtranslator.sourceforge.net.

  • Anonymous
    April 01, 2008
    There is no need to dwell on the result of the ISO vote; the news has already been, or will be, widely

  • Anonymous
    April 01, 2008
    Anon wrote, "Kevin, check http://b2xtranslator.sourceforge.net." Anon, that's for working with the binaries and for doing reverse engineering on the spec. That's not what a developer should have to work with! Also note at the link you provided: "the binary formats have also been made available under the Open Specification Promise" Yes, the binary formats have been made available under those terms, but no full, official ECMA or ISO documentation and mapping for them has been made available. Not even an unofficial Microsoft version. Doug, I'd appreciate it if you could make some headway towards making sure that these resources are made openly available to all.

  • Anonymous
    April 01, 2008
    Of the 87 National Body Members (countries with voting rights), 87% support the ISO/IEC standardization,

  • Anonymous
    April 01, 2008
    What do you want Kevin?  It's not really reverse engineering because you have access to the source code and so can see the exact mappings.  I actually prefer this to what would otherwise be a long and dry piece of documentation.  Moreover, in this case most of the work I would need/want to do (translation) is done for me.   So I guess the question is, what exactly are you looking for?  I think there is a considerable difference between being open, providing appropriate resources, and then having to actually do all the work for people.  

  • Anonymous
    April 01, 2008
    Also Kevin, the Library of Congress is hosting the binaries, so there is no reason that someone from the open source community couldn't produce the mapping (if that's all you want) without fear using the translator.   It's interesting that in Open Source projects not hosted by Microsoft the community is expected to do some of the work.  In contrast, when they do make donations and attempt to be more open, it is expected that they should do everything.  At what point does it stop being open source and start becoming free labor? FYI, I am pro open source but also VERY pro about the community dedicating their time and efforts and not just asking for things.  

  • Anonymous
    April 01, 2008
    I had already told you about this in recent months. Microsoft Office 2007 introduced a new file format

  • Anonymous
    April 01, 2008
    Yesterday's news is that DIS 29500, that is, the Open XML standard, which in the most recent ballot of the ISO/IEC standards

  • Anonymous
    April 02, 2008
    I've been talking more and more with ISVs and developers who are interested in using Office as a UI platform.

  • Anonymous
    April 02, 2008
    Many of you may have already heard that Office Open XML was approved as an ISO standard! This is great

  • Anonymous
    April 03, 2008
    You've probably heard the exciting news already - both ECMA and Microsoft have announced it. For

  • Anonymous
    April 07, 2008
    Doug Mahugh, Program Manager at Microsoft in Redmond, has an extensive list of resources on Open

  • Anonymous
    June 02, 2008
    Some of my old readers would have noticed that I've stopped blogging for quite a while now. Thing in