Guiding principles for Office’s ODF implementation

This blog post covers the main presentation from our ODF workshop that took place in Redmond last week: Peter Amstein’s explanation of the guiding principles behind our support of ODF in Office 2007 SP2. I’ve added explanations of some of the details that were covered verbally in the workshop, but if anything’s not clear here, please let me know.

Why ODF 1.1?

We’re implementing ODF 1.1 in our initial release of ODF support. We chose this version because it is the most current approved ODF specification, and because it is the version of ODF that current release versions of most other applications such as OpenOffice also support. We will support ODF in Word, Excel and PowerPoint, using the file extensions .odt, .ods, and .odp. The exact release date for Office 2007 SP2 has not been announced yet, but we expect ODF support to be available sometime in the first half of 2009.

Guiding Principles

As we set out to build in support for ODF, we developed a set of principles to guide our implementation team. Those principles are:

  • Adhere to the ODF 1.1 Standard
  • Be Predictable
  • Preserve User Intent
  • Preserve Editability
  • Preserve Visual Fidelity

Let’s take a look at each of these principles in more detail and with some examples.

Adhere to the ODF 1.1 Standard

Where the specification is clear and mapping between OOXML features and ODF features is straightforward, this is of course no problem. For example, OOXML’s italics property maps neatly to ODF’s italics property.
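To make the "straightforward mapping" case concrete, here is a minimal sketch of translating a WordprocessingML italic flag into the corresponding ODF text property. The helper name `italic_to_odf` and the returned dictionary shape are hypothetical and are not taken from Office's code; only the `w:i` element and the `fo:font-style` attribute come from the two formats.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def italic_to_odf(run_properties_xml: str) -> dict:
    """Map a w:rPr fragment to ODF text-property attributes (hypothetical helper)."""
    rpr = ET.fromstring(run_properties_xml)
    italic = rpr.find(f"{{{W_NS}}}i")
    if italic is None:
        # No italic element: nothing to write on the ODF side.
        return {}
    # A bare <w:i/> means "on"; an explicit w:val can switch it off.
    val = italic.get(f"{{{W_NS}}}val", "true")
    is_on = val not in ("false", "0", "off")
    return {f"{{{FO_NS}}}font-style": "italic" if is_on else "normal"}
```

In a real converter the returned attributes would end up on a `style:text-properties` element; that step is omitted here.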

When we found the specification to be ambiguous, we decided to follow common practice as long as it adheres to the standard. We did not create extensions in the case of features supported by Office and OOXML that are not in ODF at all. For example, ODF doesn’t support the concept of multi-stop gradient fill for shapes, but Office supports this concept. So we chose not to write multi-stop gradient values when saving to ODF.

Extending the ODF spec might have been a pragmatic approach to addressing gaps in the spec in the short term. But we felt that it would not be good for the ODF ecosystem in the long term since other applications wouldn’t be able to read those extensions (unless those products also implemented the same extensions we do) – and we don’t see that approach as promoting interoperability or the best experience for ODF users. We also don’t want to be accused of “co-opting” ODF and “polluting” the cyberspace with many ODF files that don’t adhere to the standard. We think it is better to evolve ODF with the community in the OASIS Technical Committee and/or the appropriate SC34 Working Group.

On the flip side, Office does not have support for Gantt charts, but ODF does allow them. When we load an ODF file that contains a Gantt chart we leave the chart area blank rather than try to map it to some other type of chart. But we preserve the chart data so that the user can pick another chart type from the Excel UI if desired.
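The Gantt chart behavior described above can be sketched roughly as follows. Everything here is illustrative: the type names, the `load_chart` function, and the set of supported chart types are assumptions made for the example, not Excel's actual loader.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative subset; the real list of chart types Excel supports is longer.
SUPPORTED_CHART_TYPES = {"bar", "line", "pie", "scatter"}


@dataclass
class LoadedChart:
    chart_type: Optional[str]   # None means "render the chart area blank"
    series: List[List[float]]   # the chart data is always kept


def load_chart(odf_chart_type: str, series: List[List[float]]) -> LoadedChart:
    if odf_chart_type in SUPPORTED_CHART_TYPES:
        return LoadedChart(odf_chart_type, series)
    # Unsupported type (for example "gantt"): do not guess a substitute,
    # but keep the data so the user can choose another chart type later.
    return LoadedChart(None, series)
```

The design point is that the loader never silently substitutes a different chart type; it degrades the rendering but keeps the data.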

Be Predictable

The principle here is that we want to do what an informed user would likely expect.

Where ODF is a superset of OOXML, we can either ignore the ODF-only constructs, or map them to an OOXML construct where there is a logical way to do so.

When OOXML is a superset of ODF, we usually map the OOXML-only constructs to a default ODF value. For example, ODF does not support OOXML’s doubleWave border style, so when we save as ODF we map that style to the default border style.
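A minimal sketch of this "map to a default" rule, using a lookup table with a fallback. The table contents and the choice of `"solid"` as the default ODF border style are assumptions for illustration; the post does not say which default Office actually writes.

```python
# Assumed default; the post only says "the default border style".
ODF_DEFAULT_BORDER = "solid"

# Illustrative subset of OOXML border values that have an obvious ODF counterpart.
OOXML_TO_ODF_BORDER = {
    "single": "solid",
    "double": "double",
    "dotted": "dotted",
    "dashed": "dashed",
}


def map_border_style(ooxml_style: str) -> str:
    """Return an ODF border style, falling back to the default for OOXML-only styles."""
    return OOXML_TO_ODF_BORDER.get(ooxml_style, ODF_DEFAULT_BORDER)
```

With this table, `map_border_style("doubleWave")` falls through to the default, while `map_border_style("double")` keeps its direct counterpart.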

Preserve User Intent

In simple cases, it isn’t a problem for Word to preserve document structure and semantics when saving an ODF file. For example, a document heading can be saved with a heading style that has an associated outline level.

In more complex cases we preferred a neutral approach when saving to ODF rather than implying semantics that the user did not intend. For example, in Word one can color code the bullets in a bulleted list by applying a color attribute to the paragraph character for the list item. Word can persist that attribute when saving to OOXML, but ODF does not have the concept of paragraph characters with attributes.

If we were to apply the color attribute to the paragraph style, that would cause the entire list item to take on the color, and this might imply more than the user meant. So we chose to drop the bullet color rather than color the whole list item.

Preserve Editability

We want to preserve the user’s ability to edit the contents of their document even if they have used a feature that can’t be saved to ODF, so that what the user sees in the document and how the user interacts with the document will not be changed until the user saves and closes the file.

For example, if you insert a table in a PowerPoint slide and save as ODF, you still have a table in your open presentation with all of the normal table editing behaviors: you can easily add a row or insert a column. The table becomes a group of shapes only after you close and reopen the file. As another example, you can open an ODS spreadsheet in Excel and use the conditional formatting features to analyze trends in the data, but the conditional formatting will not be preserved when you save and close the file.

Preserve Visual Fidelity

Wherever possible we write the ODF in such a way as to preserve visual fidelity when the document is opened in another application.

Chart gap width (i.e., the space between bars in a bar chart) is a good example. If the gap width of a chart is not specified in the file, OpenOffice applies a different default than Microsoft Office and renders the gaps differently. So in this case, Office writes the chart gap width even when it is the default value, that is, in a case where we traditionally wouldn’t write it.
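The gap-width rule amounts to "always serialize the effective value, never rely on the reader's default." Here is a small sketch of that idea. The function name, the `"gap-width"` key, and the default of 150 percent are assumptions for illustration, not values confirmed by the post.

```python
from typing import Dict, Optional

# Assumed writer-side default, in percent of bar width.
WRITER_DEFAULT_GAP_WIDTH = 150


def chart_properties_for_save(gap_width: Optional[int] = None) -> Dict[str, int]:
    """Build the chart properties to write, always including an explicit gap width."""
    effective = WRITER_DEFAULT_GAP_WIDTH if gap_width is None else gap_width
    # Written even when it equals the default, because another application
    # may assume a different default when the value is missing.
    return {"gap-width": effective}
```

A size-optimizing writer would normally omit default values; this sketch deliberately does the opposite to keep rendering consistent across applications.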

High Level Architectural View

Word, Excel and PowerPoint have a Model-View-Controller design. The in-memory representation of the document, or Model, is designed to facilitate document revision and display functions and includes concepts which are never saved to the file, such as the insertion point and the selection.

The persistence code converts this in-memory representation to and from a disk-based file representation. Office 2007 already had code to support a number of angle-brackety persistence formats, including HTML and OOXML. When we built in support for ODF, we added it in that area of our code.
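The split between the in-memory model and the persistence layer might be pictured like this. It is a toy sketch under stated assumptions: the class, the writer functions, and the registry are all hypothetical names, and the output is a bare fragment rather than a valid HTML or ODF file.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple
from xml.sax.saxutils import escape


@dataclass
class DocumentModel:
    paragraphs: List[str]
    # Transient editing state: lives in the model but is never saved.
    insertion_point: int = 0
    selection: Optional[Tuple[int, int]] = None


def write_html(model: DocumentModel) -> str:
    return "".join(f"<p>{escape(p)}</p>" for p in model.paragraphs)


def write_odt_fragment(model: DocumentModel) -> str:
    return "".join(f"<text:p>{escape(p)}</text:p>" for p in model.paragraphs)


# Adding a format means registering one more writer next to the existing ones.
WRITERS: Dict[str, Callable[[DocumentModel], str]] = {
    "html": write_html,
    "odt": write_odt_fragment,
}


def save(model: DocumentModel, fmt: str) -> str:
    return WRITERS[fmt](model)
```

The model itself does not change when a new writer is registered, which is the sense in which ODF support could be added alongside the existing persistence formats.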

That’s a general overview of how we’ve approached ODF support in Office 2007 SP2. These topics were also the foundation of the roundtable discussions we had at the workshop; for a variety of perspectives on those discussions, see the blog posts by Dennis Hamilton, John Head, and Jesper Lund Stocholm.

Comments

  • Anonymous
    August 05, 2008
    This is very useful - sounds like a good approach in general. I have a couple of questions. In OOXML, outline level is an attribute that can apply to multiple styles (i.e. "Heading 2" and "Heading 2 - numbered" could both have outline level 2), whereas in ODF there is one document outline with one style at each level. What are you planning in that area? I'm also very interested in what you're planning to do with lists and particularly list styles. OOXML and ODF have different list models: when you load a Word doc into OpenOffice.org Writer, it creates styles even when they existed in Word. This could be improved when saving to ODF from Word, but the models are quite different, so how do you propose to get reasonable interop?

  • Anonymous
    August 05, 2008
    Doug tells us about the motivation, architecture, and principles of the ODF implementation in the upcoming Office

  • Anonymous
    August 06, 2008
    @Peter - You might have better luck with your questions on the Interoperability Forums (http://forums.community.microsoft.com/en-US/tag/interoperability/forums/) in the Interoperability through Standards section.  I'm not sure that the Microsoft Office team has their own place for this just yet.

  • Anonymous
    August 06, 2008
    Peter, these are great questions, and good examples of the complexities involved.  On the outline styles, we don't extend the spec (per the principles above), so I think we probably map multi-style outline levels into a single level, but I need to confirm that.  I'll follow up with the implementation team and get some more specific info on this topic and the lists question. Gareth, the socializing was fun, and would have been even more so if you were there!

  • Anonymous
    August 07, 2008
    I've got some question about a specific feature mapping between the formats: Many companies use a different paper tray for the first page of a document to print this on special paper with a company logo. Can this page setup feature be correctly mapped to ODF and back to OOXML in all cases (i.e. including mailings, documents with multiple sections, etc.)? If not, what are the restrictions?

  • Anonymous
    August 07, 2008
    Microsoft recently announced its ambition to bring native support for the OpenDocument format directly

  • Anonymous
    August 07, 2008
    Good principles. Doug, you are great at writing about such things. We faced all of this while recently adding OpenDocument as a new format to our Aspose.Words and we essentially made all the same choices. But we have not written about it so well... Where do I find and hire people who can do this for me? :)

  • Anonymous
    August 08, 2008
    Stefan, I'll have to look into that one in more detail, both our implementation and also what it says in the ODF spec.  We use the paperSrc element in WordprocessingML to indicate paper source, but I don't know whether there's a corresponding element in ODF to be used for this purpose.  If there is (anybody know for sure?) and the value spaces can be mapped in some way, then round-tripping as you describe should be possible. Ian, that's a good catch on the diagram.  You sure pay attention to the details.  (That's a good thing!) The diagram is incorrect for the current version of Office, because as you mentioned the translator API goes through RTF.  But it's correct for SP2 -- beginning in  SP2 there will be a new translator API that also supports Open XML as the intermediate file format (for all 3 applications Word, Excel and PowerPoint). Roman, I worked closely with Peter Amstein on this material, so you're probably reacting to his words and not mine. :-)  But it's good to hear that another implementer has made the same choices on these sorts of issues.

  • Anonymous
    August 08, 2008
    Stefan, it looks like the core team is still working on that feature (paper source mapping), so we don’t know yet if there are going to be any restrictions on how well it works when saving to ODF. The way ODF does this is that each page has a page style, and each page style can have a paper-tray-name associated with it.  So there could be a different paper tray/source for each page of a multipage document. Because Word does not support more than 2 paper trays per document (one for the first page and one for all  the others) if an ODF file has more than 2 paper trays specified we will have to use the “Be Predictable” principle to map those to the two we support.  

  • Anonymous
    August 10, 2008
    Doug, thank you for the insight into ODF. Seems like a bit of a challenge to design the paper source mapping predictable while also suitable for a good automatic conversion in both directions. What might even add to complexity, is that Word actually supports applying different first and other page trays in different sections of the document. This is also used by Word in mailings: If you output a mailing letter with page 1 in first tray and other pages in second tray into a new document containing all the letters this is what Word will do: Each letter will be in its own section having its own paper tray settings! If the single letter already was multi-sectioned the scenario even gets more complex. But I hope at least in the single-section letter scenario paper sources will be able to convert in both directions without need for manual adjustments by the user.

  • Anonymous
    August 11, 2008
    Please can I clarify something... Suppose I create an ODF document using some other vendor's product. Without me realizing, this supports features which MS Office doesn't, including Gantt charts, and I decide to use them. Let's say I call my document "Product Plan (Pre-Release)". When this is complete, my boss wants to email it to the whole company, but first she needs to edit it, to remove the word "(Pre-Release)" from the title, and she uses MS Office to make this tiny change. Has she just trashed my Gantt charts? Also what happens "the other way around", if I create my document using MS Office and (without realizing) MS-Only features, then she edits it using Open Office? If this is a real problem, please can MS Office have an option to warn (or stop) users whenever they do something that might cause this type of problem.

  • Anonymous
    August 11, 2008
    Just a quick question, but is there someone liaising with the OASIS technical committee to have non-depreciated features added to the next version of ODF?  Most of these features probably wouldn't be too hard to add to this specification, and then the ODF documents produced by Office in the future would be of higher quality.  From memory, are there not a few people from Microsoft already on this committee?  I'm sure most features, well if they don't have a negative impact on the spec, would be accepted. Cheers.

  • Anonymous
    August 11, 2008
    Wow this looks great guys!  I'm beginning to look forward to your continued involvement with ODF.  

  • Anonymous
    August 11, 2008
    Could you actually provide a list of semantic features ODF lacks? The DIN paper is somewhat ad hoc and didn't go much into details. There were some discussions on styles and now we observe certain Microsoft bloggers who focus on praising styles as this seems to be an issue that breaks. ;-) I would welcome a contribution to the work on ODF 1.3 in order to resolve the problems and create a generic file format for all office applications. The reason in short is that I fully read the Open XML spec and I would regard it as a bad choice to build the future on the architecture of the format. Microsoft would eventually overcome the not-invented-here syndrome over time and finally cut infrastructure related path dependencies and break free from the past.

  • Anonymous
    August 12, 2008
    Hi Doug, Thank you for this overview not only of the decision process but how Word (and the other Office apps) handle some of the conversion work. I don't recall seeing information on whether or not, as part of the Office 2007 Service Pack 2 (SP2) rollout if the Office Compatibility kit (for those in Office 2000, Office XP/2002 and Office 2003) will have the same capability, or if that's going to require a separate rollout of the ODF Translator open source converter set.   Is there any information available on that?

  • Anonymous
    August 12, 2008
    This story has been submitted to fsdaily.com! If you think this story should be read by the free software community, come vote it up and discuss it here: http://www.fsdaily.com/EndUser/Guiding_principles_for_Office_s_ODF_implementation

  • Anonymous
    August 13, 2008
    Interesting. Thanks for outlining the principles adhered to and the philosophical discussion around them. The discussion of principles alone is very helpful in considering other similar projects, and I'm pleased to see Microsoft going this direction!

  • Anonymous
    November 12, 2008
    I’ve had a few people ask recently whether Microsoft would be participating in the OIC TC, and I’m glad

  • Anonymous
    December 04, 2008
    I participated in the DII workshop in Brussels earlier this week, where I got a chance to catch up with

  • Anonymous
    December 16, 2008
    Microsoft has today published our first set of document-format implementation notes, for the ODF implementation

  • Anonymous
    December 16, 2008
    have just been published on the Document Interop Initiative (DII) site :- Welcome to the Microsoft Office

  • Anonymous
    December 16, 2008
    Today, Microsoft published our first set of document-format implementation notes for the ODF implementation

  • Anonymous
    February 10, 2009
    You may have seen that the implementer notes for Office 2007 SP2 for ECMA-376 and ODF1.1 are now available....

  • Anonymous
    April 16, 2009
    I am very excited about the Document Interoperability Initiative (DII) event that Doug recently announced

  • Anonymous
    April 28, 2009
    For those of us on the Office Interoperability team, as well as our colleagues throughout Office, today

  • Anonymous
    May 05, 2009
    Rob Weir posted on his blog a couple of days ago an Update on ODF Spreadsheet Interoperability . 

  • Anonymous
    May 09, 2009
    Does 1 plus 2 equal 3?   After last week’s sometimes acrimonious discussion about formulas

  • Anonymous
    May 13, 2009
    When I blogged about the release of SP2 with ODF support two weeks ago, I mentioned that I was planning

  • Anonymous
    June 05, 2009
    There has been quite a bit of discussion lately in the blogosphere about various approaches to document

  • Anonymous
    June 14, 2009
    In this blog post, I’m going to cover some of the details of how we approached the challenges of testing