

Office’s Support for ISO/IEC 29500 Strict

There has been some interest expressed lately regarding how soon Microsoft Office will offer full read/write support for the Strict conformance class of ISO/IEC 29500. I can certainly understand the interest in this topic from those involved in the standards process, as well as from our customers and other implementers. That’s why we’ve been looking into the issues and options for Strict support for quite some time. Many of you have observed our movement in this direction. Indeed, a member of WG 4 blogged about our progress toward Strict recently.

We generally don’t publicly discuss features this early in the product development lifecycle, but given the broad interest I’m going to share some of our thinking on Strict support here. We’re doing this to assure everyone involved that we understand – at all levels within Office – the importance of Strict support going forward. In short, we will support Strict no later than Office “15.” I’ll outline our general plans below, and ask you to stay tuned for more details as we get further into the Office 15 wave.

Conformance: Background and Jargon

Before I cover the topic at hand, I think it’s worthwhile to take a quick look at how conformance is defined in the standard itself. There are two versions of the Open XML standard that have been approved by standards bodies:

  • ECMA-376 was approved in 2006 by Ecma International. Ecma is a consortium standards body like OASIS, with members including implementers, vendors, public and private organizations, and individuals.
  • ISO/IEC 29500 was approved in 2008 by the member bodies of JTC 1, the joint technical committee of ISO and IEC responsible for development and maintenance of information technology standards. ISO and IEC are international standards bodies whose members are mostly countries. (To be precise, ISO/IEC members are the national standards organizations of various countries, such as ANSI in the US, BSI in the UK, or AFNOR in France.)

The ECMA-376 standard was submitted to JTC 1 as a DIS (Draft International Standard) in 2007. Many countries (“member bodies”) participated in the standards process, and the version that was approved as ISO/IEC 29500 in 2008 included many changes that were suggested by the member bodies and then approved at the BRM (Ballot Resolution Meeting) in February 2008. For purposes of this post, the key changes to note are those in the conformance clause, which describes how to determine conformance to the standard.

In ECMA-376, two types of conformance were described in Section 2 of Part 1 of the standard: document conformance (Section 2.4) and application conformance (Section 2.5). These were just what you’d expect from their names: document conformance was about how to determine whether a document conforms to the standard, and application conformance was about how to determine whether an application conforms.

In ISO/IEC 29500, assessing conformance is more complicated because of several changes agreed to at the BRM that made conformance more granular than in ECMA-376.

The key change, which Alex Brown covers in his blog post, was the introduction of the concept of Strict and Transitional conformance classes. Transitional is intended to preserve the fidelity of existing binary documents being migrated to ISO/IEC 29500, and includes many legacy features for compatibility with existing documents. Strict is a subset of Transitional that does not include legacy features – this makes it theoretically easier for a new implementer to support (since it has a smaller technical footprint, so to speak), but also makes it less able to preserve the fidelity of existing documents.

Another conformance-related change at the BRM was the creation of separate conformance classes for word processing, spreadsheet and presentation documents and applications within both Strict and Transitional. So, for example, an application can be a conforming WML (WordprocessingML) Transitional application, or a document can be a conforming SML (SpreadsheetML) Strict document.

Yet another expansion of ECMA-376’s relatively simple approach to conformance was the addition of application descriptions, as covered in Section 2.6 of Part 1. An application may conform to either the Base Application Description (meaning that it supports at least one feature of its conformance class) or the Full Application Description (meaning that it supports every feature within its conformance class). That’s a pretty coarse distinction, but the standard anticipates refinement of application descriptions in Section 2.6.3, which states that “It is expected that additional application descriptions will be defined within the maintenance process for ISO/IEC 29500.” Indeed, SC 34/WG 4 (the working group tasked with maintenance of the standard) discussed this concept just two weeks ago during the meetings in Stockholm, where Mohamed Zergaoui (representing France’s AFNOR) presented some thoughts on this topic. I expect that WG 4 will work to clarify and refine the conformance language of the standard going forward, and we look forward to participating in that process.
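To make the jargon above a little more concrete, here is a minimal Python sketch of how the conformance classes and the two application descriptions relate. The feature names are hypothetical placeholders, not features enumerated by the standard, and the Base/Full test simply encodes the one-line definitions from Section 2.6 (at least one feature versus every feature of the class).

```python
from enum import Enum


class ConformanceClass(Enum):
    STRICT = "Strict"
    TRANSITIONAL = "Transitional"


class DocumentType(Enum):
    WML = "WordprocessingML"
    SML = "SpreadsheetML"
    PML = "PresentationML"


def conformance_label(doc: DocumentType, cls: ConformanceClass, role: str) -> str:
    """Build a label such as 'conforming SML Strict document'."""
    return f"conforming {doc.name} {cls.value} {role}"


# Hypothetical feature sets for one document type. Strict is modeled as a
# subset of Transitional: Transitional adds legacy features on top of Strict.
STRICT_WML_FEATURES = {"paragraphs", "tables", "comments"}
TRANSITIONAL_WML_FEATURES = STRICT_WML_FEATURES | {"vml_shapes", "legacy_compat_settings"}


def application_description(class_features: set[str], supported: set[str]) -> str:
    """Return 'Full', 'Base', or 'none' for one conformance class.

    Full: the application supports every feature of the class.
    Base: the application supports at least one feature of the class.
    """
    implemented = supported & class_features
    if class_features and implemented == class_features:
        return "Full"
    if implemented:
        return "Base"
    return "none"
```

Under these assumptions, an application that supports every Strict feature is Full for the Strict class but only Base for the Transitional class, which is one way to see why the Base/Full distinction is so coarse.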

Office’s Approach to Open XML Conformance

Office 2007 was the first version of Office that supported the Open XML formats, with support for reading and writing of documents that conform to the ECMA-376 standard. To help improve interoperability between our implementation and others, we also published comprehensive implementer notes that transparently document the details of Office 2007’s implementation of ECMA-376.

After we shipped Office 2007, we got to work on the next version of Office, which was code-named “Office 14” but is now widely known as Office 2010, the version that we’ll be releasing very soon. For each new version of Office, we start with intensive research and planning to determine what new features will appear in the next release, and that process was ongoing during the DIS 29500 process. By the time of the BRM (in early 2008), we had our plans locked down and were working hard to deliver on Office 14, and meanwhile the standards community was working to make changes to the proposed DIS 29500 standard.

After approval and publication of final ISO/IEC 29500 text in 2008, the Word, Excel, PowerPoint and Graphics teams looked at how we could change our plans for Office 14 to accommodate the ISO/IEC version of the standard. As Shawn covered in a blog post one year after publication of the standard, we made the changes necessary to support ISO/IEC 29500 Transitional in Office 2010.

The decision to start with Transitional was a relatively simple one at that time, because our primary consideration was the needs of our customers. Our customers place a very high value on compatibility and interoperability, because they often need to allow people to collaborate across multiple versions of Office (due to varying upgrade schedules among trading partners, across supply chains, or between the departments of a large organization, for example). ISO/IEC 29500 Transitional is designed for high-fidelity interoperability with the binary formats and ECMA-376, so it’s the logical choice for these sorts of scenarios.

In addition to the work we did to move from ECMA-376 to Transitional, we also started doing the work to move toward Strict support as soon as the final text of ISO/IEC 29500 was locked down. For example, we invested resources in migration from VML to DrawingML for many features, we moved ink annotations to the new content part added at the BRM, and we added support for reading Strict files.

All of that work has moved us much closer to full Strict support, and I’d like to state clearly and unequivocally at this time that we will support reading and writing of ISO/IEC 29500 Strict no later than the next major release of Office, code-named Office “15.”

I emphasized “and writing” there because we have already built read-only support for Strict into Office 2010, and Strict read-only support will also be available for Office 2007 SP2 through a downloadable filter. We’ve taken those steps to assure interoperability between Office 2007/2010 and other implementations of the Strict conformance class, including Office 15 in the future.

There’s one technical change that has come up during the maintenance process which I feel is worth pointing out, because of its large impact on the move to Strict support. There was a defect report submitted to WG 4 by the Swiss technical committee last year that proposed changing the namespaces of ISO/IEC 29500, so that implementers could have a simple and reliable mechanism for distinguishing ECMA-376 documents from ISO/IEC 29500 documents. WG 4 started discussing and debating various ways to address that proposal over a year ago, and last summer reached consensus on changing the Strict namespaces, but not the Transitional namespaces. This resulted in a large number of changes to the text of the standard – for those interested in a good overview of the magnitude of those changes, check out Orcmid’s latest blog post.
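Because only the Strict namespaces changed, a consumer can recognize a Strict package by the namespace of the main document part’s root element, while Transitional and ECMA-376 packages still share their namespaces and cannot be told apart this way. The Python sketch below illustrates that check. It assumes the purl.oclc.org URIs used in the published Strict schemas; the namespace tables are illustrative rather than exhaustive, and the function name `classify` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main-part root namespaces (illustrative, not exhaustive). Transitional kept
# the ECMA-376 namespaces, so those two cannot be distinguished by namespace.
TRANSITIONAL_NS = {
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main": "WML",
    "http://schemas.openxmlformats.org/spreadsheetml/2006/main": "SML",
    "http://schemas.openxmlformats.org/presentationml/2006/main": "PML",
}
# Assumed Strict namespaces (purl.oclc.org URIs from the published Strict schemas).
STRICT_NS = {
    "http://purl.oclc.org/ooxml/wordprocessingml/main": "WML",
    "http://purl.oclc.org/ooxml/spreadsheetml/main": "SML",
    "http://purl.oclc.org/ooxml/presentationml/main": "PML",
}
# Package-level relationship types that point at the main document part.
OFFICE_DOCUMENT_RELS = {
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument",
    "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument",
}
# The packaging (Part 2) relationships namespace did not change.
RELATIONSHIP = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def classify(package: bytes) -> tuple[str, str]:
    """Return (document type, conformance guess) for an Open XML package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        rels = ET.fromstring(zf.read("_rels/.rels"))
        target = next(
            (r.get("Target") for r in rels.iter(RELATIONSHIP)
             if r.get("Type") in OFFICE_DOCUMENT_RELS),
            None,
        )
        if target is None:
            raise ValueError("package has no officeDocument relationship")
        root = ET.fromstring(zf.read(target.lstrip("/")))
    # ElementTree reports namespaced tags as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace in STRICT_NS:
        return STRICT_NS[namespace], "Strict"
    if namespace in TRANSITIONAL_NS:
        return TRANSITIONAL_NS[namespace], "Transitional or ECMA-376"
    return "unknown", "unknown"
```

The second return value for the unchanged namespaces is deliberately ambiguous: that ambiguity is exactly what the Swiss defect report set out to remove, and WG 4’s decision resolved it for Strict only.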

Implementers, including Microsoft Office, will need to think carefully about how to handle the namespace changes in a way that gives customers the best possible experience. This is yet another challenge in planning support for Strict, and something the product teams are currently looking into as we start planning for Office 15.

Maintenance of IS 29500

Another topic that Alex raised in his blog post was the ongoing maintenance activity in WG 4, including progress to date, prioritization of the work, and other considerations. I’d like to briefly respond to his thoughts here, while acknowledging that WG 4 itself is the proper place for in-depth discussion and planning of the maintenance process. Any person from any SC 34 member body can participate in WG 4, so if you have thoughts on maintenance of ISO/IEC 29500, I’d encourage you to get involved.

WG 4 has existed for about 18 months now, and we have worked through a very large number of defect reports in that time. Although I’ve not participated in other JTC 1 working groups before, I’ve heard that the pace of WG 4’s work, with conference calls of up to two hours every two weeks, ongoing email on the public WG 4 reflector, and face-to-face meetings every three months, has been exceptional. Over 340 defect reports have been submitted to date, and WG 4 has processed and closed 242 of those, with 36 others in “last call” status (meaning that a defined solution is pending final approval by WG 4), and fewer than 70 awaiting further consideration.

Japan, the UK, and Ecma have been the largest submitters of defect reports to date, and defect reports have also been submitted by Denmark, Switzerland, Czech Republic, Ireland, and others. The maintenance process is proceeding smoothly, and we’ve handled changes ranging from simple editorial corrections to major proposals such as the namespace change mentioned above. Through it all, I feel that the WG 4 team has really gelled, and we’ve established a productive results-oriented working style that is well-suited to both the participants and the work at hand. Could we improve the process in various ways? Of course we can, and we will. But I think it’s worth noting that WG 4 has been very productive to date, with the first batch of corrigenda already prepared, reviewed and approved, the first set of amendments in the pipeline, and work already underway on the next sets of corrigenda and amendments.

I’d like to keep the discussion of WG 4 procedures within WG 4 itself, since those are the people who will ultimately be doing the work. But as I said above, if you have thoughts on how to tackle maintenance, please get involved. Contact your National Standards Body for information about how to participate from your country.

Validating IS 29500 Conformance

What’s the best way to assess conformance to a large complex document format standard? This is a question that challenges the best and brightest minds in all of the standards organizations responsible for such formats, including SC 34 as well as OASIS, Ecma, and others.

As Jesper Lund Stocholm recently noted in a blog post about his new validator project, schema validation is the easy part. It can be automated, and there’s no ambiguity regarding whether a specific XML instance is valid against a specific set of schemas. The bigger challenges come when you try to validate the semantic and syntactic constraints that are embodied in the normative text of the standard.
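Schema validation looks at one XML instance at a time, so any rule that spans several parts of a package has to be checked by other means. As a hypothetical example of that kind of cross-part check (not a constraint quoted from the standard), the Python sketch below walks every relationship part in a package and reports internal relationship targets that do not resolve to a part in the ZIP archive. The function names are illustrative, and the sketch ignores details such as percent-encoded target URIs.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELATIONSHIP = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def rels_source_dir(rels_name: str) -> str:
    """Directory that relative targets resolve against.

    "_rels/.rels" describes the package root; "word/_rels/document.xml.rels"
    describes word/document.xml, so its targets resolve against "word".
    """
    return posixpath.dirname(posixpath.dirname(rels_name))


def dangling_targets(package: bytes) -> list[str]:
    """List internal relationship targets that name no existing part."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        names = set(zf.namelist())
        for rels_name in sorted(n for n in names if n.endswith(".rels")):
            base = rels_source_dir(rels_name)
            for rel in ET.fromstring(zf.read(rels_name)).iter(RELATIONSHIP):
                if rel.get("TargetMode") == "External":
                    continue  # external URIs are outside the package
                target = rel.get("Target", "")
                if target.startswith("/"):
                    part = target.lstrip("/")
                else:
                    part = posixpath.normpath(posixpath.join(base, target))
                if part not in names:
                    problems.append(f"{rels_name}: {target} -> missing part {part}")
    return problems
```

A check like this is fully automatable, but deciding which cross-part and semantic rules the normative text actually imposes is the hard, judgment-heavy step described above.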

Many people are working on how to best tackle these challenges in the world of ISO/IEC 29500, including Jesper’s and Alex’s validator projects, as well as the work being done by Fraunhofer and others. Here at Microsoft, we’re excited to see so much talent being applied to this area, and we’re looking forward to working with fellow WG 4 members and others to assess conformance in a way that the community agrees is best.

This post is already quite long, so I won’t go into much detail here, except to note that there are two main areas where we expect to see useful results soon that will help raise validation testing to a new level of rigor and repeatability:

  • Identification of the semantic constraints contained in the text of the standard, so that all validators can work against a known-complete set of such constraints. Fraunhofer has done some interesting work in this area, to extract potential semantic constraints from the normative text, and we’ll be working with them to find a way to provide those constraints to writers of ISO/IEC 29500 validators.
  • Availability of a community-driven document test library, which implementers can use to test interoperability across conformant implementations of the standard. Fraunhofer has started this work, and there is much more to be done. Microsoft has contributed to this activity, and we’ll be staying closely involved.

Regarding the specific details of conformance, Alex noted that in addition to conformance issues caused by bugs in implementations, there can be issues caused by contradictory provisions within the text of the standard. Such contradictions can and do occur within various standards (both ISO/IEC 26300 and ISO/IEC 29500 have at least one of them, for example), and one of the goals of standards maintenance is to identify and correct such errors. The common pattern for such contradictions is that some portion of the standard will state that implementers shall do X, but there is text elsewhere in the standard (often text that was added later) which states that implementers may do Y in the same situation.

A strict reading of the text would lead one to conclude that such errors make conformance impossible. As a practical matter, however, implementers need to do something – they need to make a judgment call regarding the most reasonable interpretation of the intent of the standard in these areas. In the case of our IS 29500 implementation, we have done exactly that, and we’ve documented such interpretations within our published ISO/IEC 29500 implementer notes, so that everyone can see how we’ve interpreted the standard.

Separately, such errors need to be corrected in the standard. We are also contributing to that work. As one recent example, I wrote up a defect report myself while WG 4 was in Stockholm, to address an internal inconsistency regarding relationship types that Alex’s Office-o-tron validator had identified. We will work with the community to proactively identify more of these sorts of errors and get them corrected, and as part of my job I’m thinking through how we can best do that going forward.

One other detail that Alex mentioned was the use of the phrase “new documents” in the conformance clause for ISO/IEC 29500 Transitional. He noted that this term is not defined in the standard, and we agree that the intent of that term needs to be clarified. Here’s relevant text from the conformance clause:

“The intent […] is to enable a transitional period during which existing binary documents being migrated to DIS 29500 can make use of legacy features to preserve their fidelity, while noting that new documents should not use them. […]”

One thing to note there is the word should, which is a well-defined term. RFC 2119 covers the use of key words like should/shall/must/may within the normative text of standards, and here’s how should is defined:

3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

In the case of Office 2010’s use of Transitional, we have decided to prioritize compatibility and interoperability with existing implementations, because we believe this is in the best interest of our customers. So although the conformance clause says that Transitional “should not” be used for new documents, we have decided that the needs of customers, combined with the realities of the current document format ecosystem (most existing implementations are Transitional, recent major changes to the Strict namespaces), make Transitional the right choice. We will continue to update our plans in response to feedback from customers, other implementers, and the standards community going forward.

Summary

In closing, here’s where we stand:

  • In Office 2010, we’re providing read/write support for Transitional and read-only support for Strict.
  • We will include write support for Strict no later than the initial release of Office 15. (More details will be forthcoming after we complete our planning.)
  • We are committed to continuing to work closely with the community on validation techniques, and we are actively using the available ISO/IEC 29500 validators to improve the quality of our implementation.

We’ve learned a lot from the IS 29500 standards process, and we continue to learn from the open and respectful exchange of ideas within SC 34 and the broader standards community. None of us have all of the answers, and many of the challenges that we collectively face are complex, but I’m confident that we can work through them and find solutions that address the needs of customers, implementers, standards workers, and other stakeholders.

And once again, I’d like to reiterate that if you have opinions about ISO/IEC 29500 maintenance, please get involved. I’m humbled by the level of expertise that WG 4 members bring to the table, and also by the commitment of those who volunteer large amounts of their own time to work toward improving the standard. I know I speak for every member of WG 4 in saying that we’d love to have even more participants involved.

Comments

  • Anonymous
    April 06, 2010
    It sounds a bit weak. I understand the reasoning, but that doesn't preclude offering RW support for Strict in both 2007 and 2010 through a service pack. ODF support was integrated into Office 2007 in Service Pack 2. Why not consider doing something like that? It is a pretty big deal to skimp out on a promise that the company made in order to "fast-track" the standardization of the Office Open XML formats. I personally save in both ODF and OOXML out of necessity, but I feel unassured of support of OOXML Strict. I feel this is more of a cop-out than anything else. What stops you guys from saying that Strict support will be delayed yet again? I want a solid assurance of Strict support. I'd prefer it to become available to Office 2007 and 2010 as a service pack, but at least it should be made fully usable in 2010 as a service pack.

  • Anonymous
    April 06, 2010
    The comment has been removed

  • Anonymous
    April 06, 2010
    A minor problem in your article. You quote the definition of "should" from RFC 2119. That RFC is inapplicable to IS 29500. As is required for all ISO/IEC standards, the requirement keyword definitions for IS 29500 are found in Annex H of ISO/IEC Directives Part 2: Rules for the Structure and Drafting of International Standards (5th Edition, 21 December 2004), <http://www.iec.ch/tiss/iec/Directives-Part2-Ed5.pdf>. "Should" has essentially the same meaning under both sets of definitions. However, the definitions for "may" differ very substantially. There are other purely syntactic differences between the two sets of definitions. But implementers -- including Microsoft's own -- could be confused by the differing definitions of "may" if you refer them to RFC 2119. In the ISO/IEC Directives, "may" has its normal meaning of permission. But under RFC 2119, the terms "may" and "optional" are bounded by two mandatory interoperability requirements. <http://www.ietf.org/rfc/rfc2119.txt>. This is an area where I think the various standards bodies need to standardize their vocabulary. As an example, nearly all non-JTC 1 XML standards use the RFC 2119 definitions and ODF was drafted using those definitions, but the keyword definitions incorporated by reference in Section 1.2 were flipped to the ISO/IEC definitions at JTC 1. One result was that IS 26300 has 7,192 fewer mandatory interoperability requirements than did OASIS ODF 1.0. See my article at <http://www.universal-interop-council.org/node/41>. The second result was that OpenOffice.org's destruction of foreign elements and attributes created by other ODF implementations arguably became conformant. The damage has never been repaired. So the case is clear that the distinction between RFC 2119 and ISO/IEC requirement keyword definitions is important. This is an issue you might bear in mind in future communications. I'm glad to hear that Microsoft is implementing conformance with IS 29500 Strict. 
But will that implementation be accompanied by a compatibility mode in the Office apps ensuring that data will not be lost when saving a new document to Strict? I.e., features available only in Transitional disabled in the apps? I'd also like to see that implemented for the ODF support.

  • Anonymous
    April 06, 2010
    One other point that Alex was complaining about was that the support from Ecma and Microsoft in the maintenance of the format has dropped. Could you inform us about MS's commitment to working with the ISO/IEC JTC 1/SC 34 workgroup on the maintenance of ISO/IEC 29500?

  • Anonymous
    April 06, 2010
    The comment has been removed

  • Anonymous
    April 07, 2010
    Hi Doug, Thanks for the clarification. You mention application descriptions in your post. Does this mean that Microsoft Office 2010 will conform to the conformance classes for strict consumers (Part 1, section 2.5) via the full application description (Part 1, section 2.6.1)? /Jesper IBM drone

  • Anonymous
    April 07, 2010
    Thank you Doug, that is very enlightening and I now understand the way MS worked on Office 2010 much better. @Sir Gallantmon: I would guess that plans for SP features won't be 'set in stone' before RTM, because SPs are first and foremost a bug-fix element. Now, I've seen nowhere in Doug's long-winded yet still incomplete (it's a vast topic) post that Strict support won't be implemented in a later SP. For one thing, it may be considered that 70 points requiring consideration are still a lot - maybe, by the time SP1 for Office 2010 is in planning, that number will have shrunk - and will thus make development a less hazardous matter.

  • Anonymous
    April 07, 2010
    Regarding the specific details of timing or deployment of Strict support, as I mentioned above we've not even released Office 2010 yet. We’re finishing up that version and then will begin planning for Office “15,” so we won’t know the details for a while. I'll share more info when we have it. @marbux, thanks for the info on the relationship between RFC 2119 and the ISO/IEC Directives. As you noted, it doesn't affect the particular issue I covered above, but it's an important distinction, and I'll be sure to note it when appropriate going forward. @hAl, we're committed to working closely with SC34 through Ecma TC45 and the various member bodies we participate in. We've committed significant resources to WG4's work to date, and will continue to do so. Jesper, that's an interesting question. I'll look into the details and follow up here soon.

  • Anonymous
    April 07, 2010
    What about Office 2011 for Mac OS X - will it support strict OOXML?

  • Anonymous
    April 07, 2010
    Mac Office 2011’s support for ISO/IEC 29500 is essentially the same as described above for Office 2010 for Windows.  For more info, see http://www.officeformac.com/blog/Working-with-Office-for-Mac-2011

  • Anonymous
    April 07, 2010
    @Dave, I think the main argument for needing OXML has always been its ability to support the full set of features that Office customers expect, which ODF does not do. But since this post is about the question of IS29500 Strict vs Transitional, and I’d like to keep us on topic here, I’ll refer you back to my many earlier blog posts about ODF for my thoughts on that question.

  • Anonymous
    April 07, 2010
    @Jesper, Office 2010 will conform to the conformance classes for Strict consumers as described in Part 1, Section 2.5: Word as a WML Strict consumer, Excel as a SML Strict consumer, and PowerPoint as a PML Strict consumer.  Regarding application descriptions, each of these will be a Base application as covered in Section 2.6.1.  (I noticed you said Full and section 2.6.1 in your question, but Full is covered in 2.6.2 and Base in 2.6.1.)  There are a small number of optional features that will not be supported, when Office 2010 reads a Strict file, such as ink annotations in WordprocessingML comments or header/footer pictures in SpreadsheetML documents, and Office will ignore those specific constructs if they exist in a document.

  • Anonymous
    April 07, 2010
    Obviously at Microsoft, churning out software releases to reflect quarterly profit is much more important than meeting certain baseline feature sets and product quality.

  • Anonymous
    April 07, 2010
    Hi Doug, "There are a small number of optional features that will not be supported, when Office 2010 reads a Strict file, such as ink annotations in WordprocessingML comments or header/footer pictures in SpreadsheetML documents, and Office will ignore those specific constructs if they exist in a document." Hmm - I'm sorry to hear that. Will you provide us with a complete list of those "small number of optional features" in Part 1 that Microsoft Office 2010 will not support?

  • Anonymous
    April 08, 2010
    The comment has been removed

  • Anonymous
    April 08, 2010
    @Bugeyes, the topic of ODF formulas has been covered in some detail last year here: http://blogs.msdn.com/dmahugh/archive/2009/05/05/odf-spreadsheet-interoperability.aspx http://blogs.msdn.com/dmahugh/archive/2009/05/09/1-2-1.aspx I don't have anything to add to those posts, and nothing has changed since then regarding published ODF versions. Going forward, we've committed to announcing an Open Formula roadmap when it becomes an approved standard, and Eric Patterson (our spreadsheet formula expert) is very involved in the ongoing work to finish up Open Formula so that it can be approved and published.

  • Anonymous
    April 08, 2010
    @Jesper -- yes, we’ll be publishing documentation of those details. I’ll share the details when I have them.

  • Anonymous
    April 08, 2010
    The comment has been removed

  • Anonymous
    April 09, 2010
    Hi Doug, Might you please address the points Alex Brown raises at the blogpost by Orcmid you referenced? (It's the first reply.)  I'm interested in your thoughts on those issues.  Thank you. -Daniel

  • Anonymous
    April 14, 2010
    Hi Doug,  Microsoft was much more in a hurry to assemble and fast track through ISO/IEC a 6000+ page text (around 1 year between the creation of ECMA TC45 and submission to ISO, IIRC) than it is to implement the corrections and changes agreed during the BRM (based on your comments, it seems that this will take at least 5 years)... However, thank you for recognising that Microsoft won't let the wording of the ISO standard stop you from doing what you want to do: "although the conformance clause says that Transitional “should not” be used for new documents, we have decided that the needs of customers, combined with the realities of the current document format ecosystem [...] make Transitional the right choice".