

Alex Brown's research, AbiWord enhancements

In a recent blog post, Alex Brown looks at how well Office 2007 supports Open XML. As he explains,

I was excited to receive from Murata Makoto a set of the RELAX NG schemas for the (post-BRM) revision of OOXML, and thought it would be interesting to validate some real-world content against them, to get a rough idea of how non-conformant the standardisation of 29500 had made MS Office 2007.

It's an interesting question. Office 2007 supported the ECMA-376 standard, but many changes were made during the evolution from ECMA-376 to IS29500. How many of those changes affect the content in a typical large document?

Strict vs. Transitional

One of the changes made at the BRM a few weeks ago was to delineate two types of conformance for Open XML documents: strict and transitional. As the first paragraph of Canada's BRM proposal on conformance classes put it,

Requests have been made to implement a more formal separation of “deprecated” features and to avoid the term “deprecated”. Canada proposes meeting these requirements by introducing strict and transitional conformance classes. Strict and transitional conformance classes determine verifiably different types of documents. The primary difference is that features from the proposed Annex A, “Selected Transitional Migration Features,” are prohibited from strict documents, but are allowed in transitional documents.

In other words, strict conformance requires that a document not use those features that are present for backward compatibility and are not recommended for new documents in the future. And transitional conformance allows for the use of anything and everything defined in the spec.

So Alex decided to take the main body of Part 4 of the ECMA-376 specification, which is available for download in DOCX form on the Ecma web site, and test it for strict and transitional conformance against the IS29500 RELAX NG schemas.

Test Results

The results were predictable: the document was not conformant to either class. Changes made at the BRM are not yet reflected in any existing implementations, and in this case the Ecma spec was created over a year before those changes were made. Here are the totals:

  • Validation against the strict schemas: 122,000 errors
  • Validation against the transitional schemas: 84 errors

Office 2007 was designed to be highly compatible with existing documents, so it uses features of Open XML that provide backward compatibility, including many of the elements and attributes that were moved to "transitional status" as a result of the BRM. So the test of strict conformance, although interesting, is a bit abstract: it's testing whether a document conforms to a subset of the spec that was defined after the document was created.

The second number is the more meaningful one. Those are places in the test document where something is done in a way that doesn't match the final IS29500 spec. Alex provides one specific example, to show the types of changes caught by that test: an attribute with a value of "on" that should say "true" instead, due to "one of the many tidying-up exercises performed at the BRM."
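To make that example concrete, here is a minimal Python sketch that scans a fragment of XML for attribute values of "on" or "off" and suggests "true" or "false" instead. It is not RELAX NG validation and is not how Alex ran his test. The function name, the sample markup, and the `urn:example:w` namespace are all hypothetical, and the sketch assumes that every attribute carrying "on" or "off" is one of the affected on/off attributes, which over-approximates what a schema would report.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the older spellings to the boolean spellings.
LEGACY_TO_BOOLEAN = {"on": "true", "off": "false"}


def find_legacy_onoff(xml_text):
    """Return (element tag, attribute name, old value, suggested value) tuples."""
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.iter():
        for name, value in element.attrib.items():
            if value in LEGACY_TO_BOOLEAN:
                hits.append((element.tag, name, value, LEGACY_TO_BOOLEAN[value]))
    return hits


# Hypothetical sample markup; the namespace URI is a placeholder, not a real one.
SAMPLE = """<doc xmlns:w="urn:example:w">
  <w:b w:val="on"/>
  <w:i w:val="true"/>
  <w:u w:val="off"/>
</doc>"""

hits = find_legacy_onoff(SAMPLE)
for tag, name, old, new in hits:
    print(f"{tag} {name}: {old!r} -> {new!r}")
```

A real conformance check would validate against the schemas themselves; this only illustrates the kind of difference the transitional test caught.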

To put that second number in perspective, there were 84 total errors in a document of 60,299,969 characters, which works out to roughly one error per 718,000 characters.
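The arithmetic behind that ratio can be checked with a few lines of Python. The figures come from the totals above; the variable names are just illustrative, and the strict count of 122,000 is presumably a rounded number, so the strict ratio is approximate.

```python
total_characters = 60_299_969
transitional_errors = 84
strict_errors = 122_000  # as reported above; likely rounded

chars_per_transitional_error = total_characters / transitional_errors
chars_per_strict_error = total_characters / strict_errors

print(f"Transitional: one error per {chars_per_transitional_error:,.0f} characters")
print(f"Strict: one error per {chars_per_strict_error:,.0f} characters")
```

The transitional figure comes out a little under 718,000 characters per error, while the strict figure is on the order of one error per 500 characters.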

Alex's research is an interesting first step in understanding conformance for IS29500. Another interesting step may eventually appear in the form of a test suite, a suggestion from Italy and other countries. The existence of such a test would be useful as more implementations become available.

Alex's post ends with a note that he intends to "repeat the exercise with ISO/IEC 26300:2006 (ODF 1.0) and a popular implementation of OpenDocument." He also asks "Will anybody be brave enough to predict what kind of result that exercise will have?" So far, no takers. Stay tuned.

Speaking of Implementations

Google recently unveiled the winning entries in Google's Summer of Code 2008, a program that offers student developers stipends to write code for various open source projects. Two of this year's winners are enhancements to the Open XML implementation in AbiWord.

Comments

  • Anonymous
    April 22, 2008
    Well,  I suppose 122 thousand seems like a small number when counting the number of atoms in the universe, so perhaps things aren't so bad after all.

  • Anonymous
    April 22, 2008
    Now Joe Wilcox on his (anti) Microsoft Watch blog is on the story, to wit: "Office 2007 Fails the Test When is a standard not a standard?" He goes on about the "startling pronouncement" blah blah.

  • Anonymous
    April 22, 2008
    so this standard is not implemented ANYWHERE in the world as of yet? Let's pray you folks manage to pull it off!

  • Anonymous
    April 22, 2008
    Yes, Max, the day that the vote passed our products didn't automatically start supporting the changes that were made to the spec in recent weeks.  And keep in mind that the final spec isn't even available yet from ISO/IEC. I'm doubtful that there has ever been a product that has supported a standard as of the day it was ratified, but if that has happened then it would have to be a standard that was not materially improved or modified during the standardization process.  That's not the case here: the standards process improved the IS29500 text, and we're all better off for that in the long run.

  • Anonymous
    April 22, 2008
    Hi Doug, I think you've slightly misinterpreted Alex Brown's post: the "84 errors" were all of the same "on/off no longer valid" type. So, in that document, there was only one real difference from the transitional schema, because of the narrowed definition of ST_OnOff. Having said that, I don't think it was a very good document to test against: it's very repetitive, doesn't exercise many features, and I'm not certain that it was generated by Word 2007 in the first place! I know there was a lot of automation used to create the spec, and I don't know whether there was a final "Edit in Office" step, or whether the final version was produced from another tool. But you're right - it's a good first step.

  • Anonymous
    April 22, 2008
    Thanks for the clarification, Inigo.  You're right, I missed the point there: Alex's transitional conformance test found 84 instances of that same ST_OnOff issue and nothing more. And your second point is a great one, too.  I think you may be right that the document wasn't even created by Office -- I'll look into that.  I agree it's not a best-case document for testing, but it's a very large document that's publicly available so from that perspective it's a practical choice.

  • Anonymous
    April 22, 2008
    So, will Office 2007/2008 SP2 (or some later SP) save new documents in strict mode and use transitional only for converting old documents?

  • Anonymous
    April 22, 2008
    Mat, we've not announced any details like that yet but I'll post details when we have them.  One thing to keep in mind is that most users want interop across the widest possible variety of implementations. Inigo, I've confirmed that Word was in fact used in the creation of that document, as part of the automated assembly.  So it's a valid test of Office's conformance in that sense.

  • Anonymous
    April 23, 2008
    Ian, FYI regarding Joe Wilcox's post (http://www.microsoft-watch.com/content/interoperability/office_2007_fails_the_test.html), I posted a comment yesterday but he apparently decided to not let it through.  I posted it immediately after his comment at 1:22PM yesterday (before the comment from Pinball had appeared), and I included a link to this post, and also to the open letter where Chris Capossela said "we are committed to supporting the Open XML specification that is approved by ISO/IEC in our products" (http://www.microsoft.com/interop/letters/ChrisCapOpenLetter.mspx).

  • Anonymous
    April 23, 2008
    Doug, it's possible you forgot to include your email address when you posted at Joe's blog.  My comments appear instantaneously, so I suspect he is not moderating the blog. Why don't you try again?

  • Anonymous
    April 23, 2008
    I think it may be a case of approving your IP address once and then you go straight through after that, like it works on Wordpress blogs.  I had included my email address yesterday, and just now posted again and got the same message I did yesterday, "Your comment has been received and held for approval by the blog owner."  We'll see.

  • Anonymous
    April 29, 2008
    seems to be a good deal of chatter recently about our support for the modified version of Open XML that