NOOOXML says: “Down with XML”

I was recently pointed to a presentation about Open XML that raised my curiosity. It found its way to me because it included my picture, but the content is what's on my mind. I take the Open XML discussion pretty seriously; I've had very interesting and stimulating discussions about Open XML with a lot of folks, but I've also seen a lot of the nonsense that makes the discussion cloudy and difficult (see below).

 

The slide references a comment I made in a ZDNet Australia interview about the advantages of XML-based document formats over binary formats for enabling better security. Specifically, having file formats represented in XML makes parsing simpler, because XML documents are expressed using a pre-defined (in this case public) schema. They can be easier to parse than binary formats, which can be opaque and obscure even when you already have their documentation. Given a choice, I'm sure that 99/100 developers would prefer to work with an XML-based format over a binary format, if only for the sake of simplicity, and my comment here illustrates one of those reasons.
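To make the parsing point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `document`, `title`, and `pages` element names, and the binary layout (a 2-byte length, the title bytes, then a 4-byte page count), come from neither Open XML nor ODF nor any real Office format. It only shows the general shape of the two parsing jobs.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record: a title and a page count, stored two ways.
xml_doc = "<document><title>Report</title><pages>12</pages></document>"

# Made-up binary layout (an assumption for this sketch):
#   2-byte little-endian title length, UTF-8 title bytes,
#   4-byte little-endian page count.
title_bytes = "Report".encode("utf-8")
binary_doc = struct.pack("<H", len(title_bytes)) + title_bytes + struct.pack("<I", 12)


def parse_xml(text):
    """Read fields by name; the parser handles structure and well-formedness."""
    root = ET.fromstring(text)
    return root.findtext("title"), int(root.findtext("pages"))


def parse_binary(data):
    """Read fields by offset; every length has to be checked by hand."""
    (length,) = struct.unpack_from("<H", data, 0)
    if 2 + length + 4 > len(data):
        raise ValueError("declared title length exceeds buffer")
    title = data[2:2 + length].decode("utf-8")
    (pages,) = struct.unpack_from("<I", data, 2 + length)
    return title, pages
```

Both functions return the same `(title, pages)` pair for well-formed input. The difference is that the binary reader has to know every offset and validate every length itself, which is exactly where parsing bugs tend to creep in.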

The deck goes on to state that Open XML allows "arbitrary binary blobs of data", citing this as a "security hole" (this isn't really anything new; this has been rehashed on several forums). I'll just take a guess and say the presenter probably missed a few important references about ODF (search for "Binary" in the text), or within the ODF spec itself… Section 9.3 of the ODF specification discusses how frames can contain "Objects represented either in the OpenDocument Format or in a Object Specific Binary Format." Section 9.3.5 describes the ability to add "plug-ins" to documents for "a media type that is not usually handled natively by office application software." Base64Binary is a core data type of ODF, as described in section 16.1.

Of course both Open XML and ODF allow the embedding of binary content. So it's not clear to me why we're picking on the binary DevMode structure when (so-called "arbitrary") binary data is supported in both formats (and probably every other authoring file format that is in widespread use today). If the implication is that ODF doesn't allow the inclusion of "arbitrary" binary information, the implication is absurd and false. By this logic, I'd guess it's worth asking OASIS whether we should expect binary data to be removed from a future version of the ODF spec. I know the answer to that question; it's not even worth asking.
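For anyone wondering what binary data inside an XML document looks like in practice, here is a small Python sketch of the general base64 technique. The `document` and `embedded-data` element names are hypothetical and are not taken from the Open XML or ODF schemas; the point is only that any XML format with a base64 data type can carry bytes the XML parser never interprets.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes standing in for any application-defined binary payload.
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# Wrap the payload in XML by base64-encoding it into element text.
# Element names are hypothetical, chosen for this sketch only.
root = ET.Element("document")
blob = ET.SubElement(root, "embedded-data")
blob.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(root, encoding="unicode")

# The XML parser sees only text; the consumer decodes it back to bytes.
parsed = ET.fromstring(xml_text)
recovered = base64.b64decode(parsed.findtext("embedded-data"))
```

The document stays well-formed XML, and the bytes round-trip unchanged. Whether that content is safe depends on the code that consumes the decoded bytes, not on the XML wrapper around them.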

I haven't heard the deck presented, nor do I plan to tear the rest of it down (might be fun for a rainy day), but it looks to me like whoever created this slide deck is attempting to criticize a fundamental purpose of XML. Or maybe this is a criticism of the entire list of XML-based format specifications. Nothing about this criticism is specific to Open XML… it is an indictment of XML and document formats.

It seems odd to pick a fight with yourself (… very Fight Club-ish… "I am Jack's Self-deprecating Argument"…).

The discussion about parsing XML formats vs. binary formats is equally applicable to Open XML, ODF, UOF, CDF, or (pick your XML-based format of the day). These slides contribute nothing to the XML formats discussion other than confusion. Part of the reason that the XML formats debate exists is that (I think) we at least agree that XML offers us better opportunities for document format management than a binary format would… but according to their point of view, I seem to be mistaken on that point. I must also be seeing things, because when I read the ODF spec, I see a lot of "arbitrary" binary data types in there too… obviously I've missed something.

Silly me :)

Comments

  • Anonymous
    January 01, 2003
    I was pointed at a document created by the ODF alliance (with a creation date of Feb 6th) that discusses

  • Anonymous
    January 01, 2003
    Hi Andre, I'd be surprised if anything positive was said... given the rest of the deck, (is that really a dead bear?) it doesn't seem like it was intended to improve the perception of Open XML. But I think we're agreed on many points. Parsing of document formats makes a huge difference in their security. http://support.microsoft.com/kb/935865 is an example of some of the work we've done in that area. Parsing is critically important to both binary and XML implementations. Also, I'd agree that binary formats are going to be around for a while, given how many there are, and their widespread use today. On the relative maturity of XML vs binary implementations, remember Office has been parsing both since Office 2000 (with XML taking an increasingly prominent role). So even the XML implementations in Office are fairly mature. But I think we're saying similar things here. ALL XML formats reference binary data, and (at least it seems to me) that the parsing / implementation matters a lot.

  • Anonymous
    January 01, 2003
    Thanks Gareth... that comment makes a lot of sense, and I couldn't agree more. (and isn't it the whole point? :)

  • Anonymous
    January 01, 2003
    Hi Gareth, From my standpoint, what's interesting is the opportunity Open XML opponents have missed... Open file formats that support Office functionality are an opportunity for others to more fully integrate file formats, much like the countless implementations of Microsoft Office binary formats. It's worth taking these formats and committing them to standardization for a lot of reasons. Regardless of market, potential, etc., it's just smart to take Open XML as a standard, if only to commit the current spec to the public record.

  • Anonymous
    January 18, 2008
    The comment has been removed

  • Anonymous
    January 18, 2008
    The comment has been removed

  • Anonymous
    January 25, 2008
    Gray, See my post from back in June- http://blogs.msdn.com/brian_jones/archive/2007/06/13/open-xml-in-science-and-nature-deploying-office-2007-and-more.aspx#3445179 Gareth

  • Anonymous
    February 20, 2008
    @andre [quote]This should be stored in XML, not in some undefined application-dependent format.[/quote] If the devmode data is loaded directly into a driver, the format will be defined by that driver. Storing the data in XML would mean putting tags around driver-defined data formats. That would only hide the fact that the data format is still an application/driver-defined binary format, one that still allows data manipulation and carries the same security risk as data defined as binary. If the data were truly driver-independent XML data, then it would require either arbitrary conversion to each and every kind of printer driver format, or each and every printer driver to be able to accept standardized XML settings.