Interoperability by design

Today we announced the formation of a new customer council focused on interoperability (how to make technologies work better together). I'm sure you've noticed over time that Microsoft has made a strong commitment to work towards better interoperability, and this is a big step forward in achieving that goal. I personally have focused on interoperability issues for about the past 6 years or so in working on extensible technologies like the object model and both the HTML and XML file formats. It's something I've always viewed as a key piece of our product design, and it's exciting to see more momentum building around this.

Pulling a quote from the press release:

"The council, hosted by Muglia, will meet twice a year in Redmond, Wash. The council will have direct contact with Microsoft executives and product teams so it can focus on interoperability issues that are of greatest importance to customers, including connectivity, application integration and data exchange. Council members will include chief information officers (CIOs), chief technology officers (CTOs) and architects from leading corporations and governments. Representatives from Société Générale, LexisNexis, Kohl's Department Stores, Denmark's Ministry of Finance, Spain's Generalitat de Catalunya and Centro Nacional de Inteligencia (CNI), and the states of Wisconsin and Delaware have joined as founding members."

As I said, we’ve been committed to the idea of "interoperability by design" for quite some time now, but the actual "interoperable by design" initiative was kicked off by Bill Gates last winter (Feb '05). We've heard numerous times from our customers that interoperability is a "key IT priority." When we design our products we look at how they will interact with a large selection of other products and with a wide variety of hardware. We have very large testing matrices in place to help ensure they work. This new customer council will help us in huge ways though as they will be able to identify some real life issues that we hadn't yet thought of (or prioritized high enough). As we identify new issues we can then look to solving those as well.

You see a lot of folks talk about interoperability, but often they just don't mean the same thing. From our perspective it's something we want to build directly into the products so that it just works. Another approach that companies have taken is to talk about it from the perspective of building specific "projects" where consulting is done (for a fee of course <g/>) to wire together a number of separate bits. I've also seen that often companies will talk about interoperability when it comes to areas that they aren't really competitive in, but want to be. This often leads them to push towards less functional and innovative technologies in an attempt to level the playing field. This is a far different approach from what we are talking about, and I want to make sure there isn't any confusion. There were a couple key talking points around this announcement that I really liked, and that is that we're producing "people-ready" and "value-returning" interop solutions and this new council will help us to be even more successful in doing that.

The work we're doing in Ecma is obviously a great example of the "interoperable by design" concept. We've taken a product where one of the key complaints was that the file format was not documented, and not only moved to use open technologies (ZIP and XML), but we're working with a bunch of other companies (including some competitors) to make it a fully documented international standard.
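
As a rough illustration of why the move to ZIP and XML matters for other tools, the sketch below opens a package with nothing but a standard library and reads text out of one XML part. It builds its own tiny sample in memory; the part name `word/document.xml` and the un-namespaced `<document>`, `<body>` and `<p>` elements are simplified placeholders assumed for the example, not the actual schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_package() -> bytes:
    """Build a tiny ZIP package in memory holding one XML part (placeholder layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(
            "word/document.xml",
            "<document><body><p>Hello, interoperability</p></body></document>",
        )
    return buffer.getvalue()


def read_paragraphs(package_bytes: bytes, part_name: str) -> list[str]:
    """Open the package as an ordinary ZIP file and collect the text of each <p>."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(part_name))
    return [p.text or "" for p in root.iter("p")]


sample = build_sample_package()
paragraphs = read_paragraphs(sample, "word/document.xml")
print(paragraphs)
```

The point of the sketch is only that no Office binaries are involved: a generic ZIP reader and a generic XML parser are enough to get at the content.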

If you want to learn more about interoperability at Microsoft, you should check out the interoperability site: https://www.microsoft.com/interoperability

-Brian

Comments

  • Anonymous
    June 14, 2006
    As Sean Daly asked in a comment to your "Thoughts on Open XML in ISO" post a few days ago, and is still unanswered:

    If MS is truly interested in interoperability, and "one of the key complaints was that the file format was not documented, and not only moved to use open technologies (ZIP and XML), but we're working [...] to make it a fully documented international standard." then why don't you just publicly document the current .doc format, and allow people to start to interoperate now with "literally billions of documents that exist today in those binary formats"?

    MS keeps talking, but their actions speak louder. Making everyone purchase a whole new version of MS Office just to interoperate with their own documents, when you could give them a choice of well documented formats to interoperate with and let them make their own decision as to whether the upgrade price is worth the added ease of interoperability that ZIP and XML will bring, does not make it seem like you actually care about people being able to work with their own files.

  • Anonymous
    June 14, 2006
    Adam: MSFT is releasing a reader/writer for the new XML formats that will work with Office 2000, XP, and 2003 for free. Users won't have to upgrade to Office 12.

    Now if only MSFT would support ODT conversions with Office 12, that'd be great.

  • Anonymous
    June 14, 2006
    Adam, why?! Microsoft releases a free .doc -> .docx converter that will do a much better job of moving the old binary format to the new, fully documented XML-based format than any external tool will ever manage. Once you convert those billions of old-format files there is no need to support .doc anymore, as old versions of Word will load .docx.

  • Anonymous
    June 14, 2006
    Interoperability is something I have blogged about in the past. It is an issue which customers are now...

  • Anonymous
    June 14, 2006
    OK, people don't have to buy a new version of MS Office. But how good is the reverse transformation?

    What if I have legacy processes that currently work, albeit flakily (due to lack of adequate documentation), with .doc files and I don't want to have to rewrite all of them just to add a new process that can take advantage of this new "fully documented" age?

    What if I work for a company that takes other people's .doc files, does interesting things with the binary files including writing to them, and then returns these modified files back to the customer? How will that help me fix bugs in my systems while still letting me give my customers .doc files back?


    But none of that answers the central question - if MS are interested in interoperability, and if one of the biggest complaints about interoperability was that the file format was not documented, and if "literally billions of documents that exist today in those binary formats", then why don't you just publicly document the current file format?

  • Anonymous
    June 15, 2006
    Q - "What if I work for a company that takes other people's .doc files, does interesting things with the binary files including writing to them, and then returns these modified files back to the customer? How will that help me fix bugs in my systems while still letting me give my customers .doc files back?"

    A - Then you have a big business opportunity. Jump on this new file format, explain to your clients how much more you can do if they choose to utilize the new formats, build your existing business, and develop a new niche.

    Q - "But none of that answers the central question - if MS are interested in interoperability, and if one of the biggest complaints about interoperability was that the file format was not documented, and if "literally billions of documents that exist today in those binary formats", then why don't you just publicly document the current file format?"

    That's a good question, and I don't know if they intend to do that at any point. I have a few observations, though:

    1. Documentation was released for the Word 97 version. Took me about 3 minutes to find it on Google. (hint: search for "Microsoft Word 97 Format" including the quotes) More recent versions have expanded support for new features introduced in each, but it is still fundamentally the same format.

    2. MS does give documentation to Partners, Governments, and Institutions that request it, have a good reason, and (I assume) sign an NDA. The fact that they don't release it to competitors is a historical business decision, and is certainly their prerogative. This used to be standard practice for software, and it is one of the things that is definitely changing.

    3. The truth is, 100% of competing word processors can read/write/open Word DOC files, so it's certainly an attainable goal. I can imagine it might be more difficult for a smaller, less experienced dev team, but it doesn't appear to have stopped anyone so far.

    4. As far as I am aware, MS has never sued anyone for putting word doc read/write ability into a program or even into competing software. After all, Word itself can read WordPerfect documents, which was a big marketing point back when Word had a market share of about 5%. (yes, they were the underdog at one point...)

    5. From everything I've read, the DOC and XLS file formats are basically a memory dump of how the document appears in memory in the Word (or Excel) application itself. There was never any attempt to "obfuscate" or anything--it was simply the most convenient way to store the info to disk.

    6. Because of (5), the documentation on the file format probably reveals a tremendous amount about the internal workings of the applications themselves--something that even you would probably agree is the intellectual property of MS, and is their right to protect. This may explain their reluctance to publicly release documentation to people who haven't signed an NDA.

  • Anonymous
    June 15, 2006
    From the same article: "We've wanted to provide folks with easier ways to work with our formats for years now, mainly because it significantly increases the value of Office documents when they are fully documented."

    Also, you claim: "I personally have focused on interoperability issues for about the past 6 years or so"

    If you wanted to increase people's ability to interoperate with their documents, why didn't you do so 6 [expletive] years ago!?! Even now, I'm going to have to wait, what, another 6 months before Office 2007 and the converters come out? During which time I still can't really do an awful lot with any of my documents at all.

    If you'd really wanted to give me these extra 6 years in which I could have been working better with my existing documents, /why didn't you/? Just document the current file formats.


    "Interoperability!"
    "You keep using that word. I do not think it means what you think it means."

  • Anonymous
    June 15, 2006
    Brad: Thanks for that. Your comment hadn't shown up though when I posted my last one, and I didn't mean to make it look like I was having a go at your reply with it.

    Point 1 doesn't really help that much with any documents created in the last 5 years. Yes, they're kind of the same, but the devil is in the details for this sort of thing.

    3) Again, it's the details that count. While 100% of WPs can read and write .doc files, they can't do it with 100% accuracy. While you dismiss "small dev teams", those are exactly the kind of groups that will exist in a single company's IT department, tasked with doing the kind of small, strange things with .doc files that OOX appears to be targeted at. Don't dismiss those dev teams.

    4) How nice of them.

    5) So?

    Points 2 and 6 are, IMHO, the only points with any merit, and they basically boil down to "Well, we did want to document the formats for interoperability purposes, but we (no matter how legitimately) wanted to maintain our exclusive lock on our customers' documents more, so on the whole, we actually didn't want to document the formats for interoperability at all."

    And this is what I've been trying to get across. If things have changed now, and MS are going for interoperability and documented file formats in the future, then that's great! I'm all for that.

    I just hate the falsity of them saying "oh yes, we really, really wanted to do this ages ago" when if that were actually true, they would have done.

    And they still could. There will be plenty of people who won't want to move across to the new formats "straight away", and there will be people who need to do business with them, and still want to automate working with old documents. In another year's time, some people will still need to try and work with .doc file formats, no matter how popular .docx becomes. So please, if you really care about letting people work with their documents, document the existing file formats!

  • Anonymous
    June 15, 2006
    Adam, the original binary formats were not designed with interoperability in mind. They were essentially a straight memory dump of the internal memory structures of the applications. We've provided people access to the documentation for a while now, but it's much more difficult to implement than the new XML formats. You can e-mail officeff@microsoft.com if you are interested in requesting access to the binary documents, or as Brad mentioned you can also just find them online.

    We have seen plenty of folks leverage the documentation to build tools that work with the binary formats, but those tools are also responsible for the majority of corrupt files that we receive and are asked to fix. The formats are very complex, and it's hard to work with them properly, even with adequate documentation.

    Building true interoperability doesn't happen with the snap of one's fingers as you seem to imply. It was quite clear to us that simply providing documentation to the binary format wouldn't have gone nearly far enough, and we focused all of our efforts on designing, implementing, and documenting the new XML formats as well as building in support for custom defined schemas so that people could integrate their own XML with Office files.

    -Brian

  • Anonymous
    June 16, 2006
    I don't know about you, but requiring someone to sign an NDA for up-to-date (i.e. less than 9 years old) documentation just does not say "Microsoft has made a strong commitment to work towards better interoperability" to me.

    If you just don't understand how empty those words sound, well, ... I don't know how else I can try to put it.


    So the old formats are hard to work with and relatively easy to corrupt. For a minute there I thought you were trying to make a case for not releasing complete, up-to-date documentation, but all the things you've said sound like awfully good reasons for releasing it to me.

    Yes, it's hard to work with the binary files, but people have found a need to do that, and some of them are going to continue to have that need for a while after Office 2007 is released, which itself is still >6 months(?) away. Simply deciding something along the lines of "Our customers say they want to interoperate with .doc files, but they don't really want that; they actually want something else to interoperate with which we'll provide for them at some point in the future" does not actually make it so. And telling customers that they don't really want what they've been asking for for years when the alternative is still many months away (and subject to slippage?) is, IMHO, more than a little patronising.

    (Just wondering, why is the fact that the old formats weren't designed for interoperability and are just a memory dump relevant? How does that change the need for documentation, the benefits documentation will bring, or provide any kind of reason for holding it back?)


    I still see no reason to believe that the claim "Microsoft has made a strong commitment to work towards better interoperability" and their refusal to publicly release up-to-date documentation for the .doc format are in any way compatible. And if I have to choose between believing what MS say and what MS do, I'll believe what they do almost every time.


    So, once again: if MS really have made this commitment, why haven't they publicly released the relevant documentation?

  • Anonymous
    June 16, 2006

    Adam, don't bother commenting. This blog is not informational.

    Old binary file formats such as Excel's are not memory dumps. Quite the opposite, those were designed to be extensible.

    This extensibility is used by Excel 2007 to store records downlevel so that if you save a .xlsx file as a .xls file then reopen it in Excel 2007, you will still have the non .xls native features available. Example : Excel 2007 data bars. Start Excel 2007, add data bars, save it as Excel 97-2003 compatible file, open it in Excel 97 (obviously the databars won't show up), make some changes, save, reopen in Excel 2007, surprise: databars are still there.
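
The round-trip behaviour described above can be sketched with a generic type-length-payload record stream. This is an illustration of the general technique only, not Excel's actual reader: the 2-byte type plus 2-byte length header is assumed for the example, and the record type numbers `KNOWN_CELL` and `FUTURE_FEATURE` are made up.

```python
import struct

# Each record: 2-byte type, 2-byte payload length (little-endian), then payload.
HEADER = struct.Struct("<HH")

KNOWN_CELL = 0x0001      # hypothetical record type the "old" application understands
FUTURE_FEATURE = 0x7FFF  # hypothetical record type added by a newer application


def read_records(stream: bytes) -> list[tuple[int, bytes]]:
    """Split a well-formed byte stream into (type, payload) records."""
    records = []
    offset = 0
    while offset < len(stream):
        record_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        records.append((record_type, stream[offset:offset + length]))
        offset += length
    return records


def write_records(records: list[tuple[int, bytes]]) -> bytes:
    """Serialize (type, payload) records back into a byte stream."""
    return b"".join(HEADER.pack(t, len(p)) + p for t, p in records)


def edit_as_old_reader(stream: bytes) -> bytes:
    """Simulate an older application: edit known records, pass unknown ones through."""
    edited = []
    for record_type, payload in read_records(stream):
        if record_type == KNOWN_CELL:
            payload = payload.upper()  # the only change the old application makes
        edited.append((record_type, payload))
    return write_records(edited)


original = write_records([(KNOWN_CELL, b"total"), (FUTURE_FEATURE, b"data-bar-settings")])
round_tripped = read_records(edit_as_old_reader(original))
print(round_tripped)
```

Because every record carries its own length, a reader can step over types it has never seen and write them back untouched, which is what lets newer data survive an edit in an older version.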

  • Anonymous
    June 16, 2006

    And you are right that the latest binary file formats are not documented. To my knowledge, the closest you can get is the MSDN 1998 documents related to Office 97 file formats (doc, xls, ppt, mso). But even those are far from being an exhaustive documentation of file formats. In addition, anything newer that was introduced in Office 2000, Office XP and Office 2003 has zero public documentation.

    There are comprehensive third-party implementations out there.

    And yes, the .doc file format is not going away just because Microsoft thinks they'd like to introduce .docx/.docm and so on. .doc will have to be supported until the end of time. What Microsoft wants customers to believe is that they are the only reliable provider of .doc files.

    And because Office files (both legacy and new ones) serialize OLE, it makes the cross-platform interoperability an "interesting" discussion. I mean, as long as Microsoft is also committed to "full fidelity" with legacy file formats at the same time, they cannot simultaneously have and not have OLE.

    Let's face it, OLE (read: Windows-specific implementations) is not going away anytime soon. Not only are there OLE-binary parts in the new 2007 file formats, which require Windows-specific clients to render/process them, but Microsoft even introduced a new scenario involving OLE whenever the documents are password-protected.

  • Anonymous
    June 16, 2006
    Mike, welcome back. I'm glad to hear that this blog is not informational and that you just shouldn't comment... yet you leave two back-to-back comments yourself. :-)

    So the Excel binary format is extensible huh??? NO KIDDING!!! :-)
    How do you think we've been able to add new features over the past 4 releases without breaking the file format?

    Whether a format is extensible and whether it maps to the internal memory structures of an application are completely orthogonal. The internal memory structures are extensible, that's why you can edit a file with newer functionality in an older version and it will often roundtrip. That doesn't mean it was designed for interoperability though. My point in saying that the formats closely mirror the internal memory structures was to say that unlike the XML formats, the old formats were not designed with interoperability in mind. Interoperability and extensibility are completely different things.

    Not sure how many times I'm going to have to address this OLE thing for you. OLE compound doc is an extremely simple container format that lets you store as much data as you want in multiple streams. When we encrypt a document, we take the entire ZIP package, encrypt it, and put it into one of those streams. It wasn't clear how "open" the existing ZIP encryption technologies were, so rather than take a risk there we just decided to use OLE as the container format in that case. It's easy enough to get at the stream and decrypt it (if you have the password), so I'm not sure why you keep coming back to this.
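
For anyone doing bulk processing, one way to cope with the two containers is to check the leading bytes of each file before trying to unzip it. The sketch below is a minimal example under that assumption; it relies on the well-known signatures of ZIP local file headers and OLE compound files, and the function name `sniff_container` is made up for the example. It does not attempt any decryption.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header signature
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file signature


def sniff_container(data: bytes) -> str:
    """Classify a file by its leading bytes: plain ZIP, OLE compound file, or unknown."""
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE_MAGIC):
        return "ole-compound"
    return "unknown"


# Build a small ZIP in memory to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("part.xml", "<root/>")
sample_zip = buffer.getvalue()

# A stand-in for an OLE file: only the signature matters to the check.
sample_ole = OLE_MAGIC + b"\x00" * 8

print(sniff_container(sample_zip), sniff_container(sample_ole))
```

A batch job could route "zip" files straight to a ZIP reader and set "ole-compound" files aside for whatever password handling applies.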

    As far as these other OLE-binary parts in the formats, what are you referring to? Do you have files where you have embedded objects in them? Office allows you to embed a whole host of things. A number of those things that can be embedded aren't owned by Office. You can embed an OpenOffice document in a Microsoft Office document for example. We'll document how that object is stored in the file, but it's up to the producer of that object to document their persistence format. If you embed a Spreadsheet in a Presentation for example, it's easy to parse through the PresentationML to get at the spreadsheet, and since the SpreadsheetML format is documented, you can easily parse it as well.
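
The spreadsheet-in-a-presentation case can be pictured as one package nested inside another. The sketch below assumes the embedded spreadsheet is stored as a complete ZIP package inside the outer one; the part names `ppt/embeddings/sheet1.xlsx` and `xl/workbook.xml` and the helper names are illustrative assumptions, not taken from the specification.

```python
import io
import zipfile


def make_zip(parts: dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP archive from a mapping of part name to content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in parts.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def read_embedded_part(outer: bytes, embedded_name: str, inner_part: str) -> bytes:
    """Open the outer package, then open the embedded package and read one part."""
    with zipfile.ZipFile(io.BytesIO(outer)) as presentation:
        embedded_bytes = presentation.read(embedded_name)
    with zipfile.ZipFile(io.BytesIO(embedded_bytes)) as spreadsheet:
        return spreadsheet.read(inner_part)


# Hypothetical sample: a spreadsheet package embedded in a presentation package.
inner = make_zip({"xl/workbook.xml": b"<workbook/>"})
outer = make_zip({
    "ppt/presentation.xml": b"<presentation/>",
    "ppt/embeddings/sheet1.xlsx": inner,
})

workbook_xml = read_embedded_part(outer, "ppt/embeddings/sheet1.xlsx", "xl/workbook.xml")
print(workbook_xml)
```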

    -Brian

  • Anonymous
    June 16, 2006

    Brian,

    My comment on the value of comments here was to Adam. I keep wondering about such discussions since day one, particularly on this blog. But discussions between people outside the fence can be interesting, and this is why I have added comments. No contradiction I am afraid.

    Let me address your points,
    - "internal memory structures being orthogonal to extensible" : Whatever. I have said the BIFF container is not the memory dump of the opened document. There is no contradiction with BIFF being extensible. The old binary file formats, for instance Excel's, were just as extensible as XML. By the way, you guys used another extensibility technique (OLE sub-containers) to add Custom XML data sources to Excel 2003 files. It's not a new feature of Excel 2007 I am afraid, and by that I mean not a requirement to go XML only for Custom XML to be serialized.
    - OLE : it will haunt you for a very long period of time I am afraid. I would rather have you come up honestly about what the new 2007 file formats are made of. You have said it's 100% Xml many times before. It's not the case. One example is enough to prove you wrong.  You cannot, for instance if you are doing bulk server-side processing, take for granted that .xlsx can be unzipped and so on. So when you speak ZIP-XML so openly, for obvious "interoperability" reasons since ZIP and XML are interoperable across platforms, you so interestingly forget to mention OLE. So it's ZIP-OLE-XML. Period. You know, you could have gotten rid of the entire issue if you had created a special file format extension for this case, such as .xlsp. This would have clarified the situation : .xlsx = zipped-xml period.
    - "it's easy to parse through [...] the SpreadsheetML format is documented" : and do something really meaningful or just parse it for fun? I mean, you'll have a hard time proving me wrong when I say that SpreadsheetML is an XMLized BIFF. Tell me, as a consequence, how easy is it to really understand the XML you are parsing when there are indexes and references all over the place? It's not impossible, but to claim that it's easy... come on.

  • Anonymous
    June 16, 2006
    Mike, I said it before many times and I will repeat for you: programming is not supposed to be easy and programming anything useful is certainly hard. Old BIFF format is much harder to handle and extend than new XML-based format.

    As for ZIP-OLE-XML, Brian answered your concerns before and several people cited him to you, yet you insist on ignoring them. Are you hopeless or just being trollish? Pray do tell.

  • Anonymous
    June 16, 2006
    The comment has been removed

  • Anonymous
    June 16, 2006
    Hey Mike, let's try to clear things up here. I have to tell you though it's starting to wear on me that you continue calling me a liar as well as accusing me of spreading FUD. How is a statement like "this new XML world provides countless ways to enable scenarios" FUD? Maybe you don't agree with it in which case you'd say it's an over exaggeration or lie, but FUD? You're free to say what you want, and I don't delete comments, but please let's keep things productive here.

    Let's try and make some progress. I've stated numerous times quite plainly that when a file is password protected, you have ZIP + XML which is then encrypted and put into an iStream. There's documentation all over the place, and freely available libraries for working with the compound doc format. Here's an article for instance from a couple years ago showing how to use a freely available Java library for cracking the files on Linux: http://www.newsforge.com/article.pl?sid=04/10/21/1452220

    Far less than 0.1 % of Office files are encrypted, and so I don't really think it's necessary to always include a caveat that the ZIP + XML can be encrypted at times. I'm not hiding anything and I certainly am not lying.

    Now, are there any other cases where you are troubled by the possibility of OLE being used? Or is the file encryption issue your big concern. If that's the case, then can we move on? Is there anything else you need to know about that scenario that I haven't explained?

    OK, now to your point about the binary formats and whether or not they are a dump of internal memory structures. Somehow we are talking past each other, because I completely agree with most of what you say. Yes the binary formats are extensible. Yes anyone can write extra stuff in them. Yes you could put custom XML into a separate stream. I totally agree with all of that.

    Now, was the old binary format designed with the intention of other people writing solutions around it? No. Instead all the extensibility we focused on was in our object model, which meant you had to have the Office .exes. The new format was designed so that wouldn't be necessary. The old formats were designed with a much different goal in mind than what we consider to be important today. The new formats are targeting the scenarios that we get excited about, which is for Office documents to play a more important role in business processes, and interoperability is a must have for that to work. Of course some stuff would still be much easier just using the Office OM, but that's to be expected.

    Believe me, we have a number of large customers who are already building solutions and prototypes of consuming and generating the new file formats. Barclays Capital is actually working directly with us at Ecma, because they care very deeply about the file formats for Excel being stored in XML and that XML being fully documented. I don't really care to argue with you about whether or not the XML is useful at this point. We have 300,000 third party developers already building solutions using the XML support from Office 2003, and that number will multiply once we release Office 2007. If you don't care to use them, that's fine, don't. But don't get upset at me for talking about something I've spent the past 5 years working on and talking to customers about. I know for a fact that it's a huge step forward from the old formats.

    -Brian

  • Anonymous
    June 16, 2006

    I said you were lying. Well, if it weren't for someone pointing out the issue of "OLE, not just ZIP and XML" over at openxmldevelopers.org, you would never have mentioned it.

    "The new format was designed so that wouldn't be necessary." : let me correct you, the new format was designed so that wouldn't be necessary in a number of documented cases that will be documented and that the API will make apparent.

    "Of course some stuff would still be much easier just using the Office OM, but that's to be expected. " : yep, and everything is possible. You simply access the objects and never have to worry about indexes and references, you never have to worry about ZIP or XML or OLE, you can use the engines to keep everything consistent (calculations, repaginations, merge, ...).

    "We have 300,000 third party developers already building solutions using the XML support from Office 2003" : XML is not supported in Office 2003. XML is only supported in a special edition of Office 2003. I would be happy if you said "contrary to Office 2003's limited support for XML, we've addressed the shortcoming that we've created, and XML as a serialization file format, and XML as a data source provider, is now available in all editions of Office 2007 for Word, Excel and Powerpoint".

    As for your enthusiasm about XML, who wouldn't? I mean, as long as you stick to facts.

    "I know for a fact that it's a huge step forward from the old formats. ". I don't think it's Microsoft and customers, it's more Microsoft, ISVs and customers. At least if you are still talking about a win-win ecosystem, not one that undermines the middleman.

  • Anonymous
    June 16, 2006
    The comment has been removed

  • Anonymous
    June 17, 2006
    The comment has been removed

  • Anonymous
    June 17, 2006
    Mike, I'm not sure what I've done to make you want to contradict every point I make, but whatever it is, I'm sorry. I don't know why you think I spend time out of my day writing on this blog, but if it's not obvious let me make it clear. I've worked with a group of folks on these formats for the past several years, and it's been really hard work. We're all very proud of the work we've done, and I care deeply about making sure that people understand everything about these formats, and I want to see people benefit from the huge shift that's been made (both customers and ISVs of course).

    Do you know what percentage of all the interesting information on the file formats I've covered on my blog since I started? Probably less than 1%. There are tons of areas I've yet to talk about, and file encryption was one of them. Does that mean it won't be documented? No. It just means that it wasn't something I had gotten to yet on my blog (and probably wouldn't have gotten to for quite some time if you hadn't brought it up, so for that I thank you).

    If the approach we take with encryption means that you can't work with the files, then I'm sorry. Maybe once we've been able to fully document it, you'll change your mind (at least I hope so).

    I hope you understand though that I'd really like to move on and talk about other things at this point. As soon as the encryption stuff gets documented, I'll point it out, and hopefully then we can pick up the conversation again.

    -Brian

  • Anonymous
    June 17, 2006
    The comment has been removed

  • Anonymous
    June 19, 2006
    The comment has been removed

  • Anonymous
    July 05, 2006
    Today we are announcing the creation of the Open XML Translator project that will help translate between...

  • Anonymous
    September 21, 2006
    A comment was posted today that had a lot of thought put into it and rather than just replying to it...

  • Anonymous
    October 24, 2006
    A comment was posted today that had a lot of thought put into it and rather than just replying to it
