Microsoft Office Open XML Format does not require upgrading to Office "12"
This is old news to a lot of you, but I wanted to call attention to this again because I've read some articles where it appears that some folks still aren't aware of the work we are doing to make the new XML formats backwards compatible. I've already talked about how the existing legacy XML formats will continue to be supported going forward. Even more importantly though, the new Office "12" XML file formats will work in existing Office versions as well. That's right, you don't need to upgrade to Office "12" to use the new XML formats.
We will provide free updates for Office 2000, XP, and 2003 that allow them to both open and save the new XML formats. This is great news for solution developers, IT admins, and end users. I know that a lot of you who have been reading my blog for the past few months are probably already aware of this. I just wanted to call attention to it directly in case you were worried about the cost of moving to the new XML formats.
This work is something I've been really proud of since we first started on it. It was a really big investment on our part to actually port this back to the past three versions. Just supporting reading the formats would be one thing, but supporting both read and write was a pretty big task. It's something I'd always viewed as a must-have though. I think there is so much value to these new formats and getting this to as many people as possible is really huge. It also makes it a lot easier for people that have moved forward to Office "12" to share their documents. They can either use the old binary format, or they can use the new XML formats.
Of course, anyone else is free to come along and build a tool that reads and writes the new formats too, so if you aren't currently an Office customer there are still other possibilities. We've talked at length about the royalty-free licenses that we provide that basically allow anyone to build on top of the formats. There has been some discussion around a specific license that is not compatible with ours, but the large majority of licenses out there are 100% compatible so there are a lot of choices. I thought it was worth pointing out again what we at Microsoft are providing directly though.
-Brian
Comments
Anonymous
October 11, 2005
Brian,
Can you comment on how the older versions will handle some of the newer features?
For example:
1) Will the older versions simply read/write the new format, or
2) Will the Office 2000 interface, for instance, be updated to enable editing XML?
3) How will Office 2000, for instance, handle documents built upon custom schemas? Will the documents be editable in accord with the definitions laid out in the custom schema?
Thanks
Darryl
Anonymous
October 11, 2005
This seems like a good thing overall. As a customer I would be interested to know how long and how well you will support each XML version (for, let's say, Office 13, 14, 15...) on, say, Office 2000.
Will there be a new version of Office Open XML for each iteration of Office?
Anonymous
October 11, 2005
Scot, the format will not have any breaking changes made going forward. We don't want to be put in a spot where we need to provide updates to Office "12" in order to support opening files from Office 14 for instance. Of course, as new features are added, we need to come up with ways to represent those in the format. We won't change the way existing features are represented though, which is really important to maintain true compatibility.
We have designed into the format a future extensibility mechanism so that as new features are added in future products, that functionality can be persisted in the format without breaking its backwards compatibility. Of course, we can't build those new features into the older products so the older version may not be able to display the new feature, but it could show everything else. We've actually done work though to allow the future versions to also embed alternate representations for the new features so we could at least provide a fallback for the earlier version to display.
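The fallback idea described above can be sketched roughly in Python. This is only an illustration under stated assumptions: the data layout, the function name `pick_representation`, and the feature name `databars` are all hypothetical, and nothing here reflects the actual markup the format uses.

```python
from typing import Optional, Sequence, Set, Tuple

# Each choice pairs the feature a reader must understand (or None for a plain
# fallback) with the content to display. Names and layout are hypothetical.
Choice = Tuple[Optional[str], str]


def pick_representation(choices: Sequence[Choice], supported: Set[str]) -> Optional[str]:
    """Return the first representation this reader can display, in preference order."""
    for required_feature, content in choices:
        if required_feature is None or required_feature in supported:
            return content
    # Nothing usable: the reader skips this feature and shows everything else.
    return None


choices = [
    ("databars", "cell with data bar"),  # new feature, preferred (hypothetical name)
    (None, "cell with plain fill"),      # alternate representation for older readers
]

print(pick_representation(choices, {"databars"}))  # a newer reader
print(pick_representation(choices, set()))         # an older reader
```

The point of ordering the choices is that a newer writer can list its preferred representation first and still leave something an older reader can show.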
Darryl, the support will be handled the same way as it is today when you collaborate on a .doc file. The older version can read and write the new format, but it will of course not be updated to support all the other new features. The new features will require the new product. So in your example of custom defined schema support, you would of course be able to open the file and save it back, but the schema validation would not be there in Office 2000. There are only a handful of features like this though, as most functionality can either be mapped back to existing functionality, or it isn't a feature persisted in the format (such as the new User Interface).
-Brian
Anonymous
October 11, 2005
So Brian, to provide a more concrete example:
Let's say in 30 years time I suddenly get the urge to stop sysadminning and get back to doing some programming.
Question (1)
Will you guarantee that I will be able to take out a license for (the 2005) Word XML then (assuming I DON'T take a license now)?
Question (2a)
Will you guarantee that I will be able to take out a similar license for the Word 2035 formats then?
Question (2b)
What happens to the licensing program if Microsoft goes bankrupt in 2020?
Obviously feel free to substitute any positive N instead of 30 above...
Anonymous
October 11, 2005
What about Office for Mac? Will this be available for v. X, 2004, or future versions?
Anonymous
October 11, 2005
Well, take the Excel 12 conditional formatting improvements. Some of these are brand new and need to be persisted in such a way that older Excel versions ignore them. The downside to this is that for any person really taking advantage of the features, instead of writing custom VBA macros or layers of formulas, the workbooks will degrade badly in older Excel versions.
Let me give an example: colors. Let's say you create new conditional formattings of this kind with colors. Since older Excel versions don't support the new conditional formattings, no color gets applied to cells. Unless I have missed something completely obvious. Can you spend some time explaining the downward migration path? This matters as much as the new features.
Anonymous
October 11, 2005
Hello!
Support for Office 2000 is news to me and it's great. Does this mean only Office 2000 in a Windows XP/2000 environment? It would be great if this would work in Windows 98SE ...
Roman
Anonymous
October 12, 2005
Rick from the Mac BU has stated (http://blogs.msdn.com/rick_schaut/archive/2005/06/01/424086.aspx) that there will be XML converters for Mac Office v.X and Mac Office 2004. It will also be the native format for the new Mac Office (2006 or 2007?)
Anonymous
October 12, 2005
>>existing legacy XML formats will continue to be supported
Brian,
If an entry is made past column IV in Excel 12 will that save in the binary format? If so, will older Excel versions be able to open that file now? After the patch? (Obviously older Excels will not be able to do anything with such entries. But where they might abort loading now they might fail more gracefully with the patch, loading what they can for instance).
Jim
Anonymous
October 12, 2005
The comment has been removed
Anonymous
October 12, 2005
Interview with Gary Edwards on ODF and MS XML. The general impression I get is that MS XML is about to get run over by a Mac truck.
http://madpenguin.org/cms/html/62/5304.html
Anonymous
October 12, 2005
Eduardo, most of the links you've been leaving are from OpenSource folks and OpenSource news sites, so it's not surprising you're seeing this as a one sided issue. I'm glad to see your interest and enthusiasm, but I think it's a bit misguided to make your assumption about the Mac truck (I assume there was no pun intended on the whole "Mac" thing). I'll be sure to look both ways though before I cross the street just in case. :-)
By the way, we had our Office developer conference last spring and at that time there was a study done by our research group to look into how many Office developers we had out there. The numbers were around 1 million people involved in developing some type of solution on top of Office. Of those folks about 1/3 of them were using the XML functionality. That's about 330,000 developers building on top of Office XML support. That hardly sounds like a closed format that's about to be overrun...
Darryl, I understand your concern there. The custom defined schema support is extremely valuable for solution builders and is really an important part of the overall vision for opening up our documents using XML. The "reference schemas" (WordprocessingML for Word; SpreadsheetML for Excel; OpenDoc for OpenOffice) are really valuable if you want to understand the application level data. Custom defined schema is where you actually are able to truly open up the formats since you can mark them up with your own data, rather than being limited just to our predefined structures. I really do wish it was as easy as you suggest to back port the custom schema support to older versions of Office, but it would actually be a huge project in itself. It would essentially be the same as rewriting much of the application if we really wanted to preserve the custom defined tag structure. All the logic necessary to maintain well-formedness while still keeping the same editing behaviors was a lot of work that we took on in Office 2003.
I think the best way to think about this is to look at the type of functionality we're talking about. The support for reading and writing the format is essentially an update to our File I/O code. It's a very isolated piece of functionality (essentially a translation), that was a lot of work, but still manageable. To support Custom defined schema would mean we'd have to actually modify the run time behaviors. We would need to change the way we read the text stream to understand the tags, and update the edit behaviors. This is really changing a huge portion of the Word .exe. I know it appears we should be able to just worry about this at save time, but that's actually not the case. We need to do work to ensure that the structure is preserved while the file is being edited. I would love to have the ability to backport it, but this is really one of those pieces of functionality that you'll need a newer version of the product for.
-Brian
Anonymous
October 12, 2005
Two from David Berlind:
http://blogs.zdnet.com/BTL/?p=2002#more-2002
http://blogs.zdnet.com/BTL/?p=1998#more-1998
Anonymous
October 12, 2005
Inquisitive - Steven Sinofsky (Senior VP for Office) sent a signed letter to the European Union directly addressing that very concern. We will continue to provide the licenses going forward and continue to represent everything in XML (other than obvious binary type structures like pictures, Active X controls, etc.). http://www.microsoft.com/office/xml/response.mspx
So the answers to your first two questions are yes and yes. I would assume that the last question is the same for any format that's out there. I don't see it being a problem (other than the fact that I'd be out of a job).
Anon - Each new feature will have different behaviors when roundtripping through the older versions. This is directly tied to what we are able to roundtrip, or map back to a similar feature. Colors for instance will be mapped back to the closest match. We also allow the user to put the application into a "throttled mode", so that any new functionality that won't work properly with older versions is disabled, which allows you to ensure that you don't introduce something that won't work with an older version. If you don't care about how it works with an older version, then you can of course upgrade the document to get out of throttled mode.
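To make "mapped back to the closest match" concrete, here is a minimal Python sketch. It assumes the older version can only show a fixed palette and that "closest" means smallest squared distance in RGB; both the palette values and the distance measure are assumptions for illustration, not a description of what Office actually does.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]

# Hypothetical stand-in for the fixed set of colors an older version can show.
LEGACY_PALETTE = [
    (0, 0, 0), (255, 255, 255), (255, 0, 0),
    (0, 255, 0), (0, 0, 255), (255, 255, 0),
]


def closest_color(color: RGB, palette: Iterable[RGB] = LEGACY_PALETTE) -> RGB:
    """Pick the palette entry with the smallest squared RGB distance to `color`."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))


print(closest_color((250, 20, 10)))   # a slightly-off red
print(closest_color((240, 240, 30)))  # a slightly-off yellow
```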
We've done a lot of work on this over the past year. We've been meeting with a collection of about 20 customers for the past 6 months to work through the issues that will come out of this and to make sure that the collaboration experience is as smooth as possible. We have an even larger group of folks we are going to work with directly after we ship Beta 1 to see what issues we missed and what there is that we need to address before we ship. Of course, in every new version of a product you introduce new functionality that may not work with a previous version. We also knew though that in changing the default format, a lot of folks would have a more critical view of this type of behavior. There is only so much you can do, but we decided early on that if we were going to change the default formats, we had to go above and beyond what we've done in the past to make sure the transition was smooth.
Jim - The limitation on the number of rows and columns was directly tied to the architecture of the old binary formats. So if you have a file in Office "12" that exceeds that previous cap on rows and columns, and you save into the old binary format you will be warned that the extra data will be removed. The same is the case if you save in the new formats and send it to someone on Office 2000. If they have the update they can open the file, but they will be warned that not all the data was preserved.
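As a rough illustration of the check behind that warning, the Python sketch below flags cell references that fall outside the old binary grid. It assumes the commonly documented limits of 65,536 rows and 256 columns (A through IV, the column Jim mentions); the function names are made up for this example.

```python
import re

# Commonly documented grid limits of the old binary Excel format (assumed here).
LEGACY_MAX_ROWS = 65_536
LEGACY_MAX_COLS = 256  # columns A through IV


def column_index(letters: str) -> int:
    """Convert column letters (A, Z, AA, IV, ...) to a 1-based column number."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def fits_legacy_grid(cell_ref: str) -> bool:
    """True if a reference such as 'IV65536' lies inside the old grid."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", cell_ref)
    if match is None:
        raise ValueError(f"not a simple cell reference: {cell_ref!r}")
    col = column_index(match.group(1))
    row = int(match.group(2))
    return 1 <= row <= LEGACY_MAX_ROWS and col <= LEGACY_MAX_COLS


def cells_lost_on_downgrade(cell_refs):
    """List the references whose data would not survive a save to the old grid."""
    return [ref for ref in cell_refs if not fits_legacy_grid(ref)]


print(cells_lost_on_downgrade(["A1", "IV65536", "IW1", "B70000"]))
```

A non-empty result is the situation where the user would see the "extra data will be removed" warning.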
-Brian
Anonymous
October 12, 2005
Microsoft may decide to add ODF support if its customers want it:
http://www.consortiuminfo.org/newsblog/blog.php?ID=1642
Brian, this fits with what you said a while back.
Anonymous
October 12, 2005
Redmonk's Stephen O'Grady:
http://searchopensource.techtarget.com/originalContent/0,289142,sid39_gci1132351,00.html
Regarding my Mack truck comment: I read David Berlind's blog post "Could ODF be the Net's new, frictionless document DNA?" and now I think MS XML will be run over by a fleet of Mack trucks.
Anonymous
October 12, 2005
I can understand concerns about documents with custom schema being mangled by old versions. I wonder if it'd be possible to put a flag in the document that could tell old versions' load/save code "don't load this document, its author said it requires Office 12" or "this document should be treated as read-only, and can not be saved, in versions of Office older than 12".
That way, backward compat would be preserved, but document authors could still ensure that documents with custom schema they must preserve would not get damaged.
Anonymous
October 13, 2005
When will the updates for Office 2000, XP, and 2003 be available?
Anonymous
October 13, 2005
Craig - We'd thought about doing something similar to that, but after talking with some folks decided it was too confusing of an experience. We can look into it again though after Beta 1 if folks think it's something we should solve.
Chuck - The updates will be made available at the same time as Office "12".
Eduardo - Thanks for the links (even though they were pretty biased). None of the folks in those articles seem to be aware that most customers that are interested in accessible and open data care more about their own schemas than they do about a schema originally defined by StarOffice or Microsoft. I think the articles are still pretty closed-minded when looking at the differences between our XML formats and the one from StarOffice (that madpenguin article had a number of inaccuracies for example), but even then it's only part of the story. Like I've said before, it's the combination of the reference schemas (WordprocessingML) and the custom defined schema (whatever the customer's data is) that really gives the true interoperability with business processes (aka document DNA). You don't need to worry about the Mack trucks. :-)
-Brian
Anonymous
October 13, 2005
Brian: I think the custom-schema stuff, while initially attractive, fails on the business level for the reasons Craig and Darryl pointed out. It basically closes the format down: if you use Office 12 custom schemas, you can't safely let anything other than Office 12 touch the documents lest they be rendered unusable by Office 12. This is not, from a business standpoint, a good thing. If I can afford to upgrade my entire business to Office 12 and can either keep the documents internal or require everyone I do business with to use Office 12 I can use custom schemas safely, otherwise I have to avoid them completely. I think that's why traditionally custom schemas have been reserved to custom applications, not general-purpose ones like word processors.
What I'd do in the plug-ins for older versions is deliberately avoid preserving features I couldn't support. Office 2000 could read and write the format, but if it couldn't support for example custom schemas correctly it wouldn't write custom schema information back into the saved document. That causes loss of information, but at least it avoids silently creating a broken document.
Anonymous
October 13, 2005
Todd, I find the custom schema support really interesting. I think it'd be nice to be able to protect documents against being saved over by older Office versions, but there are non-technical and alternate technical approaches available too. Nothing stops me just limiting who can write to those files using standard filesystem permissions and groups, or if possible perhaps using a macro to do version detection and refuse to save. I don't think the lack of any sort of lockdown in the support being added to the older versions is a fatal flaw or failure.
On a side note - I do hope the next Publisher gets XML support. I really like the idea of being able to dynamically generate job templates with predefined PDF export settings. My employer doesn't use or intend to use Publisher, but some of its customers do. Built-in PDF support will be wonderful when accepting jobs from those clients, but we still need to actually get users to use the right settings - always a challenge.
Anonymous
October 13, 2005
Replies to foxnews article:
http://www.foxnews.com/story/0,2933,172063,00.html
Brian, if you don't like the links I post here, why don't you post some, from people who don't work for Microsoft or an organization funded by it, that support your position?
Anonymous
October 13, 2005
Berlind:
http://blogs.zdnet.com/BTL/?p=2009#more-2009
O'Grady:
http://www.redmonk.com/sogrady/archives/001036.html
check out especially the long Gary Edwards comment.
Anonymous
October 13, 2005
Hey Brian, any chance of an answer to my earlier question?
If I didn't phrase something clearly enough then please let me know, though I tried to come up with usage scenarios that are amenable to simple yes/no type answers.
I think it's important that my (future) grandchildren will be able to write software to read the documents I produce today... but as things currently stand, I understand that you won't guarantee they will be able to. This seems to me to be a huge shortcoming in your licensing and a very valid concern!
Anonymous
October 14, 2005
Inquisitive: I don't think that really makes sense (re future readability by grandchildren etc). The schema license is entirely fuss free, it's only the patent license that's mildly troublesome. Any patents that currently exist and apply will expire in a few decades on the outside. Additionally, nothing can stop you from simply ignoring the patents and writing a suitable XSLT filter to read the documents. Not unless someone invents the automatic patent rights enforcement brain-stem implant, anyway :-P .
That said, I don't imagine writing a custom filter to convert from some ancient format - especially if you want full layout, not just content - would be at all fun. On the other hand, is there much chance that apps will support any current format by then anyway? My bet is on "no".
I think issues with future readability stem more from concerns about changes to the format and software over time, such that we end up in a situation where we're looking at these new formats like we'd now look at a Word 2.0 for Mac file - "huh?".
How valid these are remain to be seen - they were certainly very reasonable with regards to the Office binary formats, though I understand that's despite the best efforts of the Office folks. XML is hardly perfect, but well documented XML is more amenable to manipulation and conversion, and since the specs can be saved there's little reason to worry about the documentation. I'm personally not all that worried, though I'd prefer an open standard core for a document format.
Do remember that some standards fall into disuse and are superceded, so an open standard document format is still no perfect solution. A well designed one that was suitably extensible would sure be a nice start, though.
Really, I think the patent licensing is more of a question for the short/medium term, in terms of how it affects what others who wish to use the formats can do. I've explained my reasoning on this as well as I think I can earlier, and won't repeat myself.
Anonymous
October 14, 2005
He is posting so much on this topic, I wonder if he is getting ready to write a book:
http://blogs.zdnet.com/BTL/?p=2017
http://news.zdnet.com/2100-3513_22-5893208.html
Anonymous
October 15, 2005
Eduardo may be right. The book would be interesting.
I found that latest, lengthy account from David Berlind to be pretty thoughtful and informative about the possible disconnects that occurred.
http://news.zdnet.com/2100-3513_22-5893208.html
I think Berlind's approach is pretty balanced, to the degree that I can compare it against experiences of my own. With regard to the extrapolations for the future, who knows?
I also found Mike Miller's comments to be usefully constructive, including the one at
http://news.zdnet.com/5208-3513-0.html?forumID=1&threadID=14222&messageID=285299&start=-1
This won't get us the whole picture but it does provide a sense of the foibles and agendas that cloud complex undertakings like this. I admire that Berlind is willing to live with gray and look for fog-clearing, confirmable detail, rather than going polar black vs. white, good guys vs. bad guys, etc.
There are others who don't ground their speculations so well. I am impressed by Berlind's accounts generally.
Anonymous
October 15, 2005
The comment has been removed
Anonymous
October 15, 2005
There is another source which mentions the binary key.
http://madpenguin.org/cms/?m=show&id=5304&page=2
quote:"
That binary key holds a great deal of the information that we need about the layout definitions of the Microsoft XML file format. We can do a content-based transformation very well. Microsoft's content is in perfect XML file format. Their styles, though, are locked up in that binary key. To make any kind of exchange possible with Microsoft XML documents, we have to first figure out how to cope with that binary key."
Anonymous
October 15, 2005
The comment has been removed
Anonymous
October 15, 2005
Brian, I have some questions about MS XML as compared with ODF.
ODF, from what I understand, is designed to be a universal file format. That is, the goal is that all the information in any file format should be able to be stored in ODF without loss. This would allow it to be used as the native format in many applications, and, most importantly, as a universal translation method between any two different formats.
To that end, the ODF TC spent years studying many hundreds of different file formats running on applications on various OS's. ODF cannot yet handle all file formats, but it has moved a very long way toward that goal.
Now the questions. Was it Microsoft's intention that MS XML also serve as a universal file format? Or is it designed to handle the data from a narrower range of formats, and if so, what is covered and what isn't? And during development, how broad a range of formats was investigated? In particular, what about formats that don't run on Windows applications but rather only on other OS's?
Anonymous
October 15, 2005
I don't really know where the talk of a "binary key" is coming from. I won't speculate whether that comes from lack of information or if it's just malicious, but you can look at the formats yourself and you'll see that the reports are 100% incorrect.
I challenge anyone to find a Word XML file that has a binary key containing "crucial layout information." If you do find such a file, remove the binary information and reopen the file. What's changed in the layout? Of course things like pictures are stored as binary objects, as well as embedded OLE objects since we don't control what formats the OLE object wants to store itself.
It looks like there is just a misunderstanding out there that is unfortunately causing folks to jump to some pretty extreme conclusions. All you have to do is look at the facts. Our files are stored in XML, and all the XML is fully documented. You can freely download the documentation, so there is no need to "reverse engineer" anything.
Eduardo, I'm surprised that you would conclude the ODF format is designed to handle all forms of documents. It was an effort to standardize the existing StarOffice format. I'm sure they will continue to evolve it going forward, but I don't see it being anywhere close to a universal solution. In that article you referenced Gary Edwards said that they were looking to replace HTML, which has been evolving for over a decade, so I'm not really sure what their true target is. We already saw that a feature as basic as formulas in a spreadsheet isn't represented. That's a pretty common one, and if that's not there, who knows what else is missing. What about custom defined schema support? How can you support all document types if there isn't support for everyone's schema? We allow anyone to add their own schemas to our files, which is true extensibility.
-Brian
Anonymous
October 15, 2005
Gary Edwards on the binary key, and much more:
http://www.zdnet.com/5208-10532-0.html?forumID=1&threadID=14220&messageID=285641
Brian, regarding ODF being a universal file format: You say that it was just a standardizing of the StarOffice format. That is completely false. The ODF TC started with the OO format, and then spent three years looking at hundreds of other formats, including everything they could figure out in Microsoft formats. The goal is a universal file format, at least for all types of documents.
See for instance this from Berlind:
"ODF isn't just for front office productivity applications (word processors, spreadsheets, etc) as has been often implied by the way it is so often tied to OpenOffice.org (sidebar: it will be supported by other MS-Office substitutes as well; for example solutions from IBM and Corel). There's no reason, for example, that, regardless of what proprietary markup languages the different wiki solution providers use to put a pretty face on Web authoring, that they cannot natively store those documents in the XML-based ODF. Come to think of it, what documents can't be stored in ODF? What about browser-generated documents that are authored in GMail, Yahoo Mail or even blogs? Once a few key providers of these different document authoring tools decide to natively store their documents in ODF, then the ODF format could enter a viral stage that turns ODF into the underlying DNA to anything capable of generating text. Were this to happen, Microsoft would have no choice but to support ODF (something it's apparently considering) since at that point, it would not only be odd man out, the number of ODF-compliant documents being generating by all the ODF-compliant authoring tools in total would begin to catch up to Microsoft's file formats."
read the whole post:
http://blogs.zdnet.com/BTL/?p=1998#more-1998
Brian, I think you are running scared on this. You know how fantastically useful a universal document format would be, how it would catch on like wildfire, and how it would be an absolutely mortal threat to Microsoft, so you try to trick people into believing ODF isn't just that, or at least something pretty close.
OK, I know I will never get you to admit that publicly. However, everyone else can read Berlind's post and see what I mean.
Anonymous
October 15, 2005
Eduardo, you need to buy yourself a clue. Heck, a couple dozen of clues might not be sufficient <g> Here is one free of charge:
http://en.wikipedia.org/wiki/Ontology_%28computer_science%29
ODF cannot be (usefully and universally) employed for something for which it does not have a well-defined ontology. So far it contains fairly well-defined ontology for OOo/SO supported subset of file formats. If you try to use it for storing something other than that, guess what, nobody would be able to correctly interpret your data before you define ontology for it. And even if you do, it would be useless until included in the standard - the same way as spreadsheet formulas currently are.
As for Berlind's post, my suggestion to use TAR/GZIP container stays. To paraphrase: "Come to think of it, what documents can't be stored in .tgz?" None, really, yet I don't see "how it would catch on like wildfire, and how it would be an absolutely mortal threat to Microsoft".
Love, SDJ
Anonymous
October 16, 2005
The comment has been removed
Anonymous
October 16, 2005
A detailed report by OASIS on the Massachusetts decision:
http://xml.coverpages.org/ni2005-09-26-a.html
Read this and you will get a much better sense of Massachusetts' reasoning.
Anonymous
October 16, 2005
The comment has been removed
Anonymous
October 16, 2005
Last year the EU looked at ODF and MSXML and decided ODF met its requirements and MSXML did not:
http://europa.eu.int/idabc/en/document/3439
Anonymous
October 16, 2005
Or, put another way. Where is this universal transformation layer described? There is no language or functionality like that anywhere I've looked in the ODF spec. I haven't read it all, but if it is so fundamental to the architecture of ODF, where is it addressed?
Anonymous
October 16, 2005
ODF is the universal transformation layer. Once you have converted a file format to ODF, then it is in the same language as any other file format that has been so converted. See the Gary Edwards interview linked above. You also might want to check out the Wikipedia article.
Anonymous
October 16, 2005
Eduardo accepts the claim: "ODF is the universal transformation layer" based on "Once you have converted a file format to ODF, then it is in the same language as any other file format that has been so converted."
That's a tautology and meaningless. It presumes that someone else has already arranged the conversion. I can say the same thing for XML without ODF, I can say the same thing for Office Open XML, and have added no content to this discussion (and my statement is just as accurate).
I just did a search for all uses of the word "transformation" in the ODF specification. The majority of them are about how transforms are specified for drawings and images (and format objects).
A few others are about some things that have been done to make XSLT go easier.
A couple of those assertions seem to be mistakes, since you actually have to check attributes of elements for some information even though the idea stated elsewhere is that you can ignore the tags and just use the text. But then it seems that material marked as hidden will be revealed in a straightforward transformation. Heh.
These few steps by which certain transformations "should be simpler" are about possible transformations out of ODF for repurposing purposes. There is nothing that demonstrates how well ODF serves as a universal carrier or any work that was done to assure that it is any more so than any other highly-functional office document format using XML.
I don't doubt that the member from Boeing did a lot of work, I just don't know what it was and where the concrete results are. They are not reflected in the specification in any material way that has surrendered to my inspection so far.
Anonymous
October 16, 2005
more about the binary key:
http://www.groklaw.net/comment.php?mode=display&sid=20051016105739574&title=What%20Binary%20key%3F&type=article&order=&hideanonymous=0&pid=369059#c369217
who is saying "the truth"?
Anonymous
October 16, 2005
The whole discussion is good:
http://www.groklaw.net/article.php?story=20051016105739574
and start at the comment "Which Binary key?"
Anonymous
October 16, 2005
orcmid, it's time for us to leave the trolls to themselves. There is no chance they will start thinking critically or stop their copy-and-paste arguments, so I say let them gnaw on their own feet :)
Love, SDJ
Anonymous
October 16, 2005
Look, it is just another claim, extrapolated to problems about getting a particular kind of transform to work.
It's simple. Show us the key. Show us that it has anything to do with style preservation. But first, show us the key. Brian proposed a trivial experiment.
Find what you think is the magic key, delete it, then see what difference it makes to Microsoft Office. That's really simple.
I just did the following. My M.Sc. dissertation draft is a large Word document. I opened it in Word 2003 and saved it as WordML (the current Word 2003 XML format, not Office Open XML). I noticed that there are attributes so that the files saved in XML can be opened by the original application, but I got around that by opening the document in FrontPage (where I have that feature turned off). I then asked FrontPage to verify the formatting and it was happy. I then asked FrontPage to pretty-print the layout. The file got a lot bigger, but then I could scroll through it and know what I was looking at.
All the binary data (actually, hex and BinHex encodings of one kind or other) was in two kinds of elements: <w:binData>, which all hold images that I created outside of Word and pasted into the document, and <w:fldData>, which are tiny encoded elements I have no idea about.
What I do know is that a 3rd-party PDF plug-in is able to navigate the document and make a perfect PDF of it, preserving everything that is visible, including the table-of-contents, cross-references, and hyperlinks in the thesis. This isn't a complete test, but it suggests that maybe the breakdown the Groklaw people observed has a different explanation and the presumption of the worst trumped that. Perhaps.
I can't find anything that looks like a "key" or any other secret code sort of thing.
You can do this yourself. Open in a non-Microsoft XML editor and have it fix the layout (so you can match up start and end tags) and see if there is a problem. Then we can look it up in the WordML documentation and see what we're really dealing with here.
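If you would rather script the search than scroll through an editor, here is a minimal sketch of the idea in Python. It assumes that an encoded blob shows up as a leaf element whose text is one long run of Base64-style characters; the sample document is hand-written for illustration and is not real Word output, and the function name and length threshold are my own choices.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for a WordML file (assumption: real files are far
# larger, but encoded data appears the same way, as text inside an element).
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Some visible text.</w:t></w:r></w:p>
    <w:p><w:r><w:pict>
      <w:binData w:name="wordml://01000001.png">iVBORw0KGgoAAAANSUhEUgAAAAEAAAAB</w:binData>
    </w:pict></w:r></w:p>
    <w:p><w:r><w:fldData>CNDJ6nn5us4RjIIAqgBLqQsCAAAA</w:fldData></w:r></w:p>
  </w:body>
</w:wordDocument>"""

# Characters that can appear in Base64 or hex text, plus line breaks.
ENCODED = re.compile(r"[A-Za-z0-9+/=\r\n]+")


def find_encoded_payloads(xml_text, min_len=16):
    """Return (local element name, payload length) for every leaf element
    whose text looks like an encoded blob rather than prose."""
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        is_leaf = len(elem) == 0
        if is_leaf and len(text) >= min_len and ENCODED.fullmatch(text):
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" part
            hits.append((local_name, len(text)))
    return hits


for name, size in find_encoded_payloads(SAMPLE):
    print(name, size)
```

Anything this turns up can then be looked up by element name in the WordML documentation, or deleted to see whether the document still opens.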
How can I tell that Groklaw is telling the truth when they don't say how to verify what they say? The formats are real, tangible things. The WordML documents are available for inspection. Where's the key? Show me a key (in context, please).
Anonymous
October 16, 2005
If there is no binary key, as Brian mentioned, then I am willing to believe that.
Interesting, though, is the following quote from Gary:
"If the MSXML binary key and software bindings do not exist, then Microsoft (and everyone else for that matter) should be able to provide the marketplace with clean clear transformation filters enabling easy conversions from MSXML to ODF and back? If they did this, then their software would meet the Massachusetts requirements. But they don't!"
Indeed, why is MSXML not suitable for MA?
They did mention something about missing documentation of some sort during the speech they gave.
Anonymous
October 17, 2005
I worked up an example where I went searching for the binary key that is being talked about here. I think I choked the blog comment filter on the markup, so I recreated the comment as a post on one of my blogs: http://orcmid.com/BlunderDome/clueless/2005/10/my-fud-is-fuddier-than-your-fud-so-fud.asp
I did everything I could to cross-check with the reports about the binary key, but I got lost when Exchange Server and invisible conversions to XML and back were dragged into it.
Anonymous
October 17, 2005
In your example (on your blog), if you replace 'schemas.microsoft.com' with e.g. 'schemas.microsofty.com', your document gets scrambled.
I don't know which 'binary key' Gary is referring to, but changing the 'microsoft' name into 'microsofty' and having such an impact on the document is not something I would expect.
Anonymous
October 17, 2005
If you have to use 'schemas.microsoft.com' in the XML file, where it serves no further purpose at all, that is odd.
The only reason I can think of for it is that the format is patented, in which case your documents have to refer to a patented scheme.
Anonymous
October 17, 2005
Patrick, I'm not sure if you've worked with XML much, but there is a feature in the XML standard called a "namespace." This is how you can uniquely identify what type of XML you are reading.
Namespaces are really powerful, and without them it would be difficult to know what you are looking at and what schemas should be used to validate the files. When you change a namespace, you are essentially saying that XML is now a different type of XML. In Office, we support opening everyone's XML files. If it's a namespace that we don't understand, we just treat it like any other custom defined schema. If you read my older blog posts that are all titled "Intro to Word XML", you can see more about how we work with custom defined schema.
You could just as easily have changed the "http://" part of the namespace and had a similar experience. There's no conspiracy there... just standard XML practices. :-)
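To make that concrete, here is a small sketch in Python, using only the standard library, of why the edited file stops looking like WordML to a namespace-aware consumer. The tiny document is made up for illustration, and the function name is hypothetical; it is not how Word itself is implemented.

```python
import xml.etree.ElementTree as ET

WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def count_wordml_paragraphs(xml_text):
    """Count the <p> elements that belong to the WordML namespace."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//{" + WORDML_NS + "}p"))


# A made-up, minimal document in the WordML namespace.
original = (
    '<w:wordDocument xmlns:w="' + WORDML_NS + '">'
    "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>"
    "</w:wordDocument>"
)

# Same markup, but with the namespace URI altered as in Patrick's experiment.
altered = original.replace("schemas.microsoft.com", "schemas.microsofty.com")

print(count_wordml_paragraphs(original))
print(count_wordml_paragraphs(altered))
```

The altered file is still well-formed XML with the same tags and prefixes, but none of its elements are WordML paragraphs any more, so a consumer looking for that namespace finds nothing it recognizes.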
-Brian
Anonymous
October 17, 2005
I wanted to add to Brian's comment about namespaces. All of those xmlns:mumble="someURI" attributes in XML elements are very important.
However, the someURI doesn't have to refer to a real web page or even be in the form of a URL. What it has to be is a unique identifier that someone owns, usually by owning the domain used in a URL. I could make up "http://schema.microsoft.com/..." URLs and it wouldn't be kosher. I'd get boo-ed by the XML community, at least. The other part is that the rules of the format are determined by schemas that are tied to these namespaces. If you change the namespace to one that Word doesn't recognize, Word will do something else, as Patrick saw and Brian explained.
The schemas get published, and the namespace they go with (if they go with a namespace) is declared in the schema. Schema-aware software caches these schema definitions and then applies them wherever it sees the namespace being used.
Applications, like Word, may have namespace sensitivity built-in, but the schemas are published anyhow in support of interchange and interoperability.
The same is true of the OpenDocument format, and the Relax NG schema (and ODF documents) use namespaces heavily.
BIG TIP: The prefix (mumble in my example, above) doesn't determine the namespace. The someURI does. The prefix is an abbreviation that is used for the namespace and the prefix can be changed to whatever's useful in a given situation. It's a kind of alias. Microsoft keeps theirs real short because they want to keep the XML file compact. Other people have theirs be more descriptive because they are intended for people to use or at least to understand.
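A quick way to convince yourself of this is the sketch below, in Python with the standard library. The namespace URIs, element names, and prefixes are all made up for illustration. The parser expands each element name to the form {namespace-URI}local-name, so the prefix disappears from the comparison.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns/memo"  # made-up namespace URI, for illustration only

# Same namespace URI, short prefix.
short_prefix = f'<m:memo xmlns:m="{NS}"><m:to>Patrick</m:to></m:memo>'

# Same namespace URI, long descriptive prefix.
long_prefix = (
    f'<memo-format:memo xmlns:memo-format="{NS}">'
    "<memo-format:to>Patrick</memo-format:to>"
    "</memo-format:memo>"
)

# Same short prefix as the first document, but a different namespace URI.
other_uri = '<m:memo xmlns:m="http://example.org/ns/other"><m:to>Patrick</m:to></m:memo>'


def expanded_names(xml_text):
    """List every element name in {namespace-URI}local-name form."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


print(expanded_names(short_prefix) == expanded_names(long_prefix))
print(expanded_names(short_prefix) == expanded_names(other_uri))
```

The first comparison holds because both documents bind their prefix to the same URI; the second fails even though the prefix is identical, because the URI differs.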
I think Brian posted about this earlier in comments elsewhere on this blog.
Anonymous
October 17, 2005
I took another look at the business about universal transformation. My post about it is at http://orcmid.com/BlunderDome/clueless/2005/10/magical-thinking-and-universal.asp
I don't think this has anything to do with what Massachusetts was after, based on the accounts I read. Their conditions about open formats and open standards don't require universality in any way I noticed. Otherwise, why add PDF to the list? Why leave the door open to other formats?
Anonymous
October 17, 2005
"Myth of the binary key": http://blogs.msdn.com/brian_jones/archive/2005/10/17/481983.aspx
-Brian
Anonymous
October 19, 2005
The Mac Office team has said that the next version of Mac Office will support the new XML file formats. They also plan to provide updates so that older versions support the new formats, but I'm not 100% sure which versions they will do that for.
Here is a bit more information on that: http://blogs.msdn.com/rick_schaut/archive/2005/06/01/424086.aspx
-Brian
Anonymous
April 04, 2006
I've had a few folks ask me about the XML format from Word 2003, and whether or not it would be supported...