Friday links (July 6, 2007)
A couple of interesting topics I wanted to link to today:
- Rick Jelliffe makes my week – Rick has a great post called "Slashdotters: all together now… 'Doh!'" that pretty well sums up my experiences over the past several years. When we first announced the XML formats for Office, there was a focused collection of negative feedback on sites like Slashdot, and there were demands that unless we did "foo" it wouldn't really be an open format. Well, we've actually done all the things folks asked and then some! As you would expect, though, they continue to move the target. We fully documented over 10,000 elements, attributes, simple types, enumerations, with over 6,000 pages of documentation and made that freely available; we removed any possible legal IP issues by putting the OpenXML formats under the OSP; we completely gave away the ownership of the formats to a standards body (ECMA, and now ISO) so that even if Microsoft wanted to, we couldn't unilaterally make any changes or block the availability of the documentation; and we've sponsored an open source project that provides translation between OpenXML and another international standard, ODF.
- Sun builds ODF translator for MS Office – Speaking of translator projects, Sun this week announced the availability of a plug-in for Microsoft Office that allows you to read and write ODF files. You can now see that there are people working on OpenXML support in OpenOffice and ODF support in MS Office. These different translation tools really allow the customer to decide which format they want to build their solutions around, and then use the translators if something in the other format comes along. Malte Timmermann has more information up on his blog: https://blogs.sun.com/malte/entry/sun_odf_plugin_1_0
- One side note here is that it looks like we have a bug in Word 2007 where Sun's ODF converters are able to save but not open the ODF files. I looked into this a bit and it looks like Word will mistakenly assume that the ODF file is a .docx file (since they are both ZIP at the base level). Word sniffs the file to see if it knows what kind of file it is, and only if it doesn't think it can open it will it hand it off to one of the registered converters. Since we think it's a .docx, we actually try to open it, and then of course fail since it's an .odt and not a .docx. I commented on this up on Malte's blog, and I'll provide more information once we've come up with a fix. I'm not sure how long it will take to pull that together, but I'll keep everyone posted.
- PHPExcel – There's a new update to the PHPExcel library that you can use for creating Open XML spreadsheets. You can see the project up on CodePlex, or check out Maarten's blog.
- Channel 9 interview – Part 2 of the Channel 9 interview on Open XML formats is available here: https://channel9.msdn.com/ShowPost.aspx?PostID=320503
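The side note above about Word mistaking an .odt for a .docx comes down to both being ZIP packages. As a rough illustration only (this is not how Word's sniffing is implemented), the two package types can usually be told apart by what sits inside the ZIP: an OOXML package carries a `[Content_Types].xml` part, while an ODF package carries a `mimetype` entry. The function name `sniff_package` and its return labels are made up for this sketch.

```python
import io
import zipfile

ODF_MIME_PREFIX = "application/vnd.oasis.opendocument."


def sniff_package(data: bytes) -> str:
    """Classify raw file bytes as 'ooxml', 'odf', 'unknown-zip' or 'not-zip'.

    Hypothetical helper for illustration; it only inspects entry names
    and the ODF mimetype entry, nothing else.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "not-zip"
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        # OOXML (OPC) packages declare their parts in [Content_Types].xml.
        if "[Content_Types].xml" in names:
            return "ooxml"
        # ODF packages store their media type in a plain 'mimetype' entry.
        if "mimetype" in names:
            mime = zf.read("mimetype").decode("ascii", "replace").strip()
            if mime.startswith(ODF_MIME_PREFIX):
                return "odf"
    return "unknown-zip"
```

A check along these lines, run before deciding that a ZIP is a .docx, would let the file fall through to a registered converter instead of failing on open.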
Comments
Anonymous
July 06, 2007
Brian Jones said "We fully documented over 10,000 elements, attributes, simple types, enumerations, with over 6,000 pages of documentation and made that freely available" There are many things factually incorrect in this sentence alone. But let's stick to just one. I knew you were not a developer, but I assumed you were a program manager, that you were actually writing specs. I have to ask the obvious question, "fully documented" = "fully specified" ? The average feature spec is 20 pages. Between describing the feature, explaining decisions, explaining how it is supposed to be implemented as a functionality such as screen rendering, explaining how it is supposed to work with related features, explaining edge cases, explaining how it works over time and how it can be migrated forward and backward, explaining how it is compatible with international issues, 508 issues, security issues. With all that, to get a spec of ONE feature in 20 pages means you are really packing stuff! And yet, when you read Part 4 of the public specs, on average, a feature is described using a sentence no longer than 5 words. Huh? By my estimation, the fully specc'ed Word/Excel/Powerpoint are a monster 600,000 pages. I certainly find it ironic that people out there complain of the "largeness of the public 6,000 pages", when really opening up would take 100 times more. Everything you say is wrong. Take "XML" for instance. Here is a bit of Microsoft's idea of XML (this is from a real Word 2007 document) : <w:instrText xml:space="preserve">TOC o "1-3" h z u</w:instrText> As some people can see, there are angle brackets, so it must be XML!! As for your channel 9 video, I'm baffled that you are filming your 5th or 10th Level 100 video where you keep saying exactly the same thing. Can you please move on and say something actually useful? Your boss is Jason Matusow?
Anonymous
July 08, 2007
Stephane, Are you joking? The example you list actually is documented. Sure it's XML accompanied by a rather complex string, but we have a few hundred pages that document how field codes work, and the syntax for them. It's a bit misleading to reference that though as an example of what Open XML looks like. The reason I'm still talking at the 100 level is that there are still a lot of people out there who are just learning about this for the first time. That level of education is super important, and while I'd love to go deeper, I need to be careful. Not sure what you mean by the last question unless you're just trying to be rude. Jason and I are on different teams. I'm a senior program manager lead in the Office team. From the development side I've owned the file formats for years now, in addition to other areas like custom XML, programmability, WordMail, smarttags, smartdocuments, etc. Jason works on the standards team. I've worked together with Jason and other folks like Jean Paoli and Tom Robertson on the standardization and policy side of things for a while now. That portion of my job is a bit outside of the traditional program manager role, but it's been a lot of fun. Back to your main point though... I've already said this before, but I'll say it again... you're really getting to be annoying as you keep repeating the same point. You are expecting the file format documentation to also fully specify the application, and that's just not the case. Look at other file format specifications out there and you'll see the same level of documentation. We are not speccing an application, we are speccing a file format. If you want the specs on how to build Office, come work for Microsoft and I'll show them to you. -Brian
Anonymous
July 08, 2007
@Stephane, don't blame MS for this sort of XML specification they are trying to standardize: they are new[1] to this "open" thing ( the "old-good-times"[2] are gone ) Maybe in a couple of years they will learn how to give to the world a really open, useful, implementable-by-all-not-only-MS, and interoperable standard ( and I'm not talking about this[3] kind of interoperability ) [1] http://www.robweir.com/blog/2007/06/file-format-timeline.html [2] http://antitrust.slated.org/www.iowaconsumercase.org/011607/2000/PX02991.pdf [3] MS/ECMA's OOXML Part 4, Section 5.1.2.1.14 ext (Extension) element definition: "This element [of type "CT_OfficeArtExtension" ] specifies an extension that is used for future extensions to the current version of DrawingML. This allows for the specifying of currently unknown elements in the future that will be used for later versions of generating applications. .. Attributes Description uri (Uniform Resource Identifier): Specifies the URI, or uniform resource identifier that represents the data stored under this tag. The URI is used to identify the correct 'server' that can process the contents of this tag. (http://www.ecma-international.org/news/TC45_current_work/Office%20Open%20XML%20Part%204%20-%20Markup%20Language%20Reference.pdf) Wow ! what a specification ! "the correct 'server' " ... come on !!!! MS, please honour the word "standard", stop gaming the system !!!!!
Anonymous
July 08, 2007
The comment has been removed
Anonymous
July 09, 2007
Stephane, For the umpteenth time, we are dealing with a file format spec, not an application spec. You just don't get it (or you do get it, and are just trying to spread FUD). Also, to be an international standard, a file format spec does not have to also document the application it was originally tied to. And by the way, since you have seen fit to throw some of your negative comments my way in the past, let me state I have no relationship whatsoever with Microsoft, and never have. For what it's worth, I have 45 years of software development experience, and 20 years of file format experience, starting with GML and SGML.
Anonymous
July 09, 2007
The comment has been removed
Anonymous
July 09, 2007
"a file format spec does not have to also document the application it was originally tied to." It's tied to it unfortunately. The spec is just a reflection of the implementation, not a general purpose model for Office documents. In fact, the spec came after the implementation. That's what makes it so ironic. Anyone who reads it will notice plenty of typos, proving that this was NOT used to implement Office 2007 (it's the opposite). And it has tons of vendor-specific stuff in it. Something forbidden in international standards.
Anonymous
July 09, 2007
"Bill's the General Manager of Platform Strategy at Microsoft. What got really interesting was when Yusseri raised the issue of OOXML and why didn't Microsoft just work on ODF in collaboration instead of creating a new, bloated standard. Bill's answer was quite surprising, as he clarified that the file format (OOXML) was a part of the software and that OOXML and the software (MS Office) are quite inseparable." http://www.openmalaysiablog.com/2007/03/day_1_microsoft.html
Anonymous
July 09, 2007
Brian, I have to say it sounded to me like Stephane had some reasonable points hiding behind his unreasonable behaviour, so I'll try to put it in a more constructive way. In the following example: <w:instrText xml:space="preserve">TOC o "1-3" h z u</w:instrText> It sounds like you agree that "o", "h" etc. have a specific, defined meaning that an OOXML application should understand. If so, surely that's an example of non-XML metadata?- Andrew Sayers
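To make the "non-XML metadata" point concrete, here is a minimal sketch of what a consumer has to do with the fragment quoted above: the XML parser hands back one opaque string, and a second, hand-written tokenizer is needed to get at the individual switches. The field string is taken exactly as it appears in the comment; the namespace URI is added here only so the fragment parses on its own, and `field_tokens` is a hypothetical helper, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Namespace declaration added for this sketch so the fragment is standalone.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FRAGMENT = (
    '<w:instrText xmlns:w="%s" xml:space="preserve">'
    'TOC o "1-3" h z u</w:instrText>' % W_NS
)

# A token is either a double-quoted string or a run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def field_tokens(instruction: str) -> list:
    """Split a field instruction string into its whitespace/quote-delimited tokens."""
    return [quoted or bare for quoted, bare in TOKEN.findall(instruction)]


elem = ET.fromstring(FRAGMENT)
tokens = field_tokens(elem.text or "")
print(tokens)  # ['TOC', 'o', '1-3', 'h', 'z', 'u']
```

The XML layer only gets you as far as `elem.text`; everything after that is a separate little language with its own syntax, which is the distinction being drawn in this thread.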
Anonymous
July 09, 2007
The comment has been removed
Anonymous
July 09, 2007
The comment has been removed
Anonymous
July 09, 2007
That's fair enough, although to be honest I can see how developers could feel a bit cheated if they thought they were getting Office Open XML, but wound up with Office Open XML + Office Open Legacy Cruft. How about doing a post talking about all the little sublanguages that applications have to parse? That would let the impatient start work with realistic expectations of what they're getting into, rather than becoming alienated after investing a lot of time in a project.- Andrew
Anonymous
July 09, 2007
On rereading my post, it's probably not clear what I meant. I'm referring specifically to the syntaxes of different sublanguages, rather than meanings of individual metadata. So for example, the equivalent post for HTML would be: "To parse a web page, you need to know HTML [link the HTML spec], CSS [link to CSS spec], Javascript [link to JS spec], ..." The idea would be to present enough detail that developers don't walk in with naive expectations about how trivial it'll all be, and enough information that they can make good decisions about which libraries to use.- Andrew
Anonymous
July 09, 2007
The comment has been removed
Anonymous
July 09, 2007
Stephane: I'm intrigued by the "at least 6 different ways to do text formatting" you have mentioned on a couple of occasions. What exactly are these? I can think of direct formatting, character styles, and paragraph styles--but that's only three, and all have very important uses. (I'm excluding weird methods, such as using comments for formatting or embedding postscript commands.)
Anonymous
July 09, 2007
The comment has been removed
Anonymous
July 09, 2007
Francis said "I'm intrigued by the "at least 6 different ways to do text formatting" Here are 4 for Excel files alone. This is not meant to be exhaustive, just what came off the top of my head.
- regular cell formatting <xf numFmtId="0" fontId="2" fillId="0" borderId="0" xfId="0" applyFont="1"/><font><sz val="11"/><color theme="6" tint="-0.249977111117893"/><name val="Calibri"/><family val="2"/><scheme val="minor"/></font>
- shared-string cell formatting (note that the shared-string is a technical artefact surfacing as everyone's problem now) <r><rPr><sz val="11"/><color rgb="FFFF0000"/><rFont val="Calibri"/><family val="2"/><scheme val="minor"/></rPr><t>ruir</t></r>
- cell formatting in a conditional alert (note: the conditional alert itself is declared elsewhere) <dxf><font><b/><i val="0"/></font><numFmt numFmtId="2" formatCode="0.00"/><fill><patternFill patternType="solid"><fgColor auto="1"/><bgColor rgb="FFFFFFFF"/></patternFill></fill></dxf>
- text formatting in charts <c:rich><a:bodyPr/><a:lstStyle/><a:p><a:pPr><a:defRPr/></a:pPr><a:r><a:rPr lang="en-US"/><a:t>t t</a:t></a:r><a:r><a:rPr lang="en-US" sz="1850" u="dash" baseline="0"><a:solidFill><a:schemeClr val="accent4"><a:lumMod val="60000"/><a:lumOff val="40000"/></a:schemeClr></a:solidFill><a:uFill><a:solidFill><a:schemeClr val="accent1"><a:lumMod val="60000"/><a:lumOff val="40000"/></a:schemeClr></a:solidFill></a:uFill></a:rPr><a:t>ruiry</a:t></a:r><a:r><a:rPr lang="en-US"/><a:t>t gfgfgfg</a:t></a:r></a:p></c:rich> This quickly adds up, and don't forget there are also styles from Word, Powerpoint and the shared libraries (VML, drawingML, ...) To say there are 6 is actually very conservative.
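As a small illustration of the overlap in the list above, the sketch below reads the font name, size, and color out of two of the quoted fragments: the regular cell `<font>` and the shared-string `<rPr>`. The fragments are copied from the comment without namespaces; `font_summary` is a hypothetical helper written for this example. The point it shows is that the same information is spelled differently in each place (`name` in one, `rFont` in the other), so a reader needs a code path per variant.

```python
import xml.etree.ElementTree as ET

# Fragments as quoted in the comment above (namespace-free).
CELL_FONT = (
    '<font><sz val="11"/><color theme="6" tint="-0.249977111117893"/>'
    '<name val="Calibri"/><family val="2"/><scheme val="minor"/></font>'
)
RUN_PROPS = (
    '<rPr><sz val="11"/><color rgb="FFFF0000"/><rFont val="Calibri"/>'
    '<family val="2"/><scheme val="minor"/></rPr>'
)


def font_summary(fragment: str) -> dict:
    """Return the font name, size and color attributes from either variant."""
    root = ET.fromstring(fragment)
    # Cell fonts use <name>, rich-text runs use <rFont> for the same thing.
    name_elem = root.find("name")
    if name_elem is None:
        name_elem = root.find("rFont")
    size_elem = root.find("sz")
    color_elem = root.find("color")
    return {
        "name": name_elem.get("val") if name_elem is not None else None,
        "size": float(size_elem.get("val")) if size_elem is not None else None,
        "color": dict(color_elem.attrib) if color_elem is not None else {},
    }


print(font_summary(CELL_FONT))
print(font_summary(RUN_PROPS))
```

The `<dxf>` and DrawingML variants from the list would each need further branches, since they nest and name these properties differently again.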
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
I agree that the answers on the "fully documented" question don't seem to be very satisfying yet, but I see that as a different issue to the "fully XML" one. The comments I'm referring to about accepting you're right on XMLness are: "So, yes, field codes are an example of metadata which are represented in a non-XML way in most major word processors." (Monday, July 09, 2007 6:50 PM by Ian Easson) And: "Ian's response is pretty spot on." (Monday, July 09, 2007 8:49 PM by BrianJones) Moving on to fully-documented though, this is a more complex argument and I have to say I'm not sure that I've understood what you've been saying. Is it your position that OOXML should define every feature in enough detail for all OOXML-compliant applications to render that feature identically on-screen? If so, do you have an example for this like your excellent fully-XML example?- Andrew
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
"It occurred to me that what this blog may need is to be split into two" That was the essence of asking Brian to bring the developers into the fray.
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
Reading that example, it does strike me as at least poorly written. Firstly, "phClr" isn't a colour, it's more like a pointer to a colour. Secondly, referring to something not previously mentioned in the paragraph simply as "the style" requires people to guess which style is being referred to (And Murphy's law guarantees that people will gleefully pick the wrong style). It would be better written as: "phClr (Style Color): Used in a theme definition to indicate which colour the theme should use. This value indicates that the theme should inherit its color from the style in effect at the point where the theme occurs in the document." It sounds like it's not just poor writing style that you're complaining about though. Could you explain what information needs to be included and why you, as a developer, find it harder to do your job because it isn't included?- Andrew
Anonymous
July 10, 2007
Ian, Andrew: "phClr" is not defined in the theme. If it were <schemeClr val="accent1">, which is a valid value, then it would make more sense since all themes include an "accent1" color (it's one of the basic colors a theme is based on). But "phClr" has no definition. And with the schemeClr being wrapped within a style that has no color, there is no way you can possibly infer a color. It's very typical of the problems in the spec: the implicit values.
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
Stephane, Andrew (and Brian), This example Stephane just gave is a good reason why there is a need for a technical OOXML blog. Developers and implementers can ask questions like: what style did you mean, how can I find out such-and-such, etc. Microsoft and ECMA can use the questions in such a blog to find out what parts of the current version of the spec need clarification or expansion or changes in a future version. Regards to all.. Ian
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
Are the following statements equivalent: "phClr (Style Color) : A color used in theme definitions which means to use the color of the style." vs. phClr: "if a style color is present, use it."
Anonymous
July 10, 2007
Yes. It looks like the author was trying to provide the additional context that it applies to theme definitions...
Anonymous
July 10, 2007
Stephane, Have you considered setting up a blog? Posts like your last one are important and useful, but injecting them into a thread in the middle of a discussion just forces everyone to do needless context-switching. Brian, It sounds like you're saying that Microsoft has a tradition of document representation built on an entirely different philosophy to what you could call the DocBook tradition, and that this is something consistent through many (all?) Microsoft products. If so, what sort of documentation is available that explains the philosophy behind the Microsoft tradition, as distinct from any one application of it?- Andrew
Anonymous
July 10, 2007
Andrew, It's not about Microsoft, but instead the Word application on which the WordprocessingML format was based. DocBook was based on applications that were more for book authoring, etc. (chapters, sections, etc... basically a lot of structure and hierarchy). The ODF format was based on StarOffice Writer, and is somewhere in between the DocBook approach and the WordprocessingML approach (probably most similar to HTML). I'll pull together a post tomorrow that goes into this further. -Brian
Anonymous
July 10, 2007
That sounds really useful. I suggest you make a point about how this is a different, but equally valid, philosophy of what XML is about. It seems to me that this issue explains why so many people complain about OOXML being a collection of angle brackets that doesn't follow the deeper design of XML: their concept of the "deeper design of XML" is actually "the DocBook tradition of XML", and since it's never occurred to them that another tradition could exist, all they can see is really badly implemented DocBook.- Andrew
Anonymous
July 10, 2007
"It seems to me that this issue explains why so many people complain about OOXML being a collection of angle brackets that doesn't follow the deeper design of XML: their concept of the 'deeper design of XML' is actually "the DocBook tradition of XML" IMHO, Andrew, their ( and my ) concept of the 'deeper design of XML' is to keep low the barrier to understand things ( this is one of the goals of XML ) and apply "common sense" to represent document structure. After all, this is not rocket science, or is it? ( rocket science is the work that @stephane and others must do to decipher the binary and OOXML formats to get a decent implementation ) If you develop a document format for your own benefit ( or minor partners that achieve partial implementations ) you get one "kind" of format ( i.e: OOXML ). If you develop a format in everyone's benefit you get other kind of formats ( HTML, ISO-ODF, etc.). Examples: compare this [1] with this [2] experience of a qualified[3] expert in XML [1] <a href="http://www.xml.com/pub/a/2004/02/04/tr-xml.html">http://www.xml.com/pub/a/2004/02/04/tr-xml.html</a> [2] <a href="http://www.snee.com/bobdc.blog/2007/05/word_2003s_awful_xml_for_index.html">http://www.snee.com/bobdc.blog/2007/05/word_2003s_awful_xml_for_index.html</a> [3] <a href="http://www.snee.com/bob/xmlsgml.html">http://www.snee.com/bob/xmlsgml.html</a>
Anonymous
July 10, 2007
( re-posting URLs ) [1] http://www.xml.com/pub/a/2004/02/04/tr-xml.html [2] http://www.snee.com/bobdc.blog/2007/05/word_2003s_awful_xml_for_index.html [3] http://www.snee.com/bob/xmlsgml.html
Anonymous
July 10, 2007
Hey marc, thanks for the links. Not sure they are super productive in moving the conversation forward, but I appreciate them nonetheless... -Brian
Anonymous
July 10, 2007
The comment has been removed
Anonymous
July 10, 2007
Andrew said "It seems to me that this issue explains why so many people complain about OOXML being a collection of angle brackets that doesn't follow the deeper design of XML: their concept of the "deeper design of XML" is actually "the DocBook tradition of XML", and since it's never occurred to them that another tradition could exist, all they can see is really badly implemented DocBook." It's the other way around. When you take a look at Excel's file format until now, this was a serialization (BIFF) that went against human nature. To give you an idea of the magnitude of how bad it is, you have to write 10 KLOCs instead of 1 KLOC for just about everything. Moving to so-called XML was the opportunity for Microsoft to fix it, while still preserving the features. That would have certainly been hard work, but it was WORTH it, that was what would have made Microsoft the company that they think they are. Instead, they are only preserving their cash cow. I find "open" and "xml" not very meaningful when it comes to OOXML. That's what makes the whole move ridiculous in the first place.
Anonymous
July 11, 2007
Stephane, Marc, You talked about "common sense" and "human nature". If you're talking there about things like declaring metadata before the data it applies to, then that's what I'm referring to by "the DocBook tradition", and it's something that seems to be as unnatural to the Word team as the Word tradition is to you. You can argue that the Word tradition is inefficient, or that traditions have a network effect, or that Microsoft should have built up the level of knowledge in the community long ago, but you need to agree on definitions of terms first, or people in the other camp won't understand what you're saying.- Andrew
Anonymous
July 11, 2007
Then, why are they going ISO with their "tradition" ? This "tradition" allows Microsoft to push vendor-specific stuff such as dates which in turn destroy interoperability. ISO mandates include interoperability. Isn't there a contradiction? It's dishonest not to put "Microsoft" or something that makes it clear that it's vendor-specific stuff being pushed. It cannot be called "OfficeOpen XML", it cannot NOT include Microsoft in the title. To say otherwise is hypocrisy.
Anonymous
July 11, 2007
The comment has been removed
Anonymous
July 11, 2007
I assume the "dates" issue you're talking about is the 1900/1904 stuff. I've been trying to avoid arguments based on evidence rather than logic, but in this case I'll make an exception because I think it reveals a more fundamental issue. The nearest I've heard to an explanation of why OOXML uses such an odd date system comes from an aside in a Joel on Software article ( http://www.joelonsoftware.com/items/2006/06/16.html ) and a (strong, IMHO) criticism in the blog of a KDE developer ( http://www.kdedevelopers.org/node/2834 ). In short, they grandfathered in a bug in Lotus 123 back in the stone age, and feel that backwards compatibility is more important than forwards compatibility on this one. I think this reflects another philosophical difference between the ODF and OOXML communities: OOXML proponents feel that "interoperability" means "backwards compatibility first, and forwards compatibility if and only if it can be done without breaking backwards compatibility", whereas ODF proponents want backwards compatibility if and only if it can be done without making life harder in the future. On the question of requiring OOXML to be relabeled "MS OOXML", that's an interesting idea, but I disagree. Microsoft claim to be trying to change their business model from that of the standard-bearer for the trade secrets approach into a more community-friendly model. You may say that they're just the same old wolf trying on sheep's clothing, but personally I prefer to give them the benefit of the doubt where I can do so without putting myself in danger of getting eaten. As with anyone trying to make a change in life, it's best to praise loudly Microsoft's good behaviour and not tie them too strongly to past misdeeds. Sending OOXML to ISO is a really big step for them, and they deserve credit for trying something new. If you feel that it's not enough, by all means say so, but do it in a way that emphasises what they have to gain by doing it your way in future.- Andrew
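For readers who haven't run into the 1900/1904 issue, here is a minimal sketch of what an implementer ends up writing to turn a spreadsheet date serial into a calendar date. It assumes the commonly described behaviour: in the 1900 system serial 1 is 1 January 1900 and serial 60 is a 29 February 1900 that never existed (the grandfathered Lotus 1-2-3 leap-year bug), while in the 1904 system serial 0 is 1 January 1904. The function name and the choice to return `None` for the phantom day are decisions made for this example, not anything taken from the spec.

```python
from datetime import date, timedelta
from typing import Optional


def serial_to_date(serial: int, date1904: bool = False) -> Optional[date]:
    """Convert a whole-number spreadsheet date serial to a calendar date.

    Returns None for serial 60 in the 1900 system, which stands for the
    nonexistent 29 February 1900.
    """
    if date1904:
        # 1904 system: serial 0 is 1 January 1904, no phantom day.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial < 1:
        raise ValueError("1900-system serials start at 1")
    if serial == 60:
        return None  # the fictitious leap day
    # Serials after the phantom day are shifted by one relative to real dates.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


print(serial_to_date(59))        # 1900-02-28
print(serial_to_date(60))        # None
print(serial_to_date(61))        # 1900-03-01
print(serial_to_date(0, True))   # 1904-01-01
```

The special case for serial 60 and the two different epochs are exactly the kind of backwards-compatibility baggage being debated here: harmless once you know about it, surprising if you don't.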
Anonymous
July 11, 2007
The comment has been removed
Anonymous
July 11, 2007
The comment has been removed
Anonymous
July 11, 2007
The comment has been removed
Anonymous
July 12, 2007
The comment has been removed
Anonymous
July 12, 2007
Yes, Microsoft has to learn how to understand a wider range of views, but everyone else has to learn how to speak in a language they can understand too. For example, appeal to authority is a weak argument at best, and only works at all if you've both previously established respect for that authority. When Brian says "it's alright for OOXML to do such-and-such, because ODF does the same thing", he's saying that because he's talking to people with a previously-stated respect for ODF. If you can find Rick Jelliffe or Doug Mahugh claiming distaste for OOXML, I should imagine Brian would be much more receptive. I read the articles you linked to, and to be honest I'm not sure what point you were trying to make. I guessed before that you were trying to say that traditions have a network effect (XSLT and similar tools require DocBook tradition XML), but I could be wrong about that. Implementing an argument really is like implementing an algorithm: you take certain inputs ("premises"), perform a series of actions on them ("deductions") and produce an output ("conclusion"). If your compiler ("Brian Jones' brain") spits out an error message, you need to fix the bug, not order your compiler to fix it for you.- Andrew
Anonymous
July 12, 2007
Marc, Again I'm happy you posted the links in terms of spreading more knowledge. I just wasn't sure how to respond to them in terms of this discussion. I believe I actually linked to Bob's post about "word's awful XML" a few weeks ago and had some comments on it. I'm sorry if I implied that I don't appreciate people sharing information... that wasn't what I meant. Stephane, I'm sorry I didn't reply to your second to last comment. There were some good pieces in there I want to address. I'm going to think of a way to write a post that goes into those things. The short of it though is that I've never tried to imply that we created the ultimate XML file format. We created an XML file format that could meet a specific need (backwards compat). Rather than keeping our formats closed and proprietary we fully documented the format and then gave it away. I would have loved nothing better than to have started from scratch and designed the ultimate document formats but that's just not realistic given the fact that 99.9% of our customers that use the product just want things to work. Andrew, thank you for helping to facilitate the discussion. :-) I try not to get too defensive, but I'm sure that doesn't always come through. -Brian
Anonymous
July 12, 2007
Quote of the day: "I am sure you realize I sell and support two extremely advanced products related to those formats. How could I hate it?" Stephane Rodriguez, OOXML lover.
Anonymous
July 12, 2007
Fernando, I'm sure you are being ironic. Care to explain? If I hated this stuff, I wouldn't spend so much time with it, and I would not be in this business. So there is no irony here.
Anonymous
July 12, 2007
I think Fernando was just making a joke. You have to admit Stephane that you tend to have a certain tone to your posts. :-) -Brian