

A few updates on the OpenXML formats

Sorry I've been offline for the past couple of weeks. I've been meaning to post some content for a while, but I've been swamped with Office 14 planning. I was trying to keep up with the comments on my last post, but still ended up slacking there as well (I'm sorry I didn't reply to everyone's comments). I've also been receiving a bunch of e-mail from folks lately, and I wanted to apologize for not replying yet. Hopefully I'll get some free time soon.

I just realized that I never finished my "Intro to SpreadsheetML" posts, so I'll try to get something up early next week to close that off. I also haven't posted many examples of PresentationML so hopefully I'll be able to get to that as well. Let me know if anyone has some examples they'd like to see in order to help them get started with Open XML development.

National Body comments on DIS 29500 now online

If you're looking for some good weekend reading, you should check out the official comments submitted by the various National Bodies during the DIS 29500 contradiction period: https://jtc1sc32.org/doc/recent/JTC001-N-8530.zip (if you're looking for some fun conspiracies, check out the metadata of J1N850-12 and J1N850-13 and see who the authors were).

J1N850-22 is the official Ecma response to the comments, which was posted a few weeks ago. This is the first time the official comments from the various national bodies have been posted. You can see that a number of themes are shared across the documents, which is why the Ecma response grouped many of the comments rather than replying to each one individually.

James Governor takes a closer look at archival formats

OpenXML vs ODF: does the archiving argument stack up?

"The industry needs to move beyond good vs evil, Manichaean black vs white, beyond the single answer to a problem. Our monotheism does us no favours ... One true format? What do we need that for and what god are we worshipping? What are the problems we're trying to solve?"

I think as we see more and more applications pop up that support OpenXML (besides those built by Microsoft), you'll start to see the anti-OpenXML folks calm down a bit. The ideal in any archival format is that it allows for long term access with as little disturbance as possible. That was the whole point of OpenXML. Give the world a fully documented format and pass the ownership of that format over to a standards body for safe keeping and future development. OpenXML allows anyone to build tools that read and write the formats, and at the same time is designed to cause the least amount of disruption possible. You can move all your existing documents into OpenXML, and you won't lose a thing. :-)
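A .docx file is a ZIP package of XML parts, so "anyone can build tools that read and write the formats" can be checked with nothing more than a standard library. Here is a minimal Python sketch of that idea, not Microsoft tooling: the function names are hypothetical, the writer emits only the three parts of a bare-bones WordprocessingML package, and the reader assumes the main part sits at its conventional location `word/document.xml` instead of resolving it through `_rels/.rels` as a robust reader should.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by Ecma-376 documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship pointing at the main document part.
PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def write_minimal_docx(paragraphs):
    """Return the bytes of a bare-bones .docx with one run per paragraph."""
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()


def read_paragraphs(source):
    """Return the plain text of each w:p in word/document.xml.

    `source` may be a path or a file-like object; zipfile opens the
    package directly, so no rename-to-.zip step is needed.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Concatenate every w:t text node inside this paragraph.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


if __name__ == "__main__":
    data = write_minimal_docx(["Hello", "Open XML & friends"])
    print(read_paragraphs(io.BytesIO(data)))  # ['Hello', 'Open XML & friends']
```

The same `read_paragraphs("report.docx")` call works on a file saved by Word, although a real reader would also need to handle tabs, breaks, and nested paragraphs such as those inside text boxes.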

Open XML workshop in Sweden

https://blogs.infosupport.com/wouterv/archive/2007/04/19/Office-Open-XML-Workshop-in-Sweden.aspx

There have been a number of workshops going on around the world to help educate developers on how to build solutions leveraging the OpenXML formats, and I'm pretty excited to see the types of solutions they build.

You can follow Doug's blog if you'd like to find out more about the workshops: https://blogs.msdn.com/dmahugh/default.aspx

Have a great weekend.

-Brian

Comments

  • Anonymous
    April 21, 2007
    The problem with posting only pro-Microsoft links is that you post only once a week. There is a truth you can't deny even with your typical spin. And this truth speaks for itself in this astounding silence.

  • Anonymous
    April 22, 2007
    I think for most folks out there, silence on one's blog is more of an indicator that they are too busy. I would love to post more often, but haven't had a chance to. If there is a truth out there that I'm denying though, please educate me... -Brian

  • Anonymous
    April 24, 2007
    Does Microsoft have any plans for Word or other products to be capable of single-source publishing? I am thinking of something like AuthorIt that can be used to author in chunks and publish to multiple formats at once (PDF, DOC, etc.).

  • Anonymous
    April 24, 2007
    The new formats, in combination with the custom schema support, are a great first step in this direction. It's definitely a scenario we've thought about heavily. Once you get the files out as .docx and you've marked up the structure with custom XML, it becomes much easier to transform them not just into other formats, but into different layouts and presentation schemes. That's actually how we built the Ecma spec. We had a single database with about 10,000 rows, one for each piece of the schema we were documenting. Each row contained the name of the schema element, a unique ID, and a chunk of WordprocessingML for the rich content. We could then easily generate granular sections of the spec on demand to review, edit, and comment on. Every section we generated was marked up with the custom schema, though, so we could quickly re-shred it back into the master database. This allowed us to easily manage the 6,000 pages of content, since at any point we were only really editing around 100-200 pages. It also allowed us to work on multiple sections in parallel so that members of the TC could focus on the areas that were of most interest to them. The other great piece of this was that we stored the rich text description for each piece of the schema separately from the template. So we could quickly decide that we wanted to present the information in a different way (different heading styles, ordering, fonts, colors, etc.), and we would only need to change the master template. Then the next time we generated the full spec, it would reflect the new look. -Brian

  • Anonymous
    April 24, 2007
    I guess what I'm looking for is sort of a re-engineered Word that allows me to write in chunks and recombine them in end formats without knowing anything about DITA or XML.

  • Anonymous
    April 24, 2007
    The comment has been removed

  • Anonymous
    April 24, 2007
    Neat solution! The automatically generated bookmarks in the spec led me to suspect that you were storing the attributes in a database. Obviously, the benefits were much greater than the cost of setting up the DB and custom schema and gluing it all together with code. The same probably can't be said for more moderate collaborative projects, though (e.g., only a couple hundred pages in length). Have you given any thought to non-DB solutions? For example, directories full of marked-up files, all pulled together with an enhanced (XML-aware and reliable) master/subdocument feature? This might not scale as well as a DB, but it would be much less costly to set up (and thus beneficial sooner).

  • Anonymous
    April 24, 2007
    Thanks for the feedback, it is helpful. Please put in my 2 cents that this type of functionality in Word would be killer in the business world going forward!

  • Anonymous
    April 24, 2007
    Francis, we've definitely looked at solutions like the one you describe for general document assembly scenarios. In fact, we had an intern project last summer that played around with using SharePoint as a document assembly repository. Basically, you could put a bunch of documents up on SharePoint in a document library. There was then a master document that would essentially rebuild itself based on the documents in the library (and the metadata on each document). It's a pretty cool prototype, and at some point we may clean it up a bit and post it as example code for folks to play around with. In its current form it only works with the earlier betas, though, so either way it would need some more work before it could be used. JWilhelm, I'll definitely take your feedback into account. Thanks! -Brian

  • Anonymous
    April 24, 2007
    BTW, I would love to try your SharePoint example if possible. Any links or info you have would be great. I'm at jwilhelm@athenati.com. Also, I suppose that master documents and subdocuments can in some way function as reusable content, correct?

  • Anonymous
    April 24, 2007
    Unfortunately I'm not sure when we'd be able to pull the SharePoint solution together into a useful form, but I'll take a look. We may be able to get someone to write up an article and post the source (I'll look into it). You're right that master documents and subdocuments solve a similar scenario. I've been trying to explore ways we can do it in the cloud, though, rather than relying on the heavy client-side dependency the feature currently has. -Brian

  • Anonymous
    April 24, 2007
    >(if you're looking for some fun conspiracies, check out the metadata of J1N850-12 and J1N850-13 and see who the authors were) The fact that IBM undeniably wrote the Kenyan NSB response makes Rob Weir's latest rant (http://www.robweir.com/blog/2007/04/sometimes-i-need-to-remind-myself.html) look even more cynical.

  • Anonymous
    April 25, 2007
    Please show examples of how to unzip the docx files in a PowerShell script or a Visual Basic program. I know that I can manually change the docx file extension to zip and double-click the zip file, but I want to get at the XML files without the extra step of manually unzipping the docx files. Does Visual Studio provide methods for unzipping a docx file?

  • Anonymous
    April 25, 2007
    Fernando, I agree it's a real shame to see those types of tactics used, and unfortunately this isn't an isolated case. Oh well, hopefully in the end more level heads will prevail. :-) Hydrogen, check out this tool (it's pretty slick): http://blogs.msdn.com/brian_jones/archive/2007/04/03/visual-tool-for-developers-working-with-the-open-xml-formats.aspx -Brian

  • Anonymous
    April 27, 2007
    @Fernando The names in the Kenyan responses look German and Korean. How would those be seen as IBM-related?

  • Anonymous
    April 27, 2007
    The comment has been removed

  • Anonymous
    April 28, 2007
    @Jeffrey Google the German name...

  • Anonymous
    April 29, 2007
    Wesley: Perhaps the macro language is not included in the spec for the document format because the macro language really CANNOT be specified. It is just a bunch of COM Automation interfaces onto the codebase of Office. People complain that OOXML is too big to be implemented, but to do macros, you'd basically have to reimplement Office. This is not a winning scenario for anyone. Macros are not portable by a long shot. They don't even seem to be totally consistent between the Mac and Windows versions of Office. "Active" documents (which use macros) are a relative rarity in most cases anyway. Transitioning to ODF or to any other office suite will necessarily require rewriting or retesting these active documents, which are so important to the business flow. Hopefully, with the specification of OOXML, some of the tooling that's built in VBA will be moved to external apps that modify the XML directly. I think the ODF camp's tactics on this issue are quite despicable. I can't see what they're trying to do through the words of Rob Weir and Sam Hiser other than riling up the easily-enchanted OSS sheep. Look, I like OSS and the philosophy of sharing as much as the next guy, but I will say that there is a large body of people who love it to the point of intellectual dishonesty. That's Sam Hiser.

  • Anonymous
    April 30, 2007
    nksingh, I can see your point. It's something that nobody seems to have considered, though, from the earlier DOS, and Mac (and other OSes). As far as I can see, however, there are two things we're talking about when we talk about Office suite macros. One is the recorded keystrokes thing, where the application itself interprets a set of keystrokes and records them for replay later; the second is when you use a more formal specialist programming language (VBA, IIRC, is the MS Office one; I'm not a regular MS Office user, as I need something cross-platform). That's what makes macros so messy: which one are we talking about? I'm talking about the programming language, and that can be specified reasonably accurately. Microsoft has done it with C#, so doing the same thing for VBA should not be too difficult. In effect, they could even get it ported to Linux: there are several BASICs that run quite happily on Linux, some of which are open source, and the maintainers might blink at receiving patches from @microsoft.com addresses, but would weigh them on their merits. And you mention COM Automation interfaces. I think I raised a question like this in relation to ActiveX a few months ago (I help maintain a community centre's Community-based Technology and Learning Centre, and ActiveX has proved to be a source of grief), where I argued that its functionality should be abstracted so as to be applicable to anything Unix as well. If Microsoft hasn't done this in relation to ECMA-376, it's not my problem, but it is a problem, and the less attention is paid to it, the bigger it will get. As far as reimplementing MS Office goes, one might argue that reverse-engineering the MS Office file formats is the first step on the path to doing precisely that. Nobody's bothered to go any further, so I guess it's not that enthralling a project when you can write your own office suite and show the world how it really should be done. ;)

  • Anonymous
    May 03, 2007
    The comment has been removed

  • Anonymous
    May 17, 2007
    I admit that I wrote the Kenya paper.

  • Anonymous
    May 17, 2007
    Bob's Avatar: I'm upset that you are taking credit for the Kenya ISO "standards" paper.  I am the original IBM composite fictional character and I deserve the credit for sneaking around and trying to deceive the international standards community more than you.    

  • Anonymous
    June 24, 2007
    I don't know whether you're following this soap opera around the ratification / standardization of Open XML (OOXML, as it is popularly
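Brian's April 24 reply describes how the Ecma spec was produced: one database row per schema item, sections generated on demand and marked up with custom XML, then re-shredded back into the database after editing. The Python sketch below illustrates that round trip under loudly stated assumptions: an in-memory list stands in for the database; `SpecRow`, `generate_section`, `reshred_section`, and the `specItem` element name are hypothetical; and WordprocessingML's block-level `w:customXml` element (with `w:customXmlPr`/`w:attr` carrying the row ID) is used as the custom markup. It is not the TC's actual tooling, and it leaves out packaging the XML into a .docx and applying the master template.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
ET.register_namespace("w", W_NS)  # serialize with the familiar w: prefix


@dataclass
class SpecRow:
    """One hypothetical master-database row: a schema item and its rich text."""
    item_id: str
    element_name: str
    content_xml: str  # WordprocessingML block content, e.g. "<w:p>...</w:p>"


def generate_section(rows):
    """Build a w:document whose body wraps each row in a w:customXml block."""
    document = ET.Element(f"{W}document")
    body = ET.SubElement(document, f"{W}body")
    for row in rows:
        wrapper = ET.SubElement(body, f"{W}customXml", {f"{W}element": "specItem"})
        props = ET.SubElement(wrapper, f"{W}customXmlPr")
        ET.SubElement(props, f"{W}attr", {f"{W}name": "id", f"{W}val": row.item_id})
        ET.SubElement(props, f"{W}attr", {f"{W}name": "name", f"{W}val": row.element_name})
        # Parse the stored fragment inside a throwaway root that declares w:.
        fragment = ET.fromstring(f'<w:frag xmlns:w="{W_NS}">{row.content_xml}</w:frag>')
        wrapper.extend(list(fragment))
    return ET.tostring(document, encoding="unicode")


def reshred_section(document_xml):
    """Map each (possibly edited) specItem back to a SpecRow keyed by ID."""
    root = ET.fromstring(document_xml)
    updates = {}
    for wrapper in root.iter(f"{W}customXml"):
        if wrapper.get(f"{W}element") != "specItem":
            continue
        props = wrapper.find(f"{W}customXmlPr")
        attrs = {a.get(f"{W}name"): a.get(f"{W}val") for a in props.findall(f"{W}attr")}
        # Everything except the properties element is the row's rich content.
        content = "".join(
            ET.tostring(child, encoding="unicode")
            for child in wrapper
            if child.tag != f"{W}customXmlPr"
        )
        updates[attrs["id"]] = SpecRow(attrs["id"], attrs["name"], content)
    return updates


if __name__ == "__main__":
    rows = [SpecRow("e2", "body", "<w:p><w:r><w:t>Holds block content.</w:t></w:r></w:p>")]
    edited = generate_section(rows).replace("Holds block content.", "Holds the block-level content.")
    print(reshred_section(edited)["e2"].content_xml)
```

Because each generated section carries its row IDs in the markup, an edited section can be shredded back without manual bookkeeping, and changing the presentation only means changing whatever template wraps the generated body.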