

New default XML formats in the next version of Office

I’m Brian Jones, a program manager on the Word team. I’ve been at Microsoft for about 6 years, and have been working on XML support in Word and across Office for a good percentage of that time. I thought I’d set up this blog to talk with people about what we’re doing in the next version of Office around XML. When we first started talking about Office 2003 and the features we were going to provide around XML, there were a lot of misinterpretations. It was frustrating not having an easy way to answer questions, provide insight, and clear up any misunderstanding. I didn’t want to make the same mistake again, so I told everyone that I wanted to start blogging as soon as we announced the new "Microsoft Office Open XML Formats" (still getting used to the official name). The PR folks said they thought it would be ok, and they even decided to post some links to this site from the different marketing materials being released which is pretty cool.

I’ve been waiting a long time for this day, and it’s awesome that I’m able to talk about this so early in the product cycle. I made a post last week talking about Office 2003 XML, but that was just more of a test to see how this whole blog thing works. The real reason for setting up this blog was to talk about the new default XML formats in the next version of Office (although I’m sure I’ll spend a good amount of time talking about 2003 as well).

I’m hoping that people will have tons of comments and questions because I’m eager to spend time discussing this topic (I already do with the people I work with so why not branch out a bit). I’d like to find out what kinds of questions people have, and what kind of additional information or tools you’d like to see. The whole point of these new formats is for them to be open to anyone to work with, so I want to make sure we make it as easy as possible.

If you haven’t already read the press release, it’s probably worthwhile since it gives a good overview of everything that’s happening. It is a press release though, so you’ll have to deal with it coming more from a marketing angle. You should be able to find it up on the presspass site: https://www.microsoft.com/presspass

I didn’t want to make this first post too long, but I do want to go into some of the things I think are the most important to understand about these new formats. I’ll definitely spend more time in future posts digging deeper on these different topics, as well as going into the goals behind the formats.


Open XML Formats Overview

To summarize really quickly what’s going on, there will be new XML formats for Word, Excel, and PowerPoint in the next version of Office, and they will be the default for each. Without getting too technical, here are some basic points I think are important:

  1. Open Format: These formats use XML and ZIP, and they will be fully documented. Anyone will be able to get the full specs on the formats and there will be a royalty free license for anyone that wants to work with the files.
  2. Compressed: Files saved in these new XML formats are less than 50% of the size of the equivalent file saved in the binary formats. This is because we take all of the XML parts that make up any given file, and then we ZIP them. We chose ZIP because it’s already widely in use today and we wanted these files to be easy to work with. (ZIP is a great container format. Of course I’m not the only one who thinks so… a number of other applications use ZIP for their files too.)
  3. Robust: Between the usage of XML, ZIP, and good documentation the files get a lot more robust. By compartmentalizing our files into multiple parts within the ZIP, it becomes a lot less likely that an entire file will be corrupted (instead of just individual parts). The files are also a lot easier to work with, so it’s less likely that people working on the files outside of Office will cause corruptions.
  4. Backward compatible: There will be updates to Office 2000, XP, and 2003 that will allow those versions to read and write this new format. You don’t have to use the new version of Office to take advantage of these formats. (I think this is really cool. I was a big proponent of doing this work.)
  5. Binary Format support: You can still use the current binary formats with the new version of Office. In fact, people can easily change to use the binary formats as the default if that’s what they’d rather do.
  6. New Extensions: The new formats will use new extensions (.docx, .pptx, .xlsx) so you can tell which format a file is in, but to the average end user they’ll still just behave like any other Office file. Double click & it opens in the right application.
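
To make the XML-plus-ZIP idea a bit more concrete, here is a minimal Python sketch (Python is just a convenient choice for illustration, not something the formats require). It writes a couple of XML parts into a ZIP package in memory and then lists them with their raw and compressed sizes. The part names `document.xml` and `styles.xml` and their contents are invented for the example; the real part names and schemas have not been published yet.

```python
import io
import zipfile

# Hypothetical part names and content, invented for this sketch only.
PARTS = {
    "document.xml": "<document>" + "<p>Hello, world.</p>" * 200 + "</document>",
    "styles.xml": "<styles><style name='Normal'/></styles>",
}

def build_package(parts):
    """Write each XML part as its own compressed entry in an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        for name, xml_text in parts.items():
            package.writestr(name, xml_text.encode("utf-8"))
    return buffer.getvalue()

def list_parts(package_bytes):
    """Return (name, uncompressed size, compressed size) for every part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return [(info.filename, info.file_size, info.compress_size)
                for info in package.infolist()]

package_bytes = build_package(PARTS)
for name, raw, packed in list_parts(package_bytes):
    print(f"{name}: {raw} bytes of XML stored in {packed} bytes")
```

Because the container is ordinary ZIP, any general-purpose ZIP tool should be able to do the same listing; how much a given file shrinks will depend on its content.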

I’ll definitely go into a lot more detail on these different points in future posts. Just to summarize though, I’m really happy with these new formats so far. Microsoft will build a lot of functionality around these formats for years to come, but I also hope other people outside of Microsoft will take advantage of them, since anyone that wants to can. You can look inside the files, make modifications, generate new files, add content, remove content, or any number of other things that people would want to do with an Office file.
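
As a rough sketch of what "look inside the files and make modifications" could mean in practice, the Python below copies a package part by part and rewrites the text in one XML part along the way. Everything named here (`document.xml`, `notes.xml`, the `<p>` element) is hypothetical stand-in markup, not the actual Office schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def make_package():
    """Build a tiny stand-in package; the part names and markup are hypothetical."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("document.xml", "<document><p>Draft</p></document>")
        package.writestr("notes.xml", "<notes/>")
    return buffer.getvalue()

def replace_paragraph_text(package_bytes, part_name, old, new):
    """Copy the package, rewriting matching <p> text in one XML part."""
    source = zipfile.ZipFile(io.BytesIO(package_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            data = source.read(info.filename)
            if info.filename == part_name:
                root = ET.fromstring(data)
                for paragraph in root.iter("p"):
                    if paragraph.text == old:
                        paragraph.text = new
                data = ET.tostring(root, encoding="utf-8")
            # Every other part is copied through untouched.
            target.writestr(info.filename, data)
    return output.getvalue()

def read_part(package_bytes, part_name):
    """Return one part of the package as text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(part_name).decode("utf-8")

updated = replace_paragraph_text(make_package(), "document.xml", "Draft", "Final")
print(read_part(updated, "document.xml"))
```

The point of the sketch is that only standard ZIP and XML libraries are involved; no Office installation is needed to read or change a part.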

If you want some more information in a more official form, there are two whitepapers available. Here’s a brief overview of each one:


Whitepapers

The Microsoft Office Open XML Formats: New File Formats for "Office 12"

https://download.microsoft.com/download/c/2/9/c2935f83-1a10-4e4a-a137-c1db829637f5/Office12NewFileFormatsWP.doc

This first whitepaper is a general overview of the file format, and is targeted at multiple audiences. It starts off with an introduction about what’s going on and also briefly touches on the history of the current binary formats and how we got to where we are today.

The Microsoft Office Open XML Formats: Preview for Developers

https://download.microsoft.com/download/c/2/9/c2935f83-1a10-4e4a-a137-c1db829637f5/Office12FileFormatDevPreviewWP.doc

This paper talks more about the architecture of the formats and is targeted at developers. This paper has a similar introduction to the first (but from a slightly different angle). The last 7 or so pages of the paper go into solutions and what people can do with these files. It’s a great way to start thinking about the possibilities, and what types of things you can probably expect to see built on top of the format.

 

OK, that’s enough for now. Sorry this was such a long post, but I didn’t have time to make it shorter (I think that was Twain or Pascal?). I’m going to get some sleep, and then see what things people are curious to know more about. Talk to you all tomorrow.

-Brian

Comments

  • Anonymous
    June 01, 2005
    Kind of ironic... just as Avalon comes along and makes using structured storage (the basis of Office file formats) the standard, Office moves away from it...
  • Anonymous
    June 01, 2005
    Brian, awesome stuff!

    Check out the video of you over on Channel 9:

    http://channel9.msdn.com/ShowPost.aspx?PostID=73329
  • Anonymous
    June 01, 2005
    This is definitely big news. Personally I am an Office 2004 user, but I will be very glad when I can have document portability between word processors.
  • Anonymous
    June 01, 2005
    Brian, This is great news!

    Heck, I'm just happy that PowerPoint is going to have an XML file format, let alone that these formats will be backward compatible! These are great new additions for us Office devs. It's like Christmas :^)
  • Anonymous
    June 01, 2005
    This is cool for interoperability, but we need Open Document Support as well. Will we be able to save this format or is docx compatible?
  • Anonymous
    June 01, 2005
    Open Office already stores files in XML format rather than a proprietary format like Microsoft. I'd be interested to know how this new XML solution Microsoft is adopting compares (or betters) the one available in Open Office - is Microsoft just playing catch up (at least in this area)?
  • Anonymous
    June 01, 2005
    See also:
    OASIS OpenDocument XML Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
    Recommendations by the EU: http://europa.eu.int/idabc/en/document/3439
  • Anonymous
    June 01, 2005
    See also:
    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
    http://europa.eu.int/idabc/en/document/3439
  • Anonymous
    June 01, 2005
    Word, Excel, Powerpoint. Why not Publisher? Onenote?
    I also know that with the previous open XML formats in Office 2003 there were some issues with patents and so no other program had implemented reading these formats. Given that the XML formats in 2003 had actually failed to be adopted by other applications, how are you addressing this issue in the new Office?
  • Anonymous
    June 01, 2005
    When y'all say "open" do you mean that it won't encode COM objects or any other kind of object in the XML as a binary format that is not documented?
  • Anonymous
    June 01, 2005
    In the 2003 XML documents, while the core schemas were more-or-less documented, auxiliary ones such as VML did not fare nearly as well. Hopefully the documentation for these new formats will address all of the constituent schemas?

    Also, I would like to know exactly how deeply the XML-ization is being applied. For example, will MSGraph and Equation Editor objects also be represented as XML, or continue to exist as undocumented binary blobs?

    Thanks,
    Chris
  • Anonymous
    June 01, 2005
    Microsoft has announced that the next version of Microsoft Office will use, BY DEFAULT, new OPEN XML...
  • Anonymous
    June 01, 2005
    XML is open. XSLT is well-defined. ZIP is standard.

    I hope Microsoft isn't stupid enough to patent the file formats.
  • Anonymous
    June 01, 2005
    Hi there

    Great news. Some time ago I read up on the 2003 schemas and I found them great... for simple documents... There were problems with embedded objects, images, VBA code... Will these issues be fixed in the new version?

    I am thinking of embedding images (and other content), embedding VBA code (event handlers), etc.
    Main goal is to write a server-side "report" using the desired schema and stream it back to the desired application.

    Regards

    Henrik
  • Anonymous
    June 01, 2005
    "Sorry this was such a long post, but I didn’t have time to make it shorter (I think that was Twain or Pascal?)"

    That was Pascal.
  • Anonymous
    June 01, 2005
    Big news from the Office team yesterday! It looks as if XML is going across the board in Office 12. So much so that they are changing the file extensions to .docx, .pptx, .xlsx. WOW! This is major and very...
  • Anonymous
    June 01, 2005
    I think we all have been expecting this for a while since Microsoft introduced the XML format in Office 2003. Now they're going full on and going to use XML (with zip compression) as their default Office format. It's certainly...
  • Anonymous
    June 01, 2005
    Hi,

    Sounds like good news, but how will it be supported in previous Office versions? I'd really like to be able to generate Excel or Word documents from my favourite webapp... but I need them to be readable by lots of people... (even inside an organization, we're sometimes stuck with the 97 version!). Will we be forced to configure our new Office 200? to serialize docs in the old format, to be sure that the docs are still readable?

    Any comment on this?
  • Anonymous
    June 01, 2005
    Brian Jones, Program Manager for MS Word team announced the new file formats in the next Office System...
  • Anonymous
    June 02, 2005
    Good news, and (trusting that the file formats are as easy and open as you indicate - haven't seen them yet) I'm really looking forward to it!

    When are you going to apply this to Visio. Next week, right?
  • Anonymous
    June 02, 2005
    Also, I haven't been able to find a complete sample. I know the schemas aren't settled yet and so will be changing (perhaps substantially), but can we get one complete example to look at? I want to see just how much of an improvement over WordML you've made.
  • Anonymous
    June 02, 2005
    Wow, thanks for the comments. I'll try to get to each question as soon as I can.

    Cori - Currently the plan is just for Word, Excel, and PPT to take this approach with the ZIP container. As you know though, Visio has had an XML format for a couple releases now. It's a single XML file.
    I'll see what I can do about getting an example file up. I'll definitely show more at tech-ed next week if you're going to that. I was wondering if you could give some more details on what kinds of improvements you'd like to see made on the existing WordML implementation. Are there any specifics you could give on what you find difficult?
  • Anonymous
    June 02, 2005
    First, the good news. As many of you have seen by now, at 12:01 AM ET last night, Microsoft announced rather extensive - and pretty much open - support for XML based formats for its next gen Office iteration, due...
  • Anonymous
    June 02, 2005
    Is there anything about the licensing terms that would preclude GPL software from making full use of documents in the new format?
  • Anonymous
    June 02, 2005
    You mention compatibility with Office 2000, XP, 2003... What about the Windows Mobile versions? Currently, PocketWord and PocketExcel can read .doc and .xls files (though I think they are translated by ActiveSync). Any idea how these new formats will work with the Pocket* applications?
  • Anonymous
    June 02, 2005
    Bah! Most of the posts here are about how this new XML format will open the MS Office formats.

    This is wrong.

    XML is composed of two pieces: XML, which is open; and an XML Schema, which will be closed and proprietary. An XML document without an XML Schema is not usable by other applications.

    So don't go around pretending that Microsoft's "XML" will let people open Word documents with other applications. It won't. They are still maintaining their closed formats and will continue to lock-in the customers.
  • Anonymous
    June 02, 2005
    EnronHaliburton2004 - You're right that the schemas are very important for being able to interpret the XML. You can get the schemas from the Office 2003 XML up here: http://www.microsoft.com/downloads/details.aspx?FamilyId=FE118952-3547-420A-A412-00A2662442D9&displaylang=en

    The schemas are fully documented so you can read through and find out for yourself how to interpret the XML. In addition there is a royalty-free license that is available which means that you don't have to pay anything to Microsoft for its use.
    We'll of course be doing the same thing with these new formats, although they are still under development so we don't have the schemas ready yet. I really want to go a lot further than we did in 2003 and provide some really good best practices documentation as well as tons of examples.
  • Anonymous
    June 02, 2005
    Skeptic - In response to the post a bit further up, I wanted to be a little more clear on why I said the files are more robust.
    While I’ll agree that ZIP isn’t as robust as text, it’s still extremely robust relatively speaking (compared to compound doc for instance). A single bit corruption will usually result in the one part that was corrupted being unreadable. So, in order to make the files robust, we break them up into multiple pieces (files) within the ZIP. For example, with a PowerPoint file, each slide is a separate XML file inside the ZIP. This is done both for robustness as well as making it easier for people (developers) to work with the files.
    There are a number of different corruptions that can occur. Some are from user error, while others have to do more with faulty hard drives, transmission errors, etc. (essentially bit rot). Let’s just talk about the bit rot example for now. While there is a central directory at the end of the entire ZIP package, that central directory isn’t required to open the ZIP. The files inside the ZIP are all written out serially, and there is a header before each file. So, even without the central directory, we could still rebuild it just by scanning the file looking for each header. In addition, each file is compressed separately. That means that if one file gets corrupted, you can still open all the other files just fine. I can go into much more detail if you guys are interested in this topic.
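
The recovery idea described above can be sketched in a few lines of Python. This is only an illustration of scanning for local file headers, not how Office itself does it: it assumes each local header carries the real sizes and CRC (true for the test package built here, not for every ZIP writer), and the slide part names are made up.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # fixed 30-byte local file header
SIGNATURE = b"PK\x03\x04"

def build_package():
    """Build a stand-in package with one hypothetical XML part per slide."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for number in (1, 2, 3):
            text = f"<slide>Slide {number} " + "text " * 50 + "</slide>"
            package.writestr(f"slide{number}.xml", text)
    return buffer.getvalue()

def damage(package_bytes):
    """Drop the central directory and flip one byte inside slide2's data."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        infos = {info.filename: info for info in package.infolist()}
    last = infos["slide3.xml"]
    # Assumes no extra field in the local headers, which holds for this package.
    end_of_parts = last.header_offset + 30 + len(last.filename) + last.compress_size
    damaged = bytearray(package_bytes[:end_of_parts])
    victim = infos["slide2.xml"]
    damaged[victim.header_offset + 30 + len(victim.filename) + 2] ^= 0xFF
    return bytes(damaged)

def recover_parts(package_bytes):
    """Scan for local file headers and return {name: data} for intact parts."""
    recovered = {}
    position = package_bytes.find(SIGNATURE)
    while position != -1:
        header = package_bytes[position:position + LOCAL_HEADER.size]
        if len(header) < LOCAL_HEADER.size:
            break
        (_, _, _, method, _, _, crc, packed_size, _,
         name_length, extra_length) = LOCAL_HEADER.unpack(header)
        name_start = position + LOCAL_HEADER.size
        data_start = name_start + name_length + extra_length
        name = package_bytes[name_start:name_start + name_length].decode("utf-8", "replace")
        raw = package_bytes[data_start:data_start + packed_size]
        try:
            # Method 8 is raw DEFLATE; anything else is treated as stored data.
            data = zlib.decompress(raw, -15) if method == 8 else raw
            if zlib.crc32(data) == crc:
                recovered[name] = data
        except zlib.error:
            pass  # this part is damaged; keep scanning for the others
        position = package_bytes.find(SIGNATURE, position + 4)
    return recovered

print(sorted(recover_parts(damage(build_package()))))
```

In this sketch the damaged slide fails its CRC check (or fails to decompress) and is skipped, while the parts on either side are still read back without any central directory.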
  • Anonymous
    June 02, 2005
    You publish the spec for these formats and they are based on XML. That's good. You also promise to give out royalty-free licenses. That's good, too.

    But that doesn't make those formats "open". The very fact that people need to obtain royalty-free licenses from Microsoft demonstrates that the formats are still proprietary (i.e., owned by Microsoft). If Microsoft didn't own them, people didn't have to obtain a license from Microsoft in order to implement them.

    So, when you say that the formats are "open", you are misleading people. The formats are not "open", they are proprietary (albeit documented).
  • Anonymous
    June 02, 2005
    Joshua - I've talked with the team that makes the PocketWord and PocketExcel applications and they are looking at these new formats. I can't really comment for them though on what support they are planning to provide (sorry), but I know they were excited about the fact that the formats were XML.
  • Anonymous
    June 02, 2005
    "The very fact that people need to obtain royalty-free licenses from Microsoft demonstrates that the formats are still proprietary (i.e., owned by Microsoft). If Microsoft didn't own them, people didn't have to obtain a license from Microsoft in order to implement them."

    Actually, most software and standards that are labelled "open" have royalty-free licenses. Look at the legal verbiage that standards bodies use and you will see that there is implicit ownership and a license agreement for all of their deliverables.

    Partially because of the loud voice of the open source community, Microsoft has now chosen to open up a portion of its intellectual property. Instead of complaining that it's not enough, we should encourage the folks in Redmond.

    Who knows... with enough positive feedback, they might do this more often :-)
  • Anonymous
    June 02, 2005
    Robert - That's actually a great use case. If you play around with the Word 2003 OM, you'll see you can already do something similar today. There is an xml property off the range object. You can select the entire file as a range (or just do document.xml I believe) and request the XML for that file. You can then muck with it all you want, and when you are done, use the insertXML method (you'd probably want to select the whole file again to overwrite what's already there).

    To everyone else I haven't replied to yet, sorry. I'm keeping a list of the questions and will try my best to address them. It's been a really busy day.
  • Anonymous
    June 02, 2005
    Brian,

    The Range.xml (or Document.xml) feature sounds great. I haven't done much with XML for Office 2003 because my understanding is that it isn't 100% compatible for roundtripping purposes (preserving all formatting, etc.).

    Do you know whether the Range.XML feature will be updated with Office 12, so the XML it emits and accepts will be 100% compatible with the new file format? It would be unfortunate if it continued to use the current format and schema.
  • Anonymous
    June 02, 2005
    First time I read this I started laughing. After all, it is well past April Fools' Day. And Microsoft can't claim this at least to be any sort of innovation - I've been using OpenOffice.org for at least four years now, and what has been described and defined as the docx, etc, file formats, read like what I already know about the sxw, etc, file formats, albeit with a different name.

    It is that that I want to ask about - OpenOffice.org's file formats are quite robust, and are rapidly turning into the standard. Will Microsoft Office follow that already market-driven standard? Now that OpenOffice.org has been accepted in such large markets as Brazil and China?

    Because if Microsoft fails to follow this market standard and diverges by even a small amount in any undisclosed manner, that cannot be accounted for by technical reasons any sane technician could accept, I fail to see how my current employer, a non-profit community organization, can afford to use any future Microsoft Office.

    Thanks
  • Anonymous
    June 02, 2005
    Another question--now that the document summary information is presumably stored as an XML stream inside the zip, instead of as a Structured Storage stream, won't this effectively hide the info from current versions of Windows? -Chris
  • Anonymous
    June 02, 2005
    How does this affect current solutions based on the Office 2003 WordML format?
  • Anonymous
    June 02, 2005
    Mixed response - in one way a good thing, but then MS sour the milk by not reusing or even extending an existing standard, I suppose if they did then they couldn't claim "innovation".

    It is a real shame that the OASIS file format was not employed, if it was deficient, then I am sure MS could have worked with the community and added extensions etc. but MS has never really done that has it?

    Please don't bandy about the term "open" - using it in this context is misleading, "published" would be a better description. MS owning the standard allows them to change it and break 3rd party compatibility on a whim.

    If you want to use "open" and really be seen as "open" by any user with a bit of savvy then hand ownership of the schema to either the community or get ISO recognition.

    All in all, a missed opportunity.
  • Anonymous
    June 02, 2005
    Yes, I know that Microsoft likes doing things its way, but not long ago the open OASIS standard was announced. Why can't you guys get together, each with a few compromises, and agree on one standard format for documents? For once, think about the good of the customer instead of just having to be different. You could start some open standard consortium like the W3C and finally standardize things.
  • Anonymous
    June 02, 2005
    What about IRM documents and non-windows platforms? We'll leave off Linux for now. What about MacOffice 12 and IRM'd documents? That needs to interoperate too. If it doesn't, this is a half done initiative. The IRM question needs to have the correct answer, and that answer is NOT "Use Windows", especially not to someone running MacOffice.
  • Anonymous
    June 02, 2005
    One thing I didn't like about XML in Office 2003 was that when I have a clean, structured XML file, created from scratch, and I do a simple edit in the Word editor, like changing text, the whole XML file is changed with Word's typical XML stuff and the structure is gone. I was thinking some sort of text-change-only mode would be useful, where all formatting is preserved and, after resaving, only data (text) inside <w:t> tags would be allowed to change.
  • Anonymous
    June 03, 2005
    Good news, everyone.

    So all we'll need is just the right XSLT stylesheet to transform MS XML Office format to OpenDocument. I think this really moves Microsoft towards compatibility with the rest of the world :)

    But seriously - I think that if MS XML Office format and OpenDocument format will be losslessly transformable just via XSLT transformation, the world is about to be happy.
    I can imagine big Unix/Linux backend servers that store thousands and thousands of documents in OpenDocument format, utilizing fulltext search, categorizing, excerpt creation over them while still being able to offer these documents to end-users in a format MS Office will be able to open with no harm.
  • Anonymous
    June 03, 2005
    Chris-- you're right about the properties now being unavailable to Windows. They will have to provide a property handler for the shell and indexing services to use in order to access the properties, and since this will involve opening the zip, etc, it will probably be much slower than currently. I'm surprised to see a move away from structured storage/compound documents, since MS has invested a lot into that technology. Avalon will be using compound documents for file storage, so it's not like the technology is bad or out of date.
  • Anonymous
    June 03, 2005
    Hi Brian,

    According to the white paper, the VBA Project will always be stored as a binary file. While that is great for production files, where we wouldn't want people to see the code, it's horrible for development. I would love to be able to store one of these files in ClearCase / SourceSafe etc and see the VBProject in clear text - which would then allow me to perform diffs, merges etc on the code.

    Also, will you be adding the ability for us to store VSTO assemblies in the document, so we could deploy them within the document file rather than having to host them on a web site?

    Regards

    Stephen Bullen
  • Anonymous
    June 03, 2005
    Stephen,
    Actually, the default formats don't store a VB project. You need to use the macro-enabled (that's just what we are currently calling it internally) version of the format if you want to store the VB project. The VB project will be stored as a separate binary part in that case.
    The format is fairly extensible, but we aren't building support for embedding the VSTO assemblies and being smart about handling them when they are opened. While this isn't really my area, it is something I have an opinion on. I think the direction for distribution of code and solutions is really moving away from code within the document. The documents really just have the contents, meta-data, maybe even customer-XML blobs that help identify what their content type is and what solution they are a part of. The code then lives separately, which makes it much easier to manage and makes the documents much more portable.
    -Brian
  • Anonymous
    June 03, 2005
    I also have to wonder if any confusion that may be caused by everyone referring to this as the "MS Office Open XML Format" and its similarity to the "OpenOffice XML Format"[1] is an intentional marketing ploy...

    [1] http://www.openoffice.org
  • Anonymous
    June 03, 2005
    Mike Jones, what exactly are you wanting for it to be called open? Letting anyone who wants to change it? That would be a nightmare, for obvious reasons. Seems to me 'open' means 'not a binary format (or other format for that matter) that has no mechanisms for interpreting it except through the app itself' Does OpenOffice call their formats 'open'? If so, what is the difference between their open format and Microsoft's?
  • Anonymous
    June 03, 2005
    Do you plan to use something like MathML for equations?
  • Anonymous
    June 06, 2005
    1. Great news - good job Brian!
    2. What tools/features I've been waiting years for:

    As a web developer in my sixth year of experience, I have been fiddling with content management since I started. My dream for my content management systems is that authors could use MS Word and Excel to publish their content. With an Office version that could save as HTML, I thought we were on the right track.
    But unfortunately the file size was extremely large, and with its half-XML-like format (WordML) and images that were stored in a subfolder (1 original and 1 scaled), I gave up on making a Word plugin.
    With Office 2003 I have spent many hours testing how I could use the native XML format it can save a document as.
    But somehow the native format requires a lot of schema development - and I even tried, but with no promising results.

    What I've been waiting for, and maybe still will be after Office 12, is a server-side Office Document Content Framework and a client-side Office Content Builder (these are of course all fictitious names). The Office Document Content Framework on the server side should be able to parse all parts (tags or whatever) made by the author in the Office Content Builder, which might even run directly in Internet Explorer.
    The Office Content Builder has access to all documents on the client, so you can easily open documents in the Content Builder and save them to a server that can handle the document with the Content Framework, then parse each element and apply the styles used for presentation directly from the server environment. That could be a corporate website, marketing material like a press site, or documentation for a product.
    The core problem I face every day with the Office suite is that no two documents made within the same organisation look alike, often not even those made by the same person (myself included). And I develop e-business, branding solutions and associative business web solutions, extranets, intranets etc., which not only have the goal of being functional but at the same time promotional, with a branding effect on the people who use our customers' solutions.
    That makes Office my worst enemy, largely because all of our customers use Word and Excel and have to adopt new behaviors when making content for their web solutions.
    My solution to this day has been to develop a Java application (as the Office Document Framework) and a Java applet (as the Office Content Builder) - but I'm not a skilled Java programmer, and the time needed to develop such applications is way beyond what I can prioritize at work.
    I have to think about how our customers get the most value out of our solution for the lowest TCO possible. That brings our customers back for more and gets them value out of their investments.

    Now with this announcement of Office 12 supporting XML as its default format - and how the document structure is laid out, I again feel some sort of - YES, let me get hands on and I'll make my KILLER APP.

    - Thanks for listening
    Kevin
  • Anonymous
    June 08, 2005
    R

  • Anonymous
    June 10, 2005
    What do you know, Microsoft goes and steals ideas from OpenOffice. Another one of Microsoft's "Embrace and Extend" strategies, eh? Tsk tsk. I'll take OOo, thanks.

  • Anonymous
    June 13, 2005
    If I read the undercurrent correctly, I should now be able to use a repository of values in various (Word) documents. Altering a value in the repository (only there...?), should also affect all other instances where that value is embedded (linked).

    My question is:
    Is there an intuitive MS Office method for performing the XML-type embedding and linking (Drag & Drop... Into what...?)

    -Ofer

  • Anonymous
    June 13, 2005
    Pascal - Thanks for your post. You are right that we are not using new proprietary technologies for these formats. We decided to use XML and ZIP since they are already so widely in use today by many different applications, OpenOffice included.

    Ofer - The formats themselves do not introduce new functionality for linking documents to data sources. Since they are open, though, you can easily get access to all of the application's functionality. Word already has support today for custom XML, which allows people to mark up documents with their own XML and build additional solutions on top of that. Unfortunately, I currently can’t talk about what else is coming in Office 12 around this, but I’m really excited about the chance to dig into this more when that time comes. I’m sure you’ll notice that XML is a big deal to us, and we continue to look at ways to innovate here.
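As a rough illustration of what "XML and ZIP" means for someone working with such a file outside of Office, here is a minimal Python sketch using only the standard library. The part name `doc/main.xml` and the `<document>`/`<para>` elements are hypothetical placeholders invented for this example; they are not the actual layout or schema of the Office formats, which the thread does not describe.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List


def build_sample_package() -> bytes:
    """Create a tiny in-memory ZIP archive holding one XML part.

    The part name and element names are hypothetical, chosen only
    to demonstrate the ZIP-plus-XML idea.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(
            "doc/main.xml",
            "<document><para>Hello from a zipped XML part.</para></document>",
        )
    return buffer.getvalue()


def read_paragraphs(package_bytes: bytes, part_name: str) -> List[str]:
    """Open the ZIP container, parse one XML part, and return its paragraph text."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(part_name))
    return [para.text or "" for para in root.iter("para")]


if __name__ == "__main__":
    print(read_paragraphs(build_sample_package(), "doc/main.xml"))
```

The point of the sketch is only that ordinary ZIP and XML tooling is enough to get at the content; no Office-specific library is involved.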

  • Anonymous
    June 14, 2005
    If the XML format is the default in the next version of Office, how can you protect or lock a Word document?

  • Anonymous
    June 14, 2005
    Whatever - The protection of the documents will be handled in the same way it's handled for the current binary documents. You can use encryption or IRM, or, if you just want to validate the document, you can use signatures.

  • Anonymous
    June 14, 2005
    Are the MIME types for the new XML formats the same as for the current Office formats?

  • Anonymous
    June 14, 2005
    Thanks, Brian, for your first-hand information about the forthcoming Office file formats. Could you please address the points brought up earlier by other readers concerning the OASIS OpenDocument standard? It's the only type of question you haven't answered so far.

  • Anonymous
    June 15, 2005
    Glen - The MIME types will be different for the new formats.

    Kirchrainer - I actually have a separate post addressing the OpenDocument format. http://blogs.msdn.com/brian_jones/archive/2005/06/13/428655.aspx
    There are already a ton of great replies. I haven't had a chance to reply with my own comments yet, but I hope to later on today.

  • Anonymous
    June 16, 2005
    You can model your application's file format after Microsoft's new Office file format.

  • Anonymous
    June 21, 2005
    Brian,

    Thanks for taking feedback. Don't take this personally ... but Microsoft XML (from Word) only makes my life miserable. I am going to start pushing OpenOffice at my company. I'm sick of Microsoft's convoluted XML. When I save as XML, I can't even open it in an XML editor (I use XML Spy).

    Please provide a "simple XML" for export that just uses style sheet names as elements, and I'll finally be able to use Microsoft's XML without creating a tool/process to clean it up.

    Given how the XML format is supposed to be used with XSL and as a clean way to share information between applications, why does Microsoft make that possible when exporting XML?

  • Anonymous
    June 21, 2005
    The comment has been removed

  • Anonymous
    June 21, 2005
    Brian,

    Thanks for the quick response. I found out why the XML file (exported from Word) wouldn't open in XML Spy: Word had locked the file. Once I closed the document in Word, XML Spy could open it. Word still had it open as myfile.xml after I had saved it as XML, even though I had originally opened it as myfile.doc. Whether that is the best behavior is another discussion.

    About the XML format. When I have a Word file with styles like Heading1 and Body, it would be awesome if Word would save something similar to this:

    <document>
    <heading1>My Heading</heading1>
    <body>My paragraph.</body>
    </document>

    Only the styles of the text should be saved, as XML elements, plus a few (very few) extra elements that would be necessary, like <document> because you must have a root element, etc.

    To answer your last question, I wasn't very clear. What I was trying to say is this: why does Microsoft save documents in an XML format that only works well for sharing documents among Microsoft apps? And, no, dear readers, I am not naive ... Seriously, I would love Microsoft Word (for my clients) if I could get clean, simple XML from it. Microsoft should provide its customers (which currently include me, but I'm seriously looking at OpenOffice now) with a way to export XML that could really be useful as data for other applications, such as transforming the XML into HTML or importing it into a database. Unfortunately, the XML exported from Word is cluttered with presentation XML specific to Microsoft Word. It's the same situation with the HTML exported from Word: a <p> should be a <p> tag, not <p style="a million style rules">.

    Provide both: Microsoft XML and simple XML. It would be easy to do. Leave out ALL the XML about presentation, and simplify the remaining XML elements (used only for content) so that they are named after the Word styles.

    A style called Heading1 in Word would be exported as an element called <Heading1> in the XML.
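The kind of "simple XML" export requested above can be approximated today by post-processing Word 2003's XML output. The following Python sketch is only an illustration under stated assumptions: it assumes the WordprocessingML 2003 namespace and its `w:p`, `w:pPr`, `w:pStyle` and `w:t` elements, uses a hand-written, heavily simplified `SAMPLE` fragment rather than a real saved file, and the `simplify` function and its `default_style` parameter are hypothetical names invented here. It is not a feature of Word.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Word 2003's "Save as XML".
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

# Hand-written, heavily simplified fragment (a real file contains far more markup).
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>My Heading</w:t></w:r></w:p>
    <w:p><w:pPr><w:pStyle w:val="Body"/></w:pPr><w:r><w:rPr><w:b/></w:rPr><w:t>My </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def simplify(wordml: str, default_style: str = "Normal") -> str:
    """Turn each paragraph into an element named after its paragraph style.

    All presentation markup (run properties and so on) is dropped; only
    the concatenated text of each paragraph is kept.
    """
    source = ET.fromstring(wordml)
    result = ET.Element("document")
    for para in source.iter(W + "p"):
        style = para.find(W + "pPr/" + W + "pStyle")
        name = (style.get(W + "val") if style is not None else None) or default_style
        # Style names may contain characters that are not legal in XML names.
        name = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
        if not re.match(r"[A-Za-z_]", name):
            name = "_" + name
        element = ET.SubElement(result, name)
        element.text = "".join(t.text or "" for t in para.iter(W + "t"))
    return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(simplify(SAMPLE))
```

For the sample fragment this prints `<document><Heading1>My Heading</Heading1><Body>My paragraph.</Body></document>`. Tables, lists, and character-level styles are deliberately ignored; a real converter would need to decide how to represent them.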

  • Anonymous
    June 30, 2005
    The comment has been removed

  • Anonymous
    July 01, 2005
    Thanks for the comments Dylan. I'm sorry you don't view this as great news. You should know that this move is far from being reactionary though. Just look at the history. The use of XML as a file format is definitely not a new idea. As I've pointed out in other blog posts, we first started using XML in Office back in 1997 when we started to work on Office 2000.
    The simple idea of using XML as a file format shouldn't be seen as an innovation in itself. There are a ton of other software applications out there that also use XML in their formats. The reason we did this work to make it the default format is that XML is so widely in use today, and there are tons of tools available for working with it. I guess we could have gone and invented some other technology, but that would have defeated the goal of using common industry-standard technologies to represent our formats (ZIP and XML).
    I've had a couple recent posts showing how to leverage SpreadsheetML for Excel. Work for that specific format started back in 1999, and it's now shipped with 2 versions (XP and 2003). Try it out if you get a chance.
    -Brian

  • Anonymous
    July 07, 2005
    Some of you who have worked with Office 2003 xml files may have noticed that while we use the ".xml"...

  • Anonymous
    July 18, 2005
    If you read Part 1 of the Word XML Introduction, you saw the basics behind a Word document, as well as...

  • Anonymous
    July 21, 2005
    Can you tell me how a Business Systems Developer (no web, no XML but lots of VBA) can determine how storing a document in XML might benefit a small business currently sharing much of its data across Word, Excel and Access through VBA?

    Interoperability is of paramount importance to us, but I really can't see how XML would help. Most examples I have seen require .Net libraries to read XML data via VBA into different applications. Will the necessary libraries be built into Office 12? I have little time for training as it is, so I am trying to determine whether this is a technology I should embrace (or steadfastly ignore). The learning curve appears a little steep from where I am standing.

    I could save my documents in an XML zipped format now (if I wanted to) so please tell me how I can USE this new technology. For example I have opened a XML version of an Access query in Word 2003 but it seemed much less useful than reading the data in via VBA (I couldn't do anything with it) so please what is the benefit? What am I missing?

    For example, we currently have VBA code which allows our secretaries to enter a scheme reference and see a list of addresses relevant to a client. They then select the appropriate address for the letter (home, bank, accountant, etc.). Data is read from an Access database and inserted in the correct format, so only the body of the text has to be entered. Could you tell me how that can be achieved through XML? Or perhaps you could tell us how a mail merge might work?

    Of all the technologies Microsoft has introduced, this has baffled me the most. I don't currently have a problem with your software. Data storage is getting cheaper, so zipping files seems a retrograde step, and splitting files into multiple files just seems to add to the everyday overheads. More files to potentially become corrupted? More files for users to "lose"? But that is OK because they are smaller (and therefore less important?).

    Do you have any users on your committees?

  • Anonymous
    July 25, 2005
    From Chris Pratley's blog - "There's a thing on the client called the "schema-library" that associates XML namespaces of your choice with XSL files, solutions, etc. This means once you're set up in the schema-library, you can dump blobs of XML to Word (via e-mail attachments, or code), and Word will check the XML you provide - find the associated files to deal with it locally, and transform that XML using a presentation that can also retain the XML markup you supplied. Note this important difference - this is not converting one schema into another like a file converter (although it can be used that way) - it is generating presentation to wrap around the actual customer data, which is retained in the resulting file."

    This is more like it but how do I locate the schema library? I think Microsoft is going to have to start lessons soon. I want to learn how to utilise this technology but I can't afford to wait until the product is launched.

    Personally (after taking a look at Infopath) I believe our processes are too complex for XML but I can't tell from the information I have. Also, I can't ask for time and money to train for something that might be of little or no benefit to the company. There must be hundreds out there like me (sole IT person, small, IT intensive company) who need practical help in understanding what can and can't be done in XML.

  • Anonymous
    July 26, 2005
    Sue, a good starting point would be to play around with the labs that I mentioned in this post: http://blogs.msdn.com/brian_jones/archive/2005/07/08/436880.aspx

    -Brian

  • Anonymous
    August 08, 2005
    A while back, I read that Microsoft is switching to XML-based document formats in the next release...

  • Anonymous
    August 12, 2005
    Mark Twain said: "A successful book is not made of what is in it, but what is left out of it." Excellent reference, though :-).

  • Anonymous
    September 14, 2005
    Microsoft, which recently announced the new name of its future OS, continues to unveil the progress of its projects. The latest is called Office 12. This new version of the Office suite appears to be a crucial step forward for applications...

  • Anonymous
    September 19, 2005
    Massachusetts's Information Technology Division has released Microsoft's formal 15-page reply to the state's controversial draft policy on information standards. That policy would mandate that the Open Document Format be used for all "office" documents (i.e., word processing, spreadsheet, and presentation documents). Because OpenDocument is incompatible with Microsoft's Office applications, the...

  • Anonymous
    October 31, 2005
    The comment has been removed

  • Anonymous
    November 13, 2005
    Dear Brian,

    I went through your blog and was highly impressed by the new format. I think you can really help me, since you have been working on Office formats for Microsoft for the last 5 years.

    Brian, I work with a document management company, and we are looking to incorporate and handle Microsoft's Office formats in our viewer.

    I am trying to find out where I can get the technical specifications for all of Microsoft's Office formats.

    Currently our viewer lacks support for Office formats. If you could guide me on this, I would really appreciate the gesture.

    You can reach me at tarunklal@gmail.com

    Kindly let me know the same. We need all possible specifications in order that we handle and display the Office formats successfully.

    Thanks

    Tarun

  • Anonymous
    February 03, 2006
    ...it was time to have XML support.
    let's hope it works!

  • Anonymous
    April 04, 2006

    Direct download: SPPD-2005-06-08 9.1 MB
    Intro
    Editorial

    SharePointTag and webcasts downloads...

  • Anonymous
    April 17, 2006
    Just an FYI regarding your attribution of the quote regarding your not having the time to make it shorter (last paragraph).  While it does sound like something Mark Twain would write, it comes from the great composer Franz Liszt in a letter he wrote (to J. W. von Wasielewski) on January 9th, 1857.

    http://www.globusz.com/ebooks/Liszt/00000183.htm

  • Anonymous
    May 02, 2006
    This is definitely big news. Personally I am an Office 2004 user, but I will be very glad when I can have document portability between word processors. If you need any help from us, just log in to: http://www.ideas4mysmallbusiness.com

  • Anonymous
    May 15, 2006
    Word 2007 saves with the .docx extension by default. The .docx icon looks similar to the existing icon, and .doc files, as the Word 97-2003 compatible format, are called 2003...

  • Anonymous
    June 03, 2006
    i like your website very much but please do get us more information about it

  • Anonymous
    October 03, 2006
    Microsoft has announced that the next version of Office (unofficially "Office 12") will deliver support

  • Anonymous
    February 22, 2007
    The comment has been removed

  • Anonymous
    February 22, 2007
    The comment has been removed

  • Anonymous
    February 24, 2007
    The comment has been removed

  • Anonymous
    March 12, 2007
    If you read Part 1 of the Word XML Introduction, you saw the basics behind a Word document, as well as

  • Anonymous
    February 04, 2008
    Microsoft has been and continues to be fully committed to opening its document formats for Word, Excel

  • Anonymous
    November 26, 2008
    The comment has been removed

  • Anonymous
    June 08, 2009
    Direct download: SPPD-2005-06-08 9.1 MB Intro Editorial SharePointTag and webcasts downloads at www
