What about Word 2003's XML format?

I've had a few folks ask me about the XML format from Word 2003, and whether or not it would be supported in Word 2007. I mentioned this back in the fall, but in case you missed it let me repeat that the Word 2003 format will continue to be supported in Word 2007.

There are a ton of folks out there who have already built solutions on top of the Word 2003 XML format, and those will continue to work. Everyone can decide for themselves whether they want to port those solutions forward into the new Open XML format, or keep them in the 2003 XML format. The new Open XML format is largely based on the Word 2003 XML format, so you'll see a lot of similarities.

One of the benefits of the Open XML format over the Word 2003 format is the number of Office versions that will support it. As you all know by now, the new Open XML formats will work in Office 2000, XP, 2003, and 2007, while the Word 2003 XML format will work only in Word 2003 and 2007.

The XML formats in Word 2007 will be:

  1. Word document (.docx) - This is the default, and it's in the Open XML format
  2. Word macro-enabled document (.docm) - This is the macro-enabled version of the Open XML format
  3. Word XML document (.xml) - This is a single XML file that is a serialized version of the Open XML format
  4. Word 2003 XML document (.xml) - This is exactly the same XML format that Word 2003 supported
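
Since two of the four formats above share the .xml extension, a tool cannot rely on the file name alone. Here is a minimal Python sketch of telling them apart by content. It assumes the namespaces of the formats as they eventually shipped, and it assumes Word's usual convention of storing macros in a part named `word/vbaProject.bin`; the function name and the returned labels are made up for this illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Root-element namespaces, assumed to match the shipped formats.
WORD_2003_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
SINGLE_FILE_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"


def sniff_word_format(data: bytes) -> str:
    """Return a rough, illustrative label for a Word file given its bytes."""
    stream = io.BytesIO(data)
    if zipfile.is_zipfile(stream):
        with zipfile.ZipFile(stream) as package:
            names = set(package.namelist())
        # Assumption: a macro-enabled package carries this VBA project part.
        if "word/vbaProject.bin" in names:
            return "open-xml-package-with-macros"
        if "[Content_Types].xml" in names:
            return "open-xml-package"
        return "zip-but-not-open-xml"
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "unknown"
    # ElementTree reports tags as "{namespace}localname".
    if root.tag == f"{{{WORD_2003_NS}}}wordDocument":
        return "word-2003-xml"
    if root.tag == f"{{{SINGLE_FILE_NS}}}package":
        return "open-xml-single-file"
    return "other-xml"
```

A stricter check would read the content types inside the package rather than trusting part names, but the root element is enough to separate the two .xml flavors.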

We will also continue to support opening anyone else's XML files, just like we did in Word 2003. Here's an entry I made back in the summer about opening your own XML files in Word: https://blogs.msdn.com/brian_jones/archive/2005/08/16/452478.aspx

-Brian

Comments

  • Anonymous
    April 04, 2006
    This is great news but what about VSTO hooking into Word 12? Right now, in 2003, when we run Application.Selection.XML() we get the Word 2003 format of XML. What format will be returned in Word 12? Will it be one "flattened" XML string of the entire Word 12 document?
  • Anonymous
    April 04, 2006
    Great question. The range.xml property will still return the 2003 XML format. There is also a new property added called "wordOpenXML" (or something similar to that) that will return the serialized version of the Open XML format.

    -Brian
  • Anonymous
    April 05, 2006
    Great question!

    Michael Locker MD
  • Anonymous
    April 06, 2006
    Hey Brian,

    I have a WordprocessingML schema implementation question. Even in Word 2003 XML, section properties (<w:sectPr>) appear in the last paragraph of a section. Why was it decided to put the section properties at the end of a section instead of the beginning?

    I see how a producer of a Word doc derives some small benefit, because it will no doubt have that information by the time it reaches the end of the section. What about consumers that view the document, though? To display a correctly wrapped page, a consumer needs the margins, page size, headers, footers, etc. This means the consumer must either (1) read through the whole section before displaying anything within it, or (2) once it encounters the section properties in the last paragraph of the section, most likely clear, rewrap, and redraw anything in the section that it has already displayed. This seems like a (possibly very large) performance hit for the consumer: either delay the display of content (as in 1), or possibly display it incorrectly at first and then do redundant rewrapping to correct it later (as in 2). Is there any possibility that the section properties will be output at the top of a section (or even in a separate XML part) in the future to avoid these drawbacks? Or is there a different reason behind putting them at the end?

    Thanks for your insight,
    Jay
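
Option (1) in Jay's comment can be sketched in Python as a single pre-pass that pairs every paragraph with the section properties that govern it before any layout starts. This is only an illustration under a few assumptions: it uses the Open XML WordprocessingML namespace, it expects the last section's `w:sectPr` to be a direct child of `w:body`, it ignores tables and other block content, and the function name is made up here.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraphs_by_section(document_xml: str):
    """Return a list of (sectPr element or None, [paragraph elements])."""
    body = ET.fromstring(document_xml).find("w:body", NS)
    sections = []
    pending = []  # paragraphs seen since the previous section ended
    for child in body:
        if child.tag == f"{{{W}}}p":
            pending.append(child)
            # A sectPr inside a paragraph's pPr closes the section that
            # ends with this paragraph.
            sect_pr = child.find("w:pPr/w:sectPr", NS)
            if sect_pr is not None:
                sections.append((sect_pr, pending))
                pending = []
        elif child.tag == f"{{{W}}}sectPr":
            # Assumed: the final section's properties sit directly in body.
            sections.append((child, pending))
            pending = []
    if pending:
        # Trailing paragraphs with no section properties at all.
        sections.append((None, pending))
    return sections
```

The pre-pass touches every paragraph once, so the cost Jay describes does not go away; it just moves ahead of the first draw instead of showing up as a redraw.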
  • Anonymous
    April 14, 2006
    The comment has been removed
  • Anonymous
    May 16, 2006
    An automated batch tool to convert Word 2003 XML files (and DOC) to the 2007 format would be nice. Think of Photoshop's droplets, only with subdirectory recursion.

    Otherwise, users will have to convert their important files one by one, which means many files will never be converted and will end up unreadable in 8 years, when import converters are no longer provided for DOC files.

    (This will happen: from the Microsoft KB "Word 2002 and Word 2000 do not have an import converter for Word 2.0 for Windows or earlier.")
  • Anonymous
    May 22, 2006
    Hi,

    I can't find any information about the "single XML file that is a serialized version of the Open XML format" in the OpenXML Draft 1.3 (May 2006).

    Where can I find it?

    thanks.
  • Anonymous
    May 22, 2006
    Hi Yves,

    The format that we are currently working on standardizing is just the main one, which uses ZIP (the default for Office). The serialized version is not currently planned as part of the standard. It just takes the standard and essentially replaces the ZIP container with an XML container.

    -Brian
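
The container swap Brian describes can be sketched roughly as follows in Python: walk the parts of the ZIP package and write each one into a single XML wrapper. This is a simplified illustration, not Word's own serializer. The `pkg:` element and attribute names are assumed from the single-file format as it later shipped, the function name is made up, and the output leaves out details such as the `mso-application` processing instruction, so it is not guaranteed to open in Word.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET

PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"
CT = "http://schemas.openxmlformats.org/package/2006/content-types"


def zip_package_to_single_xml(data: bytes) -> bytes:
    """Wrap every part of an Open XML ZIP package in one XML document."""
    ET.register_namespace("pkg", PKG)
    root = ET.Element(f"{{{PKG}}}package")
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        # The ZIP package records each part's content type in this file.
        types = ET.fromstring(package.read("[Content_Types].xml"))
        defaults = {d.get("Extension").lower(): d.get("ContentType")
                    for d in types.findall(f"{{{CT}}}Default")}
        overrides = {o.get("PartName"): o.get("ContentType")
                     for o in types.findall(f"{{{CT}}}Override")}
        for name in package.namelist():
            if name == "[Content_Types].xml" or name.endswith("/"):
                continue  # types move onto each part; skip directory entries
            part_name = "/" + name
            extension = name.rsplit(".", 1)[-1].lower()
            content_type = overrides.get(part_name) or defaults.get(extension, "")
            part = ET.SubElement(root, f"{{{PKG}}}part", {
                f"{{{PKG}}}name": part_name,
                f"{{{PKG}}}contentType": content_type,
            })
            payload = package.read(name)
            if content_type.endswith("xml"):
                # XML parts are embedded as real child elements.
                holder = ET.SubElement(part, f"{{{PKG}}}xmlData")
                holder.append(ET.fromstring(payload))
            else:
                # Everything else (images, etc.) is carried as base64 text.
                holder = ET.SubElement(part, f"{{{PKG}}}binaryData")
                holder.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

ElementTree will rename the namespace prefixes of the embedded parts (for example `ns0:` in place of `w:`). The result is still equivalent XML, but a real converter would likely want to preserve the original prefixes.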