Why do Office ".xml" files behave differently from other ".xml" files?

Some of you who have worked with Office 2003 XML files may have noticed that while we use the ".xml" extension, the files still show unique icons and the original application is launched when you double-click them. The files are totally valid XML files following the W3C XML 1.0 spec. The reason they behave differently is that we put a PI (Processing Instruction) at the top of the XML file that identifies which application created the XML. Open any of the Word XML files with a text editor, and you'll see the following:

<?mso-application progid="Word.Document"?>

That declaration is what lets us know that it's a Word XML file. We do the same thing with InfoPath and Excel XML files. There is a component that we call the msoxev that sniffs files with the .xml extension and looks for that PI. When it sees the PI, it does a lookup in the registry to see if there is an application associated with the progid attribute. If so, it will use that application for opening and editing the file.
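To make the sniffing step concrete, here is a minimal Python sketch of the idea. It is not the actual msoxev code: the function names, the 1024-character scan limit, and the `registered` dict (which stands in for the registry lookup) are all assumptions made for illustration.

```python
import re

# Matches the PI shown above, e.g. <?mso-application progid="Word.Document"?>
_PI_PATTERN = re.compile(r'<\?mso-application\s+progid=(["\'])(.*?)\1\s*\?>')


def sniff_progid(xml_text, max_chars=1024):
    """Return the progid from an mso-application PI near the top of the text, or None.

    max_chars is an arbitrary limit chosen for this sketch.
    """
    match = _PI_PATTERN.search(xml_text[:max_chars])
    return match.group(2) if match else None


def resolve_handler(xml_text, registered):
    """Look the sniffed progid up in a dict that stands in for the registry."""
    progid = sniff_progid(xml_text)
    if progid is None:
        return None  # no PI: treat the file like any other XML file
    return registered.get(progid)


# Placeholder document; only the PI line matters here.
sample = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<?mso-application progid="Word.Document"?>\n'
    '<doc/>\n'
)

# Illustrative mapping only; the real entries live in the registry.
registered = {"Word.Document": "application/msword"}

print(sniff_progid(sample))
print(resolve_handler(sample, registered))
```

A file without the PI simply returns `None`, which corresponds to the "regular XML file" behavior.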

We also run this in IE, so if you open one of the XML files in IE, it will automatically get handed off to the proper application. This is great if you are just following a hyperlink and want to view the file with the application that generated it. If you are debugging the files or want to view the XML directly in IE though, it's a bit of a pain. If you want to open the file in IE, and not get redirected, you have a couple options.

One-time adjustment: If you want to change the behavior just for that specific document, you can open the file in a text editor and delete the PI. Then it will behave just like a regular XML file.
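If you would rather script the one-time adjustment than edit by hand, something like the following Python sketch would do it. The function name is hypothetical, and it works on the file's text in memory, so you would read the file and write the result back yourself (ideally to a copy).

```python
import re

# The mso-application PI plus any trailing spaces and one line break.
_MSO_PI = re.compile(r'<\?mso-application\b.*?\?>[ \t]*\r?\n?', re.DOTALL)


def strip_mso_pi(xml_text):
    """Return xml_text with the first mso-application PI removed."""
    return _MSO_PI.sub("", xml_text, count=1)


original = (
    '<?xml version="1.0"?>\n'
    '<?mso-application progid="Word.Document"?>\n'
    '<doc/>\n'
)
print(strip_mso_pi(original))
```

Text that does not contain the PI comes back unchanged, so it is safe to run on files that have already been cleaned.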

Permanent adjustment: This is a behavior you can easily modify if you want. The XEV mechanism just sniffs the registry to see what the content type for that file is. Go to the following: "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\11.0\Common\Filter\text/xml" and you'll see a collection of entries. The name of the string matches the progid attribute in the PI, and the value of that string is the content type for the file. If you don't want the current behavior, you can just delete the string or rename it, and it will now behave like any other XML file.
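Before deleting or renaming anything, it can help to see what is registered. Here is a read-only Python sketch that lists the entries under that key using the standard `winreg` module. The function name is made up for this example, and on machines without Windows or without Office 2003 it just returns an empty dict.

```python
try:
    import winreg  # standard library, but only available on Windows
except ImportError:
    winreg = None

# The key path quoted above, relative to HKEY_LOCAL_MACHINE.
FILTER_KEY = r"SOFTWARE\Microsoft\Office\11.0\Common\Filter\text/xml"


def list_xml_filter_entries():
    """Return {progid: content_type} for the filter key, or {} if it is unavailable."""
    if winreg is None:
        return {}
    entries = {}
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FILTER_KEY) as key:
            value_count = winreg.QueryInfoKey(key)[1]  # number of values under the key
            for index in range(value_count):
                name, data, _value_type = winreg.EnumValue(key, index)
                entries[name] = data
    except OSError:
        # Key missing or not readable: report nothing rather than fail.
        return {}
    return entries


for progid, content_type in list_xml_filter_entries().items():
    print(progid, "->", content_type)
```

The sketch only reads the key. Deleting or renaming a string is a change to HKEY_LOCAL_MACHINE, so it is worth exporting the key first.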

You can also customize this regkey to register your own applications that want to use the .xml extension.

We won't have this issue with the Office 12 XML files, because we actually use unique extensions. It was something we had discussed doing with the Office 2003 XML files but eventually decided against. The new default formats will still be XML, but they will be wrapped in a ZIP container, and we decided that using unique extensions (.docx, .pptx, .xlsx) was the best way to go.

-Brian

Comments

  • Anonymous
    July 07, 2005
    If you have the Word 2003: Xml Viewer (of http://www.microsoft.com/downloads/details.aspx?FamilyID=19676b18-1bcd-4852-93ba-0b5a203ea731&displaylang=en), the content type application/msoxmlviewer is also available to be used to open your own .xml files in the viewer directly in IE. This would work well if you have one or more transforms to view your own xml files.
  • Anonymous
    July 08, 2005
    I'm glad that you've chosen unique extensions for the new formats, but I suppose this might be the place to reiterate my plug for the OpenDocument mimetype convention for zip formats, which could easily be adopted in addition to the new extensions.

    The convention is to place a "mimetype" file first in the zip archive, uncompressed, whose contents are a MIME type describing the entire document.

    See section 17.4 of the OpenDocument v1.0 standard, http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf.
  • Anonymous
    July 08, 2005
    Y, that's an interesting suggestion. Does that mean that the file is not valid though if that "mimetype" file isn't the first file in the archive?
    Many ZIP tools out there don't give you much control over the order of your parts without added complexity. We really want people to be able to use existing tools to generate these files.

    -Brian
  • Anonymous
    July 08, 2005
    The OD standard says "should". My interpretation would be that the positioning of the mimetype is intended for the convenience of unsophisticated tools that do not understand the file format and wish merely to identify the type of the file from its first few bytes--something that is quite convenient, for example, if the extension is somehow lost.

    Tools that understand the zip file format presumably would be able to interpret the contents of the mimetype no matter how it might be placed within the zip file. Tools that understand the precise format of the file (odt, ods, docx, xlsx, etc.) presumably would be able to deal with it properly even if the mimetype were missing altogether.

    For your formats, you could certainly choose to make the mimetype an optional recommendation, and arrange for your code to generate it.
  • Anonymous
    July 08, 2005
    That's something I'll think about a bit, but I'm not really a big fan of optional things like that when they have such a significant meaning. No tool could really rely on it if it's only optional. If it's not optional, then it makes it that much more difficult to create the files.
    The content type of the start part is fairly easy to determine and is done in a ZIP agnostic way. I think that most likely that's what we'll stick with.
    Thanks for the suggestion though. Like I said I'll think about it a bit more.
    -Brian
  • Anonymous
    July 08, 2005
    What specifies which part is the start part?
  • Anonymous
    July 08, 2005
    Check out the example I linked to in this post: http://blogs.msdn.com/brian_jones/archive/2005/06/20/430892.aspx
    There is a package relationship file that is always here: "/_rels/.rels"
    It's an XML file that describes all the root level relationships. The relationship of type officeDocument points to the start part.
    There is a content type file that is always here: "[Content_Types].xml"
    That describes the content types for each part in the file. You look up the content type for the start part and you'll know what kind of file it is.
    -Brian
  • Anonymous
    July 08, 2005
    The comment has been removed
  • Anonymous
    August 05, 2005
    Slashdot: MS Office XML Format Now in TextEdit
    I saw this the other day on slashdot. I have to admit...
  • Anonymous
    August 31, 2005
    The comment has been removed
  • Anonymous
    January 30, 2006
    Anyone have an answer for Wes' question? I too have the right progid in my XML file, but it still opens up in IE and then INSIDE IE opens Word. I want it to open up directly in Word, which I swear it did yesterday.....
  • Anonymous
    February 13, 2006
    Phil, it could be that you've installed the Word XML viewer (http://www.microsoft.com/downloads/details.aspx?familyid=19676b18-1bcd-4852-93ba-0b5a203ea731&displaylang=en)

    That would cause the files to open in IE instead of Word, but you should be able to choose to edit them in Word (from the shell or even within IE). When it opens in IE is it in the XML view or is it rendered with the rich formatting?

    -Brian
  • Anonymous
    March 19, 2006
    What would the MIME type be for MS Wordviewer 2003?
    When we use the Word XML Viewer, the HTML output makes the doc unformatted and all styles are lost. The Wordviewer maintains the structure. Our goal is that when someone selects a link to an XML document, the Wordviewer 2003 will open the file.
    Any suggestions?

    Your articles are great and very informative. I hope there is a solution to this problem.

    bzuck@adelphia.net

    Thanks
    Bill Zuck
  • Anonymous
    May 19, 2006
    We have a situation where we're using the XML Viewer (content type "application/msoxmlviewer"). On a Windows 2003 server, I see raw XML when I try to download a file from the browser. However, when I save it to a folder and double-click it, the formatting I had applied in Word comes back. I am not sure why we're seeing the raw XML when downloading. Any help would be appreciated. The goal is to get the formatted Word document directly on download.

    Thanks,
    Meena