Integrating with business data: Store custom XML in the Office XML formats

As I've already talked about a number of times on this blog, there are two core pieces to the XML support in Office. The first piece is what we call our "reference schemas." The reference schemas are the schemas we've defined in Office to represent our files as XML: WordprocessingML, SpreadsheetML, and PresentationML. The reference schemas are extremely helpful in consuming and generating Office documents, but they are only one piece of the XML story. The other piece is the support for custom-defined schemas. It's that support that allows you to truly integrate your documents with business processes and business data. Most organizations don’t really want another group to define their business data for them. That's why we took the XML 1.0 standard in combination with the XSD standard and built in native support for both. You can define your data using XML Schema syntax, and then you can use that data in your Office documents. By opening up our formats with our reference schemas, and supporting your custom-defined schemas, you get true interoperability of your documents. Sorry if this is sounding more like a marketing pitch, but I wanted to make sure I reiterated our vision for XML support in Office documents; hopefully that will help you see the power that we see.

In Office 2003, Word and Excel both introduced support for marking up content in the files with custom-defined schemas, but one of the big things we saw from folks building solutions on top of our XML support in Office 2003 was the need to store their own XML data in the document. We had support for marking up a document with your own schema, but if you had data you didn't want to show to the user, there weren't a lot of options. An example would be workflow scenarios where you track a ton of information about a document to determine how to route it. Some of that data might appear in the document itself, but a lot of it is just extra metadata.

XML Data Store

In Office 12, we've introduced a new feature to the formats that we're currently calling the XML data store, and the way it works is really simple. As you should all know by now, the new format consists of a ZIP file with a bunch of XML parts (files) inside. Up until now we've talked about all the parts that we in Office have defined to create our documents. You as a developer also have the ability to add your own parts though. You can take any XML file and put it inside the ZIP package. Then all you need to do is create a relationship from the main document part to your XML part, and the Office applications will roundtrip your XML with the file, which means:

Roundtripping your data: The ability to put your XML in the ZIP package means that you now have a place to store any data your solution may need. The data will travel with the document, but will always be stored as a separate XML part in the ZIP package. This means it's really easy to get to and modify without touching any of the application's own data.

Accessing your data while the file is loaded: Whenever we load these files, we grab all the XML parts in the data store and load them into memory. We then give you programmatic access to this data so you can read and write to the data while the user is editing the file. There is a full eventing model around this data as well so that if other processes make changes to the data you are notified. This gives you a lot of power when you are building a solution, because you have a single place to store all your information as XML, and you have full access to it both while the file is loaded and when the file is saved to disk (just crack open the ZIP and go grab your XML part).

Separating data from the document: Because the information is stored in the data store, the user cannot directly edit your data by editing the document (they can’t accidentally delete part of your data, since it’s stored separately).
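To make the "add your own part, then add a relationship from the main document part" step concrete, here is a minimal sketch using only the Python standard library. It is not an Office API, and several things in it are assumptions: the namespace and relationship-type URIs come from the later ECMA-376 Office Open XML specification and may not match the Office 12 beta builds, while the part name `customXml/item1.xml` and the function names `add_custom_xml_part` and `read_custom_xml_parts` are hypothetical choices for this illustration. It only stores and reads the part on disk; it does nothing about the in-memory access or eventing described above.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Assumption: URIs from the final ECMA-376 spec; Office 12 betas may differ.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"
OFFICE_DOC_REL = DOC_REL + "officeDocument"
CUSTOM_XML_REL = DOC_REL + "customXml"


def _main_part(z):
    """Find the main document part (e.g. word/document.xml) via _rels/.rels."""
    for r in ET.fromstring(z.read("_rels/.rels")):
        if r.get("Type") == OFFICE_DOC_REL:
            return r.get("Target").lstrip("/")
    raise ValueError("package has no officeDocument relationship")


def _rels_name(part):
    """word/document.xml -> word/_rels/document.xml.rels"""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def _serialize(root, default_ns):
    # Write default_ns as a plain xmlns="..." (no ns0: prefix).
    # Note: register_namespace changes ElementTree's global prefix table.
    ET.register_namespace("", default_ns)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def add_custom_xml_part(package, xml_data, part_name="customXml/item1.xml"):
    """Return a copy of the ZIP package with xml_data stored as its own part
    and a relationship to it from the main document part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        names = src.namelist()
        if part_name in names:
            raise ValueError(f"{part_name} already exists")
        main = _main_part(src)
        rels_name = _rels_name(main)

        # 1. Relationship from the main part to the new part, first free rIdN.
        rels = (ET.fromstring(src.read(rels_name)) if rels_name in names
                else ET.Element(f"{{{REL_NS}}}Relationships"))
        used = {r.get("Id") for r in rels}
        n = 1
        while f"rId{n}" in used:
            n += 1
        ET.SubElement(rels, f"{{{REL_NS}}}Relationship", {
            "Id": f"rId{n}",
            "Type": CUSTOM_XML_REL,
            # Targets resolve relative to the source part's folder.
            "Target": posixpath.relpath(part_name, posixpath.dirname(main) or "."),
        })

        # 2. Make sure parts with a .xml extension have a content type.
        ct = ET.fromstring(src.read("[Content_Types].xml"))
        if not any(d.get("Extension", "").lower() == "xml"
                   for d in ct.iter(f"{{{CT_NS}}}Default")):
            ET.SubElement(ct, f"{{{CT_NS}}}Default",
                          {"Extension": "xml", "ContentType": "application/xml"})

        # 3. Copy the other parts unchanged, then write the updated/new ones.
        dst.writestr("[Content_Types].xml", _serialize(ct, CT_NS))
        for info in src.infolist():
            if info.filename not in ("[Content_Types].xml", rels_name):
                dst.writestr(info, src.read(info))
        dst.writestr(rels_name, _serialize(rels, REL_NS))
        dst.writestr(part_name, xml_data)
    return out.getvalue()


def read_custom_xml_parts(package):
    """Return the bytes of every part the main part targets via customXml."""
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        main = _main_part(z)
        rels_name = _rels_name(main)
        if rels_name not in z.namelist():
            return []
        found = []
        for r in ET.fromstring(z.read(rels_name)):
            if r.get("Type") == CUSTOM_XML_REL:
                target = r.get("Target")
                path = (target.lstrip("/") if target.startswith("/") else
                        posixpath.normpath(posixpath.join(posixpath.dirname(main), target)))
                found.append(z.read(path))
        return found
```

Because the custom XML lives in its own part, reading it back never requires parsing `word/document.xml` at all, which is the separation between your data and the document content described above.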

There are a number of really cool features we've built on top of this functionality that I'll talk about in future posts.

-Brian

Comments

  • Anonymous
    November 04, 2005
    So what happens if you tell Word to use the ODF schema?
  • Anonymous
    November 06, 2005
    Stuart asks an interesting question. I guess the first problem is that the ODF schema is expressed in RELAX NG, not XSD, so someone would have to come up with an XSD to use.

    But I think there's a bigger problem. You could certainly stuff some ODF XML parts into the Office "12" file, but if I understand Brian's description, this is not enough to have Word process them as part of the Word presentation. Notice that these additions are separated from the WordprocessingML and are not editable by the user in the way the WordprocessingML is.

    And, finally, the schema doesn't say what the stuff means, only how it is arranged (as acceptable XML), so it would be mystery meat to Word. The rest is a small matter of programming [;<).
  • Anonymous
    November 07, 2005
    Sounds great! It sounds like it will eliminate the need for using custom document properties and merge fields. The farther away we get from dealing with structured storage (and IDispatch interface interaction), the happier I'll be.

    I'll be looking forward to a release version of Office 12 and released development libraries for zipped files.
  • Anonymous
    November 08, 2005
    Echoing WPoust, sounds great -- and it looks like something that isn't currently in OpenOffice (although OOo does have a translation facility between ODF and some other XML formats).

    My questions: 1) When can we see some examples? 2) Does this work on the Mac OS X versions as well, without limitations?
  • Anonymous
    November 09, 2005
    I want to write my own app to make Office 12 spreadsheets. To be sure I'm on the right track, I need to validate against the reference XSDs. Where are they, or when will they be available?
  • Anonymous
    November 09, 2005
    FARfetched, I'll post some examples soon. We have the first Beta coming out in just a little bit, so I'll create some files with that and post them on this site.

    Ray, I talked about the first draft of the Office 12 reference schemas in this post: http://blogs.msdn.com/brian_jones/archive/2005/09/14/466408.aspx

    There should be an update to those schemas soon after we release Beta 1.

    -Brian
  • Anonymous
    March 27, 2006
    Links to blog posts that contain useful technical information for developers.  Open XML is a new standard, but there's some good information already available if you know where to look.
  • Anonymous
    July 20, 2006
    [We then give you programmatic access to this data so you can read and write to the data while the user is editing the file. There is a full eventing model around this data as well so that if other processes make changes to the data you are notified.]--

    This sounds like exactly what I need. Is there anywhere I can get my hands on reference material for this functionality?