
Simple wordprocessingML document (video demo)

Are the Office Open XML formats too complex? Not really. They are definitely very rich, but the structure of the formats is pretty simple. As you start to get into more complicated features, the complexity of the formats also kicks in. To show how simple a basic document is, though, I like to use this example WordprocessingML document whenever I give a presentation on the formats. I actually posted this example using Beta 1, but the formats have changed a bit, so I thought it would be worth posting an updated example. (You can view a video of a simpler version of this demo up on Channel 9.)

Let's create a really simple document that has three things: a paragraph of text; a hyperlink; and an image.

Part 1 - single paragraph of text

The Office Open XML format is comprised of a number of XML files within a ZIP package. The files follow a simple set of conventions called the Open Packaging Conventions (described in Part 2 of the standard). You need to declare the content types of the parts, as well as tell the consuming application where it should start (via the package relationship), so even your simplest document will have at least 3 files within the ZIP package. Before creating the ZIP file, let's just create a folder somewhere and in that folder create the following files:

  • document.xml
  • [Content_Types].xml
  • _rels/.rels

document.xml

The first thing we need is the actual XML that describes the content of the document. We're going to create three separate paragraphs. The simplest version of this would just have the one paragraph with some text:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
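
If you want to sanity-check markup like this without opening Word, here is a minimal Python sketch (not part of the original demo) that parses a one-paragraph document with the standard library and pulls out the paragraph text. It assumes the `http://` namespace URI and the `w:document` root element from the published standard, and the `paragraph_texts` helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the WordprocessingML main namespace from the published standard.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

SAMPLE = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(xml_text):
    """Return the text of each body paragraph (hypothetical helper)."""
    # Parse from bytes so the encoding declaration in the prolog is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"w": W_NS}
    paragraphs = []
    for p in root.findall("./w:body/w:p", ns):
        # A paragraph's text is the concatenation of its w:t elements.
        runs = p.findall(".//w:t", ns)
        paragraphs.append("".join(t.text or "" for t in runs))
    return paragraphs


print(paragraph_texts(SAMPLE))
```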

_rels/.rels

How does a consuming application know where it should start when opening an OpenXML file? The first place you always look is the package relationships file. The package relationship file will always be located in the "_rels" directory, and it's always called ".rels". We need to create an XML file that tells the consumer that "document.xml" is the first place it should go, and that this type of document is an Office Open XML document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="document.xml"/>
</Relationships>

Notice that a relationship has three main attributes. It has an Id attribute whose use will be more obvious in a bit. The Target attribute tells you where to go, and the path is relative to the parent directory of the "_rels" folder that the relationship file is in (in this case that's the root directory). The Type attribute describes what kind of relationship it is (i.e. what kind of stuff it is pointing at).
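
To make the "where do I start" lookup concrete, here is a small Python sketch of what a consumer might do with the package relationships file: find the relationship whose Type is the officeDocument type and return its Target. The `find_start_part` name is hypothetical, and the namespace URIs are the `http://` ones from the published standard.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace and relationship type URIs from the published standard.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")

ROOT_RELS = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{OFFICE_DOC}" Target="document.xml"/>
</Relationships>"""


def find_start_part(rels_xml):
    """Return the Target of the officeDocument relationship, or None."""
    root = ET.fromstring(rels_xml.encode("utf-8"))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        # The Type attribute, not the Id, identifies the main document part.
        if rel.get("Type") == OFFICE_DOC:
            return rel.get("Target")
    return None


print(find_start_part(ROOT_RELS))
```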

[Content_Types].xml

Every Office Open XML file must declare the content types used in the ZIP package. That is done with the [Content_Types].xml file. We currently have two parts in this document that we need to declare content types for. The first is the document.xml part; the second is the _rels/.rels part. So, the content types file should look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>

Notice that we've said every file with the extension ".xml" has the WordprocessingML main document content type. In a more complex file, that's not going to be the case, and we can instead use Override elements to declare the content type of a specific part, rather than of an extension.
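
Here is a rough Python sketch of how a consumer could resolve a part's content type when both defaults and overrides are present. The sample content types file is an illustration rather than the one from this demo: it maps ".xml" to plain `application/xml` and uses an Override for `/document.xml`. The `content_type_for` helper is a made-up name, and the case-insensitive comparison reflects my reading of the packaging conventions.

```python
import xml.etree.ElementTree as ET

# Assumption: content types namespace from the published standard.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
MAIN_DOC = ("application/vnd.openxmlformats-officedocument."
            "wordprocessingml.document.main+xml")
RELS_TYPE = "application/vnd.openxmlformats-package.relationships+xml"

TYPES = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="{RELS_TYPE}"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/document.xml" ContentType="{MAIN_DOC}"/>
</Types>"""


def content_type_for(part_name, types_xml):
    """Resolve a part's content type: Override first, then Default by extension."""
    root = ET.fromstring(types_xml.encode("utf-8"))
    # An Override names one specific part (part names start with "/").
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName").lower() == part_name.lower():
            return override.get("ContentType")
    # Otherwise fall back to the Default entry for the file extension.
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension").lower() == extension:
            return default.get("ContentType")
    return None


print(content_type_for("/document.xml", TYPES))
print(content_type_for("/_rels/.rels", TYPES))
```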

Create version 1 of our simple document

OK, we should now have three files. Two of the files are in the root directory (document.xml & [Content_Types].xml); and in the "_rels" directory we have the ".rels" file. Select the two files and the "_rels" directory and ZIP them up. Make sure that when you zip them up, the two files and the _rels directory are all at the root level.

Give the ZIP file a .docx extension, open it in Word, and you now have a simple document.
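
If you would rather script the packaging step than zip the folder by hand, a minimal Python sketch using the standard `zipfile` module could look like this. It builds the package in memory with the three parts at the root level; the `build_package` helper and the `simple.docx` file name in the comment are just illustrative, and the part contents assume the `http://` namespace URIs and `w:document` root from the published standard.

```python
import io
import zipfile

DOCUMENT_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p><w:r><w:t>Hello World!</w:t></w:r></w:p></w:body>
</w:document>"""

ROOT_RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="document.xml"/>
</Relationships>"""

CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Names are relative to the ZIP root, so nothing ends up in a wrapper folder.
PARTS = {
    "[Content_Types].xml": CONTENT_TYPES,
    "_rels/.rels": ROOT_RELS,
    "document.xml": DOCUMENT_XML,
}


def build_package(parts):
    """Zip the given {name: text} parts and return the package as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, text in parts.items():
            package.writestr(name, text)  # str data is stored as UTF-8
    return buffer.getvalue()


docx_bytes = build_package(PARTS)
# To try the result in Word, write the bytes out with a .docx extension:
# with open("simple.docx", "wb") as f:
#     f.write(docx_bytes)
```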

Part 2 - adding a picture

Now let's add the picture to our document.

document.xml (version 2)

Open the document.xml file and add one more paragraph that specifies the image as follows (make sure the root element also includes the two additional namespace declarations, r and v, that the picture markup uses):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:pict>
          <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250; height:200">
            <v:imagedata r:id="rId4"/>
          </v:shape>
        </w:pict>
      </w:r>
    </w:p>
  </w:body>
</w:document>

OK, so we added some XML that describes a picture anchor, but what about the actual picture? That's handled with a relationship. Notice that the v:imagedata tag has the r:id="rId4" attribute. This says that you need to go to the relationships file of the document.xml part and find the relationship with an id of rId4.

_rels/document.xml.rels

We now need to create a new relationships file in the _rels directory. Every part is allowed to have a relationships file. The name for the relationship file is just the name of the original part with a ".rels" added to the end. The file is always placed in the _rels directory which is in the same directory as the part itself. In this case it's "_rels/document.xml.rels"; but if the document.xml file was in a "word" directory (word/document.xml), then the rels file would be "word/_rels/document.xml.rels".
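
The naming rule is mechanical enough to write down as a tiny function. This is just a sketch in Python; `rels_path_for` is a made-up helper name, and part names are written without a leading slash, the way they appear inside the ZIP.

```python
import posixpath


def rels_path_for(part_name):
    """Return the ZIP path of the relationships file for a part.

    Passing an empty name yields the package-level "_rels/.rels" file.
    """
    directory, filename = posixpath.split(part_name)
    # Same directory as the part, inside "_rels", with ".rels" appended.
    return posixpath.join(directory, "_rels", filename + ".rels")


print(rels_path_for("document.xml"))
print(rels_path_for("word/document.xml"))
```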

We need to create one relationship with an Id of "rId4" that points to an image, so our _rels/document.xml.rels file should look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="image1.jpg"/>
</Relationships>

Notice that this time, the relationship type is an image type. It points to a part called image1.jpg so we'll need to create that next.
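
A common mistake when hand-editing these files is an r:id in the document that has no matching relationship. Here is a small Python sketch of a check for that, using trimmed-down sample XML; the helper names are hypothetical and the namespace URIs are the `http://` ones from the published standard.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URIs from the published standard.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

DOC = (f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}" '
       'xmlns:v="urn:schemas-microsoft-com:vml"><w:body><w:p><w:r><w:pict>'
       '<v:shape><v:imagedata r:id="rId4"/></v:shape>'
       '</w:pict></w:r></w:p></w:body></w:document>')

RELS = (f'<Relationships xmlns="{REL_NS}">'
        f'<Relationship Id="rId4" Type="{R_NS}/image" Target="image1.jpg"/>'
        '</Relationships>')


def referenced_ids(document_xml):
    """Collect every r:id attribute value used anywhere in the part."""
    root = ET.fromstring(document_xml.encode("utf-8"))
    key = f"{{{R_NS}}}id"
    return {el.get(key) for el in root.iter() if el.get(key) is not None}


def declared_ids(rels_xml):
    """Collect the Id of every Relationship in the part's .rels file."""
    root = ET.fromstring(rels_xml.encode("utf-8"))
    return {rel.get("Id") for rel in root.findall(f"{{{REL_NS}}}Relationship")}


def dangling_ids(document_xml, rels_xml):
    """Ids referenced by the document but missing from its relationships."""
    return referenced_ids(document_xml) - declared_ids(rels_xml)


print(dangling_ids(DOC, RELS))
```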

image1.jpg

You can use any image you want here. I just took this image file and saved it into the root directory with the name "image1.jpg".

[Content_Types].xml (v2)

We've added two new parts to the file. The first part, "_rels/document.xml.rels", uses the ".rels" extension, which already has its content type declared. The second part, "image1.jpg", uses an extension that we haven't yet declared, so we need to update the content types file like so:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="jpg" ContentType="image/jpeg"/>
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>

Create Version 2 of our simple document

OK, now you should have 5 files. In the root directory you have "image1.jpg", "document.xml", "[Content_Types].xml"; and in the "_rels" directory you have ".rels" and "document.xml.rels". Take all those files and ZIP them up. You should now have a document with a simple paragraph followed by a picture.

Part 3 - adding a hyperlink

Now let's add the last piece, which is a hyperlink.

document.xml (version 3)

Here, we need to add one final paragraph. This paragraph will contain a run of text that has the hyperlink tag around it:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:hyperlink r:id="rId2">
        <w:r>
          <w:rPr>
            <w:color w:val="0000FF" w:themeColor="hyperlink" />
            <w:u w:val="single" />
          </w:rPr>
          <w:t>Click here for Brian Jones' blog.</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>
    <w:p>
      <w:r>
        <w:pict>
          <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250; height:200">
            <v:imagedata r:id="rId4"/>
          </v:shape>
        </w:pict>
      </w:r>
    </w:p>
  </w:body>
</w:document>

Notice again that rather than reference the resource directly, we use a relationship. With the image, we used the relationship with an id of "rId4", and with the hyperlink we're going to use the relationship of "rId2". This is a general rule followed throughout most of the Office Open XML standard. Any reference to a resource (whether it be directly within the ZIP package, or it be an outside file) is done with relationships. This way you can do a quick scan of any document and quickly understand all the components that make up that file. It also makes link fix-up or even cleansing significantly easier. You never need to read through the actual content XML pieces and can instead make your modifications directly to the lightweight relationship parts.
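
As an illustration of that "quick scan", here is a Python sketch that lists every relationship in a package by reading only the .rels parts. The demo package built here contains just the two relationship files (so it is not a complete document), the helper names are hypothetical, and treating a missing TargetMode as "Internal" follows my reading of the packaging conventions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: namespace URIs from the published standard.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def build_demo_package():
    """Build an in-memory ZIP holding only the two relationship parts."""
    root_rels = (f'<Relationships xmlns="{REL_NS}"><Relationship Id="rId1" '
                 f'Type="{R_NS}/officeDocument" Target="document.xml"/>'
                 '</Relationships>')
    doc_rels = (f'<Relationships xmlns="{REL_NS}">'
                f'<Relationship Id="rId2" Type="{R_NS}/hyperlink" '
                'Target="https://blogs.msdn.com/brian_jones" '
                'TargetMode="External"/>'
                f'<Relationship Id="rId4" Type="{R_NS}/image" '
                'Target="image1.jpg"/></Relationships>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("_rels/.rels", root_rels)
        package.writestr("_rels/document.xml.rels", doc_rels)
    return buffer.getvalue()


def list_relationships(package_bytes):
    """Return (rels_part, Id, Target, TargetMode) for every relationship."""
    found = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if not name.endswith(".rels"):
                continue  # content parts are never opened
            root = ET.fromstring(package.read(name))
            for rel in root.findall(f"{{{REL_NS}}}Relationship"):
                found.append((name, rel.get("Id"), rel.get("Target"),
                              rel.get("TargetMode", "Internal")))
    return found


for entry in list_relationships(build_demo_package()):
    print(entry)
```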

_rels/document.xml.rels (v3)

As described above, we need to create a relationship to the URL of the hyperlink we created. In this case, we'll not only use the three core attributes for the relationship (Id, Type, Target), but we'll also use a fourth attribute specifying that this relationship points to an external resource (i.e. something that lives outside of the ZIP package).

So, you should make the following modifications to our _rels/document.xml.rels part:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="https://blogs.msdn.com/brian_jones" TargetMode="External" />
  <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="image1.jpg"/>
</Relationships>
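
Because the URL lives only in the relationship part, a link fix-up never has to touch document.xml. Here is a rough Python sketch of that idea: it rewrites the prefix of external hyperlink targets and leaves everything else alone. The `retarget_hyperlinks` helper and the `https://example.com/` replacement URL are purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URIs from the published standard.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
HYPERLINK = R_NS + "/hyperlink"

RELS = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId2" Type="{HYPERLINK}" Target="https://blogs.msdn.com/brian_jones" TargetMode="External"/>
  <Relationship Id="rId4" Type="{R_NS}/image" Target="image1.jpg"/>
</Relationships>"""


def retarget_hyperlinks(rels_xml, old_prefix, new_prefix):
    """Swap a URL prefix on external hyperlink relationships only."""
    # Keep the relationships namespace as the default one when serializing.
    ET.register_namespace("", REL_NS)
    root = ET.fromstring(rels_xml.encode("utf-8"))
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        is_external_link = (rel.get("Type") == HYPERLINK
                            and rel.get("TargetMode") == "External")
        target = rel.get("Target")
        if is_external_link and target.startswith(old_prefix):
            rel.set("Target", new_prefix + target[len(old_prefix):])
    return ET.tostring(root, encoding="unicode")


updated = retarget_hyperlinks(RELS, "https://blogs.msdn.com/",
                              "https://example.com/")
print(updated)
```

The Id values stay the same, so the r:id references in the document content keep working without any edits.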

Create Version 3 of our simple document

OK, so the V3 pass on this didn't add any additional files to the package, so just take the same collection of files from Part 2 and ZIP them up. Open the file in Word and you should see a simple paragraph, a hyperlink, and a picture. Voilà!

-Brian

Comments

  • Anonymous
    October 16, 2006
    Did Microsoft create this format mainly for hand-written documents? A file format should be simple and strict, because most of our documents are created by a WYSIWYG editor. But the new ECMA format is a format just like HTML. For example, the schema type ST_OnOff, which is just like a boolean type, has six values: "0", "1", "on", "off", "true", "false". What is this designed for?

  • Anonymous
    October 19, 2006
    Great article. It would be very nice if you could do a similar article on SpreadsheetML? Exporting a set of records to Excel seems like one of the most frequently done integration tasks.

  • Anonymous
    October 19, 2006
    "Any reference to a resource (whether it be directly within the ZIP package, or it be an outside file) is done with relationships." That is not 100% true. Word field codes directly refer to absolute (and absolute only) paths.

  • Anonymous
    October 27, 2006
    I'd been meaning to post a write-up on how to create a simple SpreadsheetML document from scratch, but

  • Anonymous
    November 01, 2006
    One of the key benefits of the Open XML file formats is that they support all of the things you can do

  • Anonymous
    November 02, 2006
    I posted a bunch of "Intro to SpreadsheetML" posts about a year or so ago, but those were all based on

  • Anonymous
    December 10, 2006
    Images are one of the basic elements of a document, and the use of images in documents continues to grow.
