Open XML SDK 2.0 Architecture

In my first post on the Open XML SDK, I talked about the overall design of the SDK with respect to goals and scenarios. Today, I am going to talk about the architecture of the SDK in terms of its different components.

The SDK Architecture

The Open XML SDK is designed and implemented in layers, starting from a base layer and moving toward higher-level functionality such as validation. The following diagram illustrates an overview of the Open XML SDK components.

The System Support layer contains the fundamental components that the SDK is built upon. The Open XML File Format Base Level layer is the core foundation of the SDK. This layer provides functionality to create Open XML packages, add or remove parts, and read, add, remove, and manipulate XML elements and attributes. The Open XML File Format Higher Level layer is the last layer in our architecture. This layer provides functionality that makes it easier for you to code against Open XML formats. For example, one idea is to have this layer contain schema-level and semantic-level validation to help you generate valid Open XML files. These layers and components are described in more detail below.

Note: Version 1.0 of the SDK only provides the Open XML Packaging API component, whereas version 2.0 of the SDK provides all components built on top of the System Support layer.

System Support Layer

The System Support layer consists of the following components:

  • .NET Framework 3.5 – The Open XML SDK leverages the technology provided by .NET Framework 3.5, especially LINQ to XML, which makes manipulating XML much easier and more intuitive.
  • System.IO.Packaging – The Open XML SDK needs to be able to add and remove parts contained within Open XML Format packages. .NET Framework 3.0 included a set of generic packaging APIs capable of adding and removing parts of packages that conform to the Open Packaging Conventions (OPC). Given that Open XML Formats are based on OPC, the SDK uses the System.IO.Packaging APIs to open, edit, create, and save Open XML packages.
  • Open XML Schemas – The Open XML SDK is based on Open XML Formats, which are represented and described as schemas. These schemas make up the foundation of the Open XML SDK. Currently the Open XML SDK is based on ECMA-376. We will add support for IS 29500 as soon as the standard is made public.
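To make the role of System.IO.Packaging more concrete, here is a minimal sketch of the generic, untyped access it offers on its own. The class and method names (PackageLister, ListParts) are made up for illustration; only the Package and PackagePart calls come from the framework.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET Framework 3.0 and later

static class PackageLister
{
    // Hypothetical helper: prints the URI and content type of every part
    // in an OPC package and returns how many parts were found.
    public static int ListParts(string path)
    {
        int count = 0;
        using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                // At this level a part is only a URI plus a content type;
                // nothing says "this is the main document part".
                Console.WriteLine("{0} ({1})", part.Uri, part.ContentType);
                count++;
            }
        }
        return count;
    }
}
```

Calling PackageLister.ListParts("temp.docx") would list every part by URI. Notice that the caller has to know which URI or content type corresponds to which kind of part, which is the gap the strongly typed Packaging API in the next layer is meant to close.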

Open XML File Format Base Level Layer

The Open XML File Format Base Level layer provides a platform for Open XML developers to create Open XML solutions and consists of the following components:

  • Open XML Packaging API – This component is built on top of the .NET Framework 3.0 System.IO.Packaging component. Instead of providing generic access to the parts contained in the Open XML package, this component allows developers to manipulate Open XML parts with strongly typed classes and objects. This component has already shipped as the Open XML SDK v1.0. Below is example code that illustrates using this component to open and manipulate a WordprocessingML document.

    // Requires: using DocumentFormat.OpenXml.Packaging;

    // Open temp.docx for editing (the second argument makes it editable)
    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Open("temp.docx", true))
    {
        // Access the main part of the document
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        // Add a new comments part to the document
        mainPart.AddNewPart<WordprocessingCommentsPart>();

        // Delete the Styles part within the document, if there is one
        if (mainPart.StyleDefinitionsPart != null)
        {
            mainPart.DeletePart(mainPart.StyleDefinitionsPart);
        }

        // Iterate through all custom XML parts within the document
        foreach (CustomXmlPart customXmlPart in mainPart.CustomXmlParts)
        {
            // DO SOMETHING
        }
    }

  • Open XML Low Level DOM – This component represents the XML wrapper of the Open XML schemas. You can use this component to manipulate the Open XML tree directly by working with strongly typed objects and classes instead of traditional XML nodes, which require you to be aware of namespaces as well as element and attribute names. The major advantage of having strongly typed classes and objects is that you can easily see what properties are defined on a given class through IntelliSense. For example, you will know exactly what properties and children can exist on a Paragraph object. This component leverages many of the designs of LINQ. Below is example code that illustrates using this component to create a WordprocessingML document with the text "Hello World!"

    // Requires: using DocumentFormat.OpenXml;
    //           using DocumentFormat.OpenXml.Packaging;
    //           using DocumentFormat.OpenXml.Wordprocessing;

    // Create a Wordprocessing document (docName is the path of the file to create)
    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
    {
        // Add a new main document part
        MainDocumentPart mainPart = myDoc.AddMainDocumentPart();

        // Create the DOM tree for a simple document
        mainPart.Document = new Document();
        Body body = new Body();
        Paragraph p = new Paragraph();
        Run r = new Run();
        Text t = new Text("Hello World!");

        // Append elements from the inside out: text -> run -> paragraph -> body
        r.Append(t);
        p.Append(r);
        body.Append(p);
        mainPart.Document.Append(body);

        // Save changes to the main document part
        mainPart.Document.Save();
    }

  • Stream Reading/Writing – This component includes stream reader and writer interfaces specifically targeting Open XML elements and attributes. The readers and writers behave similarly to XmlReader/XmlWriter, but are easier to use since the interfaces are Open XML aware.
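As a rough illustration of the stream reading component, the sketch below counts paragraphs without loading the whole DOM tree. It assumes the OpenXmlReader class as it appears in the released 2.0 SDK; the CTP may expose a slightly different surface. The ParagraphCounter class and CountParagraphs method are hypothetical names used only for this example.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class ParagraphCounter
{
    // Hypothetical helper: streams through the main document part and
    // counts paragraph start elements.
    public static int CountParagraphs(string path)
    {
        int count = 0;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        using (OpenXmlReader reader = OpenXmlReader.Create(doc.MainDocumentPart))
        {
            while (reader.Read())
            {
                // The reader reports strongly typed element types, so there is
                // no need to compare namespace URIs and local names by hand.
                if (reader.IsStartElement && reader.ElementType == typeof(Paragraph))
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

The point of the design is visible in the if statement: the check is written against the Paragraph class rather than against a qualified XML name, which is what "Open XML aware" means in practice.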

All the components mentioned above are available in our first CTP of version 2.0 of the SDK, which you can download here.

Open XML File Format Higher Level Layer

Note: This layer has not yet been implemented in version 2.0 of the SDK

We are still thinking through the Open XML File Format Higher Level layer, but one thought is to have this layer provide functionality to help you debug and validate Open XML files. With this in mind, we might be able to provide the following components:

  • Schema Level Validation – Manipulating Open XML Formats by using the Open XML Base Level layer makes it much easier for you to work with Open XML files, but doing so does not guarantee that the files you produce are valid. This component would assist you in debugging and validating Open XML documents based on the Open XML schemas.

  • Additional Semantic Validation – This component would be similar to the Schema Level Validation component except that it would provide additional information based on semantic and syntax constraints as defined by the Open XML standard. For example, for a comment to work as expected within a WordprocessingML document, the comment needs to be defined in the comments part as well as be marked appropriately in the main document story; otherwise the comment is ignored.

    Since this type of information cannot be represented in XSD files, you are also required to leverage the prose within the standard. With this potential layer you would be able to leverage the SDK to cover much of this manual work.
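Until such a layer exists, a check like the comment example can be written by hand against the Base Level layer. The following is only a sketch of that one rule, not a preview of any planned validation API. CommentCheck and FindUnreferencedCommentIds are hypothetical names, and the code assumes the typed classes of the released 2.0 SDK.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class CommentCheck
{
    // Hypothetical helper: returns the ids of comments that are defined in
    // the comments part but never referenced from the main document story.
    public static List<string> FindUnreferencedCommentIds(WordprocessingDocument doc)
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        WordprocessingCommentsPart commentsPart = mainPart.WordprocessingCommentsPart;
        if (commentsPart == null || commentsPart.Comments == null)
        {
            return new List<string>(); // no comments part, nothing to check
        }

        // Collect every comment id that the main story refers to.
        var referenced = new HashSet<string>(
            mainPart.Document.Descendants<CommentReference>()
                .Where(r => r.Id != null)
                .Select(r => r.Id.Value));

        // Any defined comment whose id is not in that set is "dangling".
        return commentsPart.Comments.Elements<Comment>()
            .Where(c => c.Id != null && !referenced.Contains(c.Id.Value))
            .Select(c => c.Id.Value)
            .ToList();
    }
}
```

A schema validator could not report these ids, because each part is well formed on its own; the constraint only appears when the two parts are compared.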

Helper Function Layer

Note: This layer has not yet been implemented in version 2.0 of the SDK

As with the Open XML File Format Higher Level layer, we are still thinking through the Helper Function layer. We envision this layer as a way to provide helper functions or code snippets that make it a bit easier to create valid Open XML files. Certain operations within Open XML can be somewhat complex. For example, deleting a paragraph in a WordprocessingML document is not simply a matter of deleting the paragraph node. A variety of extra steps are required to delete a paragraph and maintain the integrity of a valid Open XML document.

One thought is that the SDK could provide higher-level helper functions or code snippets that deal with common complex file format operations. These helper functions or snippets would make the appropriate XML and part/relationship modifications when performing complex tasks. They would not abstract away from the actual XML itself, but rather perform operations on the XML elements by taking advantage of the validation awareness. For example, a potential helper function for deleting a WordprocessingML paragraph would perform the delete operation and then do the necessary extra steps to clean up the resulting XML to ensure validity. Similar delete helpers could be applied to other elements that are hard to delete, like tables and comments. In other words, these higher-level functions or snippets would operate directly on the XML elements and would be constrained, in terms of functionality, by the file format standard itself.
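To give a feel for what such a helper might look like, here is a hand-written sketch for the comment case, built only on the Base Level layer. It is an assumption about the shape of a helper, not an SDK API: CommentHelper, DeleteComment, and RemoveAll are hypothetical names, and the code assumes the typed classes of the released 2.0 SDK.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class CommentHelper
{
    // Hypothetical helper: removes a comment from the comments part and
    // also removes the markers that point at it in the main document story.
    public static bool DeleteComment(WordprocessingDocument doc, string commentId)
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        WordprocessingCommentsPart commentsPart = mainPart.WordprocessingCommentsPart;
        if (commentsPart == null || commentsPart.Comments == null) return false;

        Comment comment = commentsPart.Comments.Elements<Comment>()
            .FirstOrDefault(c => c.Id != null && c.Id.Value == commentId);
        if (comment == null) return false;

        // Step 1: delete the comment definition itself.
        comment.Remove();
        commentsPart.Comments.Save();

        // Step 2: delete the range markers and the reference in the story.
        Document document = mainPart.Document;
        RemoveAll<CommentRangeStart>(document, e => e.Id != null && e.Id.Value == commentId);
        RemoveAll<CommentRangeEnd>(document, e => e.Id != null && e.Id.Value == commentId);
        RemoveAll<CommentReference>(document, e => e.Id != null && e.Id.Value == commentId);
        document.Save();
        return true;
    }

    // Snapshot the matches with ToList() before removing, so the tree is
    // not modified while it is being enumerated.
    private static void RemoveAll<T>(Document document, Func<T, bool> match)
        where T : OpenXmlElement
    {
        foreach (T element in document.Descendants<T>().Where(match).ToList())
        {
            element.Remove();
        }
    }
}
```

The helper still works directly on the XML elements; it simply bundles the extra clean-up steps so that the caller does not leave markers behind that point at a comment that no longer exists.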

Next Time

Now that we have gone over the basics of the SDK we are ready to talk about solutions and end-to-end scenarios. In my next few posts I am going to walk through solutions to some key scenarios, like document assembly and manipulation.

Let me know if you have any specific questions or comments that you would like me to address here or in future posts. Feel free to send post requests my way related to specific scenarios that you might be interested in learning more about.

Zeyad Rajabi

Comments

  • Anonymous
    October 15, 2008
    hi, could you tell us if the newly created Office 14 documents will be made to conform to the strict conformance IS 29500 specs ? thanks

  • Anonymous
    October 16, 2008
    It's great to see the info and Word document examples about the SDK trickle out.  However it would be really nice to see more Excel focused coverage. I'm not sure why the WordML examples are so common when it seems (at current and past employers) there was a much greater need for programmatic Excel solutions.

  • Anonymous
    October 16, 2008
    With regard to document assembly, I'd like to see an example of how to use the SDK to copy a portion of one Word document into another (without using afchunk or other Word-dependent trickery) while preserving all the formatting.  I have never seen a real honest-to-goodness example of the best way to accomplish this common task using the SDK.

  • Anonymous
    October 16, 2008
    Franco, it's a bit too early to talk about Office 14 features. Jason, I've talked to Zeyad and he's going to pull together a set of Excel examples. Seth, what's the reason you don't want to use afchunks? Does the target app not support them? Without the afchunk, you'll need to do the style clean-up yourself, which is obviously something you can do, but is much easier to ask the target app to take care of. -Brian

  • Anonymous
    October 17, 2008
    No, the target app does not support afchunks.  To my knowledge, nothing currently supports afchunks except Word 2007. This is for server-side document assembly, and Microsoft does not recommend automating Word 2007 on the server, so the target application is a third party docx -> pdf converter.

  • Anonymous
    October 17, 2008
    On Thursday, October 16, 2008 5:46 PM BrianJones said: "Franco, it's a bit too early to talk about Office 14 features." @brian this is strange, because this [0] press release reads: "In addition, Microsoft has defined a road map for its implementation of the newly ratified International Standard ISO/IEC 29500 (Office Open XML). IS29500, [...T]he company plans to update that support in the next major version release of the Microsoft Office system, code-named 'Office 14'." Do you mean that the phrase "Microsoft has defined a road map" for IS29500 is not accurate? For the record (maybe I was unclear in my phrasing), my question was: "hi, could you tell us if the newly created Office 14 documents will be made to conform to the strict conformance IS 29500 specs ?" or better (and clearer): "does the roadmap defined according to [0] include strict conformance to the IS 29500 specs?" Thanks for your clarification. [0] http://www.microsoft.com/Presspass/press/2008/may08/05-21ExpandedFormatsPR.mspx

  • Anonymous
    October 20, 2008
    Seth, Got it. I think Zeyad has a demo around this he pulled together at some point (I'll check). If so I'll make sure he posts it. franco, Sorry for confusing you. Not to confuse you more, but my answer is still the same. You're asking about user experience around a subset of the standard, and I said it's too early to talk about that. While we already support a large portion of 29500, we're going to work to improve that even more based on the changes made at the BRM. -Brian

  • Anonymous
    October 23, 2008
    This is in reference to the comment about merging documents using altChunk by Zeyad in http://blogs.msdn.com/brian_jones/archive/2008/10/06/open-xml-format-sdk-2-0.aspx

    I tried merging two documents using para.InsertAfterSelf(altChunk). While merging two documents, some parts of the merged (second) document are reformatted using the style of the first document, particularly the bulleting and numbering. I tried to fix this using quite a few options. A couple of these are shown below.

    Snippet 1:

        DocumentFormat.OpenXml.Wordprocessing.Locked locked1 =
            doc.MainDocumentPart.StyleDefinitionsPart.Styles.Descendants<Locked>().Last();
        locked1.Val = BooleanValues.False;
        doc.MainDocumentPart.Document.Save();

    Snippet 2:

        AltChunkProperties properties = new AltChunkProperties();
        altChunk.AltChunkProperties = properties;
        MatchSource matchSource = new MatchSource();
        matchSource.Val = BooleanValues.True;
        altChunk.AltChunkProperties.MatchSource = matchSource;

    None of these options work. Are there any other properties which need to be set apart from the ones mentioned above? Any help on this is highly appreciated. Thanks, Anand.
