Traversing in the Open XML DOM
For the past few posts, I have been concentrating on showing you guys solutions to real-world scenarios. Today, I am going to change pace a bit and continue Ali's discussion of the basics of the Open XML SDK. In this post I am going to cover the basic techniques for traversing the Open XML DOM tree using the Open XML SDK.
Again, it's just XML...
We designed the Open XML Low Level DOM to be an XML wrapper of the Open XML schemas. In other words, this component of the SDK allows you to work directly with strongly typed objects and classes that represent the underlying XML nodes. Essentially, by using the SDK, you are still working with just XML.
Traditionally, there are several technologies which allow you to traverse XML files. When designing the Open XML SDK we wanted to include functionality from DOM and LINQ in order to bring you a rich set of functionality. Our Low Level DOM component tries to marry concepts from .NET's DOM API and LINQ to XML. Let's talk about some of these concepts.
Traversing Down the XML Tree
One of the most common tasks when reading XML files is to traverse below a particular XML node, like the root. You would usually want to traverse downward if you are looking for something specific. For example, suppose you wanted to change content for all table rows within a document. One way to accomplish this scenario is to traverse downward starting at the document root element finding table and table row elements until you reach the end of the document. The Open XML SDK makes this task very easy.
As mentioned in Ali's post, all Open XML elements are based on the abstract OpenXmlElement class. The OpenXmlElement class provides the following methods and properties to traverse downwards within the DOM:
- OpenXmlElement FirstChild { get; }
- OpenXmlElement LastChild { get; }
- IEnumerable<OpenXmlElement> Elements ()
- OpenXmlElementList ChildElements { get; }
- IEnumerator<OpenXmlElement> GetEnumerator ()
- IEnumerable<OpenXmlElement> Descendants ()
There are a few core differences between each of these approaches. To demonstrate these differences let's say you want to read elements under a Table object, which contains table properties, table grid information, and row content:
<w:tbl>
  <w:tblPr>
    ...
  </w:tblPr>
  <w:tblGrid>
    ...
  </w:tblGrid>
  <w:tr>
    ...
  </w:tr>
  ...
  <w:tr>
    ...
  </w:tr>
</w:tbl>
The FirstChild and LastChild properties are pretty straightforward: they return the first and last child, respectively.
The Elements() method allows you to read all the children elements of Table by using this code snippet:
foreach (OpenXmlElement el in tbl.Elements())
{
    //DO SOMETHING
}
The ChildElements property is a bit special. Calling it directly like tbl.ChildElements will return a list of Open XML elements that are the children of a table. In this regard it is pretty equivalent to tbl.Elements(). What differentiates ChildElements from Elements() is that you can specify an index. For example, if you wanted the fourth child of the Table object you can call tbl.ChildElements[3].
Like the other two approaches, the GetEnumerator() method supports iterating over all the child elements. It is what a foreach loop calls when you iterate over the element directly, as in foreach (OpenXmlElement el in tbl).
The Descendants() method, on the other hand, allows you to iterate over all children and descendants under a particular node. Taking this Table object as an example, calling Descendants() will allow you to not only see the table rows, but the cells within the rows as well.
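To make the differences concrete, here is a minimal sketch that builds a small table in memory, shaped like the XML above, and tries each approach. The class name TraverseDownSketch and the helper BuildSampleTable are made up for illustration, and the sample cell text is an assumption; only the traversal members come from the SDK.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TraverseDownSketch
{
    // Hypothetical helper: builds an in-memory table shaped like the XML above
    // (table properties, table grid, then two single-cell rows).
    public static Table BuildSampleTable()
    {
        return new Table(
            new TableProperties(),
            new TableGrid(),
            new TableRow(new TableCell(new Paragraph(new Run(new Text("first"))))),
            new TableRow(new TableCell(new Paragraph(new Run(new Text("second"))))));
    }

    public static void Main()
    {
        Table tbl = BuildSampleTable();

        // FirstChild and LastChild are properties, not methods.
        Console.WriteLine(tbl.FirstChild?.LocalName);
        Console.WriteLine(tbl.LastChild?.LocalName);

        // Elements() and ChildElements both cover direct children only.
        Console.WriteLine(tbl.Elements().Count());
        Console.WriteLine(tbl.ChildElements.Count);

        // ChildElements additionally supports zero-based indexing.
        OpenXmlElement fourth = tbl.ChildElements[3];
        Console.WriteLine(fourth is TableRow);

        // Descendants() also walks into rows, cells, paragraphs, and runs,
        // so it yields more elements than Elements() does here.
        Console.WriteLine(tbl.Descendants().Count() > tbl.Elements().Count());
    }
}
```

In this sketch the table has four direct children, so Elements() and ChildElements agree, while Descendants() keeps going down into each row.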
Rather than iterating through all child elements, a more common scenario is finding child or descendant elements of a certain class type. For example, let's say you only want to find table row elements underneath the Table object. Instead of iterating through all the child elements and checking to see if the element is a table row, you can simply use any of the following methods:
- IEnumerable<T> Elements<T> () where T : OpenXmlElement
- IEnumerable<T> Descendants<T> () where T : OpenXmlElement
- T GetFirstChild<T> () where T : OpenXmlElement
The first two methods will only return elements whose type is T or derived from T. Going back to the example above, you would use this code snippet to find all rows within a table:
foreach (TableRow tr in tbl.Elements<TableRow>())
{
    //DO SOMETHING
}
The last method allows you to get the first child that matches a particular type. In the case of the table example, if I wanted to get the first row I could have used tbl.GetFirstChild<TableRow>().
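Here is a rough, self-contained sketch of the three typed methods side by side. The class TypedTraversalSketch, the helper BuildSampleTable, and the cell text ("A1", "B1", and so on) are invented for the example; the table is built in memory rather than read from a real document.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TypedTraversalSketch
{
    // Hypothetical helper: wraps a string in a cell > paragraph > run > text chain.
    public static TableCell Cell(string value)
    {
        return new TableCell(new Paragraph(new Run(new Text(value))));
    }

    // Hypothetical helper: a table with properties, a grid, and two rows of two cells.
    public static Table BuildSampleTable()
    {
        return new Table(
            new TableProperties(),
            new TableGrid(),
            new TableRow(Cell("A1"), Cell("B1")),
            new TableRow(Cell("A2"), Cell("B2")));
    }

    public static void Main()
    {
        Table tbl = BuildSampleTable();

        // Elements<T>() filters the direct children by type, so the
        // table properties and table grid are skipped.
        foreach (TableRow tr in tbl.Elements<TableRow>())
        {
            Console.WriteLine(tr.Elements<TableCell>().Count());
        }

        // Cells are grandchildren of the table, so Elements<TableCell>() finds
        // none of them, while Descendants<TableCell>() finds all of them.
        Console.WriteLine(tbl.Elements<TableCell>().Count());
        Console.WriteLine(tbl.Descendants<TableCell>().Count());

        // GetFirstChild<T>() returns the first direct child of the given type.
        TableRow firstRow = tbl.GetFirstChild<TableRow>();
        Console.WriteLine(firstRow?.InnerText);
    }
}
```

The contrast between Elements<TableCell>() and Descendants<TableCell>() on the table is the thing to watch: the typed filter does not change how deep each method looks.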
Traversing Up the XML Tree
Just as you can traverse down the XML tree, you can also traverse upwards, which adds flexibility when working with the DOM tree. For example, let's say you are searching for some text and you want to understand more of its context, such as whether the text is contained within a table. To accomplish this scenario you would need to traverse upwards. The Open XML SDK provides the following methods and properties to traverse upwards within the DOM:
- OpenXmlElement Parent
- IEnumerable<OpenXmlElement> Ancestors ()
- IEnumerable<T> Ancestors<T> () where T : OpenXmlElement
The Parent property will return the immediate parent element of a particular node. Calling Parent on a table row object will return the Table object.
The Ancestors() methods are very similar to the Descendants() methods, except that they traverse upwards instead of downwards.
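The "is this text inside a table?" scenario can be sketched roughly as follows. IsInsideTable and BuildSampleBody are hypothetical helpers written for this example, and the body content is made up; a real document would be opened from a package instead of built in memory.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TraverseUpSketch
{
    // Hypothetical helper: true when the element sits anywhere inside a table.
    public static bool IsInsideTable(OpenXmlElement element)
    {
        return element.Ancestors<Table>().Any();
    }

    // Hypothetical helper: a body with one plain paragraph and one single-cell table.
    public static Body BuildSampleBody()
    {
        return new Body(
            new Paragraph(new Run(new Text("outside"))),
            new Table(
                new TableRow(
                    new TableCell(new Paragraph(new Run(new Text("inside")))))));
    }

    public static void Main()
    {
        Body body = BuildSampleBody();

        foreach (Text text in body.Descendants<Text>())
        {
            // Parent is the immediate container, here the run holding the text.
            var parentName = text.Parent?.LocalName;

            // Ancestors<Table>() walks all the way up, so it finds the table
            // even though it is several levels above the text element.
            Console.WriteLine(
                $"{text.Text}: parent={parentName}, inTable={IsInsideTable(text)}");
        }
    }
}
```

Parent only looks one level up, which is why the check uses Ancestors<Table>() rather than testing the parent's type.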
Traversing Siblings within the XML Tree
What if you want to traverse the XML tree by exploring siblings? Well, the Open XML SDK can take care of this scenario as well. The Open XML SDK provides the following methods for traversing by siblings:
- OpenXmlElement PreviousSibling ()
- T PreviousSibling<T> () where T : OpenXmlElement
- OpenXmlElement NextSibling ()
- T NextSibling<T> () where T : OpenXmlElement
- IEnumerable<OpenXmlElement> ElementsBefore ()
- IEnumerable<OpenXmlElement> ElementsAfter ()
PreviousSibling() and NextSibling() return the closest sibling before or after the current element, and their generic versions return the closest sibling of type T. ElementsBefore() and ElementsAfter() enumerate all the sibling elements before or after the element under the same parent.
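As a quick sketch of sibling traversal, the snippet below starts from the first row of an in-memory table and looks in both directions. SiblingSketch and BuildSampleTable are names invented for the example, and the table layout (properties, grid, two rows) is assumed to match the earlier XML.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SiblingSketch
{
    // Hypothetical helper: table properties, table grid, then two single-cell rows.
    public static Table BuildSampleTable()
    {
        return new Table(
            new TableProperties(),
            new TableGrid(),
            new TableRow(new TableCell(new Paragraph(new Run(new Text("first"))))),
            new TableRow(new TableCell(new Paragraph(new Run(new Text("second"))))));
    }

    public static void Main()
    {
        Table tbl = BuildSampleTable();
        TableRow firstRow = tbl.GetFirstChild<TableRow>();

        // The untyped versions return the immediate neighbour, whatever its type.
        Console.WriteLine(firstRow.PreviousSibling()?.LocalName);
        Console.WriteLine(firstRow.NextSibling()?.LocalName);

        // The typed versions skip over siblings of other types, so this
        // steps past the table grid to reach the table properties.
        TableProperties props = firstRow.PreviousSibling<TableProperties>();
        Console.WriteLine(props != null);

        // ElementsBefore() and ElementsAfter() enumerate every sibling on one side.
        Console.WriteLine(firstRow.ElementsBefore().Count());
        Console.WriteLine(firstRow.ElementsAfter().Count());

        // The last row has nothing after it, so NextSibling() returns null.
        TableRow lastRow = firstRow.NextSibling<TableRow>();
        Console.WriteLine(lastRow?.NextSibling() == null);
    }
}
```

Note that the sibling methods return null when there is no matching sibling, so it is worth checking the result before using it.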
Summary
Hopefully this post shows some of the common ways of traversing the Open XML DOM tree. With this functionality you should be able to find what you are looking for in just a few lines of code.
Zeyad Rajabi