How to: Use Annotations to Transform LINQ to XML Trees in an XSLT Style

(Update: June 23, 2008: I've updated and improved on this technique in this blog post)

Introduction

Annotations can be used to facilitate transforms of an XML tree.

Some XML documents are "document centric with mixed content." With such documents, you don't necessarily know the shape of child nodes of an element. For instance, a node that contains text may look like this:

[xml]

<text>A phrase with <b>bold</b> and <i>italic</i> text.</text>

For any given text element, there may be any number of child <b> and <i> elements. The same situation arises elsewhere: for example, a page can contain a variety of child elements, such as regular paragraphs, bulleted paragraphs, and bitmaps, and a cell in a table may contain text, drop-down lists, or bitmaps.
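To make the idea of mixed content concrete, the following is a minimal sketch that lists the child nodes of the element shown above. The variable names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement text = XElement.Parse(
    "<text>A phrase with <b>bold</b> and <i>italic</i> text.</text>");

// Nodes() returns text and element children interleaved, in document order.
foreach (XNode node in text.Nodes())
{
    string kind = node is XElement ? "element" : "text";
    Console.WriteLine($"{kind}: {node}");
}

// Count each kind of child node.
int elementCount = text.Nodes().OfType<XElement>().Count();
int textCount = text.Nodes().OfType<XText>().Count();
Console.WriteLine($"{elementCount} elements, {textCount} text nodes");
```

Because the number and order of these children is not known in advance, a transform has to copy whatever it finds rather than assume a fixed shape.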

If you want to transform elements in a tree where you don't necessarily know much about the children of those elements, then using annotations is an effective approach.

The summary of the approach is:

· First, annotate elements in the tree with a replacement element.

· Second, iterate through the entire tree, creating a new tree where you replace each element with its annotation.

In detail, the approach consists of:

· Execute one or more LINQ to XML queries that return the set of elements that you want to transform from one shape to another. For each element returned by the query, add a new XElement object as an annotation to the element. This new element will replace the annotated element in the new, transformed tree. This is quite simple code to write, as demonstrated by the example.

· The new element that is added as an annotation can contain new child nodes; it can form a sub-tree with any desired shape.

· There is a special rule: a child element of the new element that is in a special namespace, made up for this purpose (in this example, https://www.microsoft.com/LinqToXmlTransform), is not copied to the new tree. Instead, if the child element is in that namespace and its local name is ApplyTransforms, then the child nodes of the element in the source tree are iterated and copied to the new tree (with the exception that annotated child elements are themselves transformed according to these rules).
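The annotation steps above can be sketched as follows. This is only an illustration: the replacement element div, its class attribute, and the variable names are assumptions made for the example, and xf is simply a local name for the made-up namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// The made-up namespace from the text; xf is only a local variable name.
XNamespace xf = "https://www.microsoft.com/LinqToXmlTransform";

XElement root = XElement.Parse(
    "<Root><Paragraph>Some <b>bold</b> text.</Paragraph></Root>");

// A query selects the elements to transform.
foreach (XElement el in root.Descendants("Paragraph"))
{
    // The replacement is attached as an annotation. It can have any shape.
    el.AddAnnotation(
        new XElement("div",
            new XAttribute("class", "para"),
            // Placeholder: the source element's children will go here.
            new XElement(xf + "ApplyTransforms")));
}

// Annotations are retrieved by type; unannotated elements return null.
XElement paragraph = root.Element("Paragraph");
XElement replacement = paragraph.Annotation<XElement>();
XElement placeholder = replacement.Elements().First();

Console.WriteLine(replacement.Name);                     // div
Console.WriteLine(placeholder.Name.LocalName);           // ApplyTransforms
Console.WriteLine(placeholder.Name.Namespace == xf);     // True
Console.WriteLine(root.Annotation<XElement>() == null);  // True
```

Nothing has been transformed at this point. The annotations only record what each selected element should become.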

This is somewhat analogous to the specification of transforms in XSL. The query that selects a set of nodes is analogous to the XPath expression for a template. The code to create the new XElement that is saved as an annotation is analogous to the sequence constructor in XSL, and the ApplyTransforms element is analogous in function to the xsl:apply-templates element in XSL.
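For comparison, here is a sketch of what the XSL side of the analogy might look like: a stylesheet that renames Paragraph to para, run against a LINQ to XML tree with XslCompiledTransform. The stylesheet and the sample document are assumptions written for this illustration, not part of the annotation technique itself.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

string xslMarkup = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <!-- Identity template: copies anything without a more specific template. -->
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <!-- The match pattern plays the role of the LINQ query, the body is the
       sequence constructor, and apply-templates corresponds to ApplyTransforms. -->
  <xsl:template match='Paragraph'>
    <para><xsl:apply-templates select='@*|node()'/></para>
  </xsl:template>
</xsl:stylesheet>";

XDocument source = XDocument.Parse(
    "<Root><Paragraph>Some <b>bold</b> text.</Paragraph>" +
    "<Paragraph>More text.</Paragraph></Root>");

// Write the transform output directly into a new LINQ to XML document.
XDocument result = new XDocument();
using (XmlWriter writer = result.CreateWriter())
{
    var xslt = new XslCompiledTransform();
    xslt.Load(XmlReader.Create(new StringReader(xslMarkup)));
    xslt.Transform(source.CreateReader(), writer);
}

Console.WriteLine(result);
```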

One advantage of this approach is that, as you formulate queries, you are always writing queries on the unmodified source tree. You need not worry about how modifications to the tree affect the queries that you are writing.
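A small check illustrates this point. The sketch below, with hypothetical sample data and a hypothetical replacement element named Value, annotates every Data element and then confirms that the source tree still compares equal to a copy taken beforehand.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement root = XElement.Parse(
    "<Root><Data>20</Data><Data>10</Data><Data>3</Data></Root>");

// Keep an independent deep copy to compare against later.
XElement snapshot = new XElement(root);

// Annotate every Data element with a replacement named Value.
foreach (XElement el in root.Elements("Data"))
    el.AddAnnotation(new XElement("Value", (int)el * 2));

// Queries written after annotating still see the original tree.
int sum = root.Elements("Data").Sum(e => (int)e);
bool unchanged = XNode.DeepEquals(root, snapshot);

Console.WriteLine(sum);        // 33
Console.WriteLine(unchanged);  // True
```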

Transforming a Tree

This first example renames all Paragraph elements to para.

[c#]

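// the namespace made up for this transform; it identifies ApplyTransforms
XNamespace xf = "https://www.microsoft.com/LinqToXmlTransform";

XElement root = XElement.Parse(@"
<Root>
    <Paragraph>This is a sentence with <b>bold</b> and <i>italic</i> text.</Paragraph>
    <Paragraph>More text.</Paragraph>
</Root>");

// replace Paragraph with para
foreach (var el in root.Descendants("Paragraph"))
    el.AddAnnotation(
        new XElement("para",
            // same idea as xsl:apply-templates
            new XElement(xf + "ApplyTransforms")
        )
    );

// XForm (described below) builds the new tree from the annotated tree
XElement newRoot = XForm(root);

Console.WriteLine(newRoot);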

This example produces the following output:

[xml]

<Root>
  <para>This is a sentence with <b>bold</b> and <i>italic</i> text.</para>
  <para>More text.</para>
</Root>

A More Complicated Transform

The following example queries the tree and calculates the average and sum of the Data elements, and adds them as new elements to the tree.

[c#]

// the namespace made up for this transform; it identifies ApplyTransforms
XNamespace xf = "https://www.microsoft.com/LinqToXmlTransform";

XElement data = new XElement("Root",
    new XElement("Data", 20),
    new XElement("Data", 10),
    new XElement("Data", 3)
);

// while adding annotations, you can query the source tree all you want,
// as the tree is not mutated while annotating.
data.AddAnnotation(
    new XElement("Root",
        // copy the existing Data elements first
        new XElement(xf + "ApplyTransforms"),
        new XElement("Average",
            String.Format("{0:F4}",
                data
                .Elements("Data")
                .Select(z => (Decimal)z)
                .Average()
            )
        ),
        new XElement("Sum",
            data
            .Elements("Data")
            .Select(z => (int)z)
            .Sum()
        )
    )
);

Console.WriteLine("Before Transform");
Console.WriteLine("----------------");
Console.WriteLine(data);
Console.WriteLine();
Console.WriteLine();

XElement newData = XForm(data);

Console.WriteLine("After Transform");
Console.WriteLine("----------------");
Console.WriteLine(newData);

 

This example produces the following output:

Before Transform
----------------
<Root>
  <Data>20</Data>
  <Data>10</Data>
  <Data>3</Data>
</Root>


After Transform
----------------
<Root>
  <Data>20</Data>
  <Data>10</Data>
  <Data>3</Data>
  <Average>11.0000</Average>
  <Sum>33</Sum>
</Root>

Effecting the Transform

A small function, XForm, creates a new transformed tree from the original, annotated tree.

The pseudo code for the function is quite simple:

The function takes an XElement as an argument and returns an XElement.
If an element has an XElement annotation, then
    Return a new XElement
        The name of the new XElement is the annotation element's name.
        All attributes are copied from the annotation to the new node.
        All child nodes are copied from the annotation, with the
            exception that the special node xf:ApplyTransforms is
            recognized, and the source element's child nodes are
            iterated. If the source child node is not an XElement, it
            is copied to the new tree. If the source child is an
            XElement, then it is transformed by calling this function
            recursively.
If an element is not annotated
    Return a new XElement
        The name of the new XElement is the source element's name.
        All attributes are copied from the source element to the
            destination element.
        All child nodes are copied from the source element.
        If the source child node is not an XElement, it is copied to
            the new tree. If the source child is an XElement, then it
            is transformed by calling this function recursively.

The implementation of this function in this example is only about 50 lines long.
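The pseudo code above can be sketched in C# roughly as follows. This is a reconstruction written from the pseudo code, not the original listing: the helper TransformChild and the sample document are assumptions, and the code uses top-level statements with local functions so that it can run on its own.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace xf = "https://www.microsoft.com/LinqToXmlTransform";

// Hypothetical helper: elements are transformed recursively,
// all other nodes (text, comments, and so on) are copied as they are.
object TransformChild(XNode node) =>
    node is XElement child ? XForm(child) : node;

XElement XForm(XElement source)
{
    XElement annotation = source.Annotation<XElement>();

    // Not annotated: same name and attributes, children transformed recursively.
    if (annotation == null)
        return new XElement(source.Name,
            source.Attributes(),
            source.Nodes().Select(c => TransformChild(c)));

    // Annotated: name, attributes, and children come from the annotation,
    // except that xf:ApplyTransforms expands to the source's transformed children.
    return new XElement(annotation.Name,
        annotation.Attributes(),
        annotation.Nodes().Select(n =>
            n is XElement e && e.Name == xf + "ApplyTransforms"
                ? (object)source.Nodes().Select(c => TransformChild(c))
                : n));
}

// Usage: rename Paragraph to para, leaving the mixed content intact.
XElement root = XElement.Parse(
    "<Root><Paragraph>Some <b>bold</b> text.</Paragraph>" +
    "<Paragraph>More text.</Paragraph></Root>");

foreach (XElement el in root.Descendants("Paragraph"))
    el.AddAnnotation(new XElement("para", new XElement(xf + "ApplyTransforms")));

XElement newRoot = XForm(root);

// Prints: <Root><para>Some <b>bold</b> text.</para><para>More text.</para></Root>
Console.WriteLine(newRoot.ToString(SaveOptions.DisableFormatting));
```

The sketch relies on LINQ to XML cloning nodes and attributes that already have a parent when they are added to a new element, so the source tree and its annotations are left untouched.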

XForm Function

The following code is a complete example that includes the XForm function. It includes most of the typical uses of this type of transform:

[c#]

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

Comments

  • Anonymous
    December 09, 2007
    So there is no direct XSLT mapping to XLINQ? That is, we will have to use a custom function like the one illustrated above to implement XSLT-style functionality using XLINQ?

  • Anonymous
    December 10, 2007
    There is an easy way to use XSLT to transform a LINQ to XML tree. See the following topic in the docs for how to do this: http://msdn2.microsoft.com/en-us/library/bb675186(VS.90).aspx However, I believe that the annotation approach will outperform XSLT significantly. I haven't developed any metrics, though. I plan to do so when I get time.

  • Anonymous
    October 05, 2008
    How can I retrieve only the nodes that I want?

  • Anonymous
    October 06, 2008
    @JF, See this post for an updated version of this technique:  http://blogs.msdn.com/ericwhite/archive/2008/06/23/using-annotations-to-transform-linq-to-xml-trees-in-an-xslt-style-improved-approach.aspx I'm not exactly sure what you mean - how to retrieve just the nodes you want.  The LINQ query that selects nodes can be quite detailed to select a very specific set of nodes to annotate.  I'd need a bit more information before I could respond to your question. -Eric