Generating an XPath expression to find a LINQ to XML Node

In a number of places in the docs, I present code that finds nodes in an XML tree. Sometimes the results of a query are easy to describe in words, but sometimes I want to describe them by identifying exactly which nodes the query selects. Having a string that uniquely identifies a node makes it easy to write sample code that selects specific nodes and then shows the exact results.

This blog is inactive.
New blog: EricWhite.com/blog

Well, we already have a syntax that allows us to identify a specific node in an XML tree: XPath.

Further, there are extension methods in System.Xml.XPath that allow us to evaluate an XPath expression, returning the node(s) that the expression selects.
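As a quick illustration of those extension methods (XPathSelectElement and XPathEvaluate, from the System.Xml.XPath.Extensions class), the following sketch evaluates two hand-written expressions against a cut-down version of the sample tree used later in this post. It uses C# top-level statements, which postdate this post, only to keep the example short.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

XDocument doc = XDocument.Parse(@"<Root AnAttribute='att-value'>
  <Child1 AnotherAttribute='abc'>text</Child1>
</Root>");

// XPathSelectElement returns the first XElement the expression selects (or null).
XElement child = doc.XPathSelectElement("/Root/Child1");
Console.WriteLine(child.Value);  // text

// XPathEvaluate returns object; for a node-set it is an IEnumerable of XObjects.
var attributes = ((IEnumerable)doc.XPathEvaluate("/Root/@AnAttribute"))
    .Cast<XAttribute>()
    .ToList();
Console.WriteLine(attributes.Single().Value);  // att-value
```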

So, I wrote a method, GetXPath, implemented as an extension method on System.Xml.Linq.XObject, that returns an XPath expression identifying the node in the XML tree. The implementation is fairly complete; for instance, when nodes are in a namespace, it generates an XPath expression that contains namespace prefixes.
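The prefixes come from the namespace declarations in scope for each node. LINQ to XML exposes that lookup as XElement.GetPrefixOfNamespace, which the GetQName helpers in the program below call. A small sketch of what it returns; the ex prefix and its URI are invented for the example:

```csharp
using System;
using System.Xml.Linq;

XDocument doc = XDocument.Parse(
    @"<ex:Root xmlns:ex='http://www.example.com/ns'><ex:Child/><Plain/></ex:Root>");
XNamespace ex = "http://www.example.com/ns";

XElement child = doc.Root.Element(ex + "Child");
XElement plain = doc.Root.Element("Plain");

// Look up the prefix bound to the element's namespace at the element's position.
string childPrefix = child.GetPrefixOfNamespace(child.Name.Namespace);
Console.WriteLine(childPrefix + ":" + child.Name.LocalName);  // ex:Child

// An element in no namespace needs no prefix; only its local name is used.
bool plainInNoNamespace = plain.Name.Namespace == XNamespace.None;
Console.WriteLine(plainInNoNamespace);  // True
```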

The extension methods in System.Xml.XPath allow us to validate that each generated XPath expression selects exactly the same node that was used to generate it.
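The check itself comes down to evaluating the expression and comparing object identity. The sketch below is one way such a check could be written, not the code from the full program; SelectsExactly is a hypothetical helper name.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: true when the expression selects exactly one node
// and that node is the very same object as 'expected'.
static bool SelectsExactly(XDocument document, string xpath, XObject expected)
{
    var results = ((IEnumerable)document.XPathEvaluate(xpath)).Cast<XObject>().ToList();
    return results.Count == 1 && object.ReferenceEquals(results[0], expected);
}

XDocument doc = XDocument.Parse("<Root><Child/><Child/></Root>");
XElement second = doc.Root.Elements("Child").ElementAt(1);

Console.WriteLine(SelectsExactly(doc, "/Root/Child[2]", second)); // True
Console.WriteLine(SelectsExactly(doc, "/Root/Child", second));    // False: two nodes match
```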

I also wrote another useful axis method, DescendantXObjects, which returns an IEnumerable<XObject> containing the node itself, all of its descendant nodes, and the attributes of every element among them.
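DescendantXObjects itself isn't shown in the code here, so this is one possible sketch, written as a local iterator function rather than an extension method so it fits in a short top-level program. It yields the starting node, then (for an element) its attributes, then recurses into the child nodes, which gives the same ordering as the DumpXPaths output shown below.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch: the node itself, then its attributes (if it is an element),
// then the same sequence for each child node, in document order.
static IEnumerable<XObject> DescendantXObjects(XObject source)
{
    yield return source;
    if (source is XElement element)
        foreach (XAttribute attribute in element.Attributes())
            yield return attribute;
    if (source is XContainer container)
        foreach (XNode child in container.Nodes())
            foreach (XObject descendant in DescendantXObjects(child))
                yield return descendant;
}

XDocument doc = XDocument.Parse("<Root A='1'><Child B='2'>text</Child><!--c--></Root>");
List<XObject> all = DescendantXObjects(doc).ToList();
foreach (XObject o in all)
    Console.WriteLine(o.NodeType);
// Prints one per line: Document, Element, Attribute, Element, Attribute, Text, Comment
```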

Then, I wrote a method, DumpXPaths, that iterates through the descendant XObjects and prints the XPath expression for every node to the console. For each node, it also validates that evaluating the expression returns one and only one node, and that this node is the same one that was used to generate the expression. For example, the following code creates a simple XML tree and calls DumpXPaths:

XDocument root = XDocument.Parse(@"<Root AnAttribute='att-value'>
<?xml-stylesheet type='text/xsl' href='hello.xsl'?>
<Child1 AnotherAttribute='abc'>text</Child1>
<!--This is a comment.-->
</Root>");
DumpXPaths(root);

This code produces the following output:

.
/Root
/Root/@AnAttribute
/Root/processing-instruction()
/Root/Child1
/Root/Child1/@AnotherAttribute
/Root/Child1/text()
/Root/comment()
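The listing can be checked independently of DumpXPaths: evaluate each printed expression (other than ".") against the same document with XPathEvaluate and confirm that each selects exactly one node. This is a sketch of that check, not part of the original program.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

XDocument root = XDocument.Parse(@"<Root AnAttribute='att-value'>
<?xml-stylesheet type='text/xsl' href='hello.xsl'?>
<Child1 AnotherAttribute='abc'>text</Child1>
<!--This is a comment.-->
</Root>");

string[] paths =
{
    "/Root",
    "/Root/@AnAttribute",
    "/Root/processing-instruction()",
    "/Root/Child1",
    "/Root/Child1/@AnotherAttribute",
    "/Root/Child1/text()",
    "/Root/comment()",
};

// A node-set result comes back as an IEnumerable of XObjects; count them.
int[] counts = paths
    .Select(p => ((IEnumerable)root.XPathEvaluate(p)).Cast<XObject>().Count())
    .ToArray();

for (int i = 0; i < paths.Length; i++)
    Console.WriteLine($"{paths[i]} -> {counts[i]} node(s)");
```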

The DumpXPaths method can also take an XmlNamespaceManager, which lets the validation code evaluate expressions that contain namespace prefixes.
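XmlNamespaceManager implements IXmlNamespaceResolver, which is what the XPath extension methods accept as an optional second argument. A sketch of evaluating a prefixed expression; the aw prefix and the namespace URI are placeholders chosen for the example:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

XDocument doc = XDocument.Parse(
    @"<aw:Root xmlns:aw='http://www.example.com/aw'>
        <aw:Child>value</aw:Child>
      </aw:Root>");

// Map the prefix used in the XPath expression to its namespace URI.
var nsm = new XmlNamespaceManager(new NameTable());
nsm.AddNamespace("aw", "http://www.example.com/aw");

XElement child = doc.XPathSelectElement("/aw:Root/aw:Child", nsm);
Console.WriteLine(child.Value);  // value
```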

The XPath expressions generated by this method work when evaluated in the context of an XDocument, not an XElement. If you parse into an XElement, the root node is the XElement itself, but if you parse into an XDocument, the root XElement is a child of the XDocument node. The generated XPath expressions reflect the XDocument case.
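A small sketch of the structural difference described above, using an arbitrary two-element tree: the XDocument has the root element as its only child node, whereas an XElement parsed on its own has no document above it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

string xml = "<Root><Child1/></Root>";

XDocument doc = XDocument.Parse(xml);
XElement el = XElement.Parse(xml);

// Parsed into an XDocument, the root element is a child of the document node.
bool rootIsChildOfDocument = doc.Nodes().Single() == doc.Root;
// Parsed into an XElement, there is no document node above the element.
bool elementHasNoDocument = el.Document == null;

// From the document, "/Root/Child1" walks down from the document node.
XElement viaDocument = doc.XPathSelectElement("/Root/Child1");

Console.WriteLine(rootIsChildOfDocument);                      // True
Console.WriteLine(elementHasNoDocument);                       // True
Console.WriteLine(viaDocument == doc.Root.Element("Child1"));  // True
```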

If the GetXPath method returns null, then the method could not generate an XPath expression to select the node. AFAIK, this is true only for white space text nodes that are children of a document; such nodes are not part of the XPath data model, so it's not possible to generate an XPath expression that selects them.
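To see such a node, you can build a document that has a white space XText child. This is an illustration with an arbitrary tree, not code from the program: LINQ to XML reports the text node, and the second count shows how many children the XPath view of the document exposes (if the statement above holds, only Root).

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// XDocument accepts a whitespace-only XText as a direct child
// (non-whitespace text at that level is rejected with ArgumentException).
XDocument doc = new XDocument(
    new XText("\n"),
    new XElement("Root"));

// Text nodes that are direct children of the document, as LINQ to XML sees them.
int linqTextNodes = doc.Nodes().OfType<XText>().Count();
// Children of the document node, as the XPath view sees them.
int xpathChildren = ((IEnumerable)doc.XPathEvaluate("/node()")).Cast<object>().Count();

Console.WriteLine(linqTextNodes);  // 1
Console.WriteLine(xpathChildren);
```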

Here is the entire working program to show the XPath expressions for every node in an XML tree. You can get the PurchaseOrders.xml document from the documentation, or you can change the code to dump the nodes for your own XML tree:

using System;
using System.Diagnostics;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

namespace LinqToXmlExample
{
    public static class MyExtensions
    {
        // Element name as an XPath QName, using the in-scope prefix if there is one.
        private static string GetQName(XElement xe)
        {
            string prefix = xe.GetPrefixOfNamespace(xe.Name.Namespace);
            if (xe.Name.Namespace == XNamespace.None || prefix == null)
                return xe.Name.LocalName;
            else
                return prefix + ":" + xe.Name.LocalName;
        }

        // Attribute name as an XPath QName, resolving the prefix from its parent element.
        private static string GetQName(XAttribute xa)
        {
            string prefix =
                xa.Parent.GetPrefixOfNamespace(xa.Name.Namespace);
            if (xa.Name.Namespace == XNamespace.None || prefix == null)
                return xa.Name.ToString();
            else
                return prefix + ":" + xa.Name.LocalName;
        }

        // Adds a positional predicate such as [2] when siblings share the element's name.
        private static string NameWithPredicate(XElement el)
        {
            if (el.Parent != null && el.Parent.Elements(el.Name).Count() != 1)
                return GetQName(el) + "[" +
                    (el.ElementsBeforeSelf(el.Name).Count() + 1) + "]";
            else
                return GetQName(el);
        }

        // Concatenates the items, appending the separator after each one.
        public static string StrCat<T>(this IEnumerable<T> source,
            string separator)
        {
            return source.Aggregate(new StringBuilder(),
                       (sb, i) => sb
                           .Append(i.ToString())
                           .Append(separator),
                       s => s.ToString());
        }

        public static string GetXPath(this XObject xobj)
        {
            // Nodes with no parent element: the document and its direct children.
            if (xobj.Parent == null)
            {
                XDocument doc = xobj as XDocument;
                if (doc != null)
                    return ".";
                XElement el = xobj as XElement;
                if (el != null)
                    return "/" + NameWithPredicate(el);
                XText xt = xobj as XText;
                if (xt != null)
                    return null;
                    //
                    //the following doesn't work because the XPath data
                    //model doesn't include white space text nodes that
                    //are children of the document.
                    //
                    //return
                    // "/" +
                    // (
                    // xt
                    // .Document
                    // .Nodes()
                    // .OfType<XText>()
                    // .Count() != 1 ?
                    // "text()[" +
                    // (xt
                    // .NodesBeforeSelf()
                    // .OfType<XText>()
                    // .Count() + 1) + "]" :
                    // "text()"
                    // );
                    //
                XComment com = xobj as XComment;
                if (com != null)
                    return
                        "/" +
                        (
                            com
                            .Document
                            .Nodes()
                            .OfType<XComment>()
                            .Count() != 1 ?
                            "comment()[" +
                            (com
                            .NodesBeforeSelf()
                            .OfType<XComment>()
                            .Count() + 1) +
                            "]" :
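For an element that has a parent, the path can be built by walking up its ancestors. A minimal sketch of that case only, assuming no namespaces: the hypothetical ElementXPath helper below ignores prefixes, attributes, and other node types, and reuses the positional-predicate rule from NameWithPredicate.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Same rule as NameWithPredicate: add [n] only when siblings share the name.
static string Step(XElement el) =>
    el.Parent != null && el.Parent.Elements(el.Name).Count() != 1
        ? el.Name.LocalName + "[" + (el.ElementsBeforeSelf(el.Name).Count() + 1) + "]"
        : el.Name.LocalName;

// Hypothetical helper: absolute path from the root element down to 'el'.
static string ElementXPath(XElement el) =>
    "/" + string.Join("/", el.AncestorsAndSelf().Reverse().Select(Step));

XDocument doc = XDocument.Parse("<Root><Item/><Item><Name>x</Name></Item><Other/></Root>");
XElement name = doc.Descendants("Name").Single();
string path = ElementXPath(name);
Console.WriteLine(path);  // /Root/Item[2]/Name
```

AncestorsAndSelf returns the element first and the root last, so the sequence is reversed before joining the steps.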

Comments

  • Anonymous
    December 17, 2007
    Not sure why to use Linq to Xml for this. It can be done with a pretty short and more elegant Xslt stylesheet:
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/">
        <xsl:value-of select="'/&#xA;'" />
        <xsl:apply-templates select="node()" />
      </xsl:template>
      <xsl:template match="node()" priority="1">
        <xsl:param name="parent-XPath" />
        <xsl:variable name="current-XPath" select="concat($parent-XPath, '/', name())" />
        <xsl:value-of select="concat($current-XPath, '&#xA;')" />
        <xsl:if test="boolean(node() | @*)">
          <xsl:apply-templates select="node() | @*">
            <xsl:with-param name="parent-XPath" select="$current-XPath" />
          </xsl:apply-templates>
        </xsl:if>
      </xsl:template>
      <xsl:template match="@*">
        <xsl:param name="parent-XPath" />
        <xsl:value-of select="concat($parent-XPath, '/@', name(), '&#xA;')" />
      </xsl:template>
      <xsl:template match="text()" priority="2">
        <xsl:param name="parent-XPath" />
        <xsl:value-of select="concat($parent-XPath, '/text()', '&#xA;')" />
      </xsl:template>
      <xsl:template match="comment()" priority="2">
        <xsl:param name="parent-XPath" />
        <xsl:value-of select="concat($parent-XPath, '/comment()', '&#xA;')" />
      </xsl:template>
      <xsl:template match="processing-instruction()" priority="2">
        <xsl:param name="parent-XPath" />
        <xsl:value-of select="concat($parent-XPath, '/processing-instruction()', '&#xA;')" />
      </xsl:template>
    </xsl:stylesheet>
    Cheers
    Pawel

  • Anonymous
    January 18, 2008
    XSLT is also not a structured procedural language, and it has many limitations, one of which is performance.

  • Anonymous
    February 14, 2008
    I agree with Pawel. I think elegance AND readability of the code are more important here. Yeah, sure, you can argue that LINQ is milliseconds faster, but look at that code... As for XSLT being "slower" (which I think is debatable), it's not dog slow if you use the XslCompiledTransform class. And I wonder what the connection is between XSLT's performance and its being "not a structured procedural language."

  • Anonymous
    February 14, 2008
    Actually, the point is not elegance or readability of the code, nor is the point perf. The point is that when using LINQ to XML, in many cases you assemble an XML tree, and you want to identify all of the nodes in the tree. How do you do that? You need a string or some other means to identify each node, and the appropriate string is an XPath expression. It would be rather slow to fire off an XSLT transform to generate an XPath expression that you are using just to identify a node.