Using LINQ to XML Annotations - tracking line numbers

[updated to escape the code so that it displays properly in HTML, and so that it gracefully handles input with an XML declaration]

Several people have asked for a feature in LINQ to XML that would keep track of the line number in the XML data source from which each node was parsed.  We have resisted, partly because there doesn't seem to be a mainstream use case for this feature, and partly because of the minimalist design philosophy behind LINQ: simple, mainstream scenarios should be supported out of the box, whereas more sophisticated use cases can be supported via the various extension mechanisms.  The code example below shows how to use C# 3.0 extension methods and LINQ to XML annotations to do this job.

The tricky parts of this code are the LoadWithLineInfo method, which sets up the XmlReader, and the LoadNode method, which figures out what the reader returned, constructs the appropriate type of XLinq object to hold that result, attaches the line number annotation, updates the XLinq tree, and calls the reader again.  The good news is that once you understand the logic here, you should be able to write other extension methods that preserve information from a source file that does not fit neatly into an XLinq tree.  For example, we believe this basic pattern can be used to make LINQ to XML more DTD-aware, e.g. by preserving entity references or by noting attribute values that were set from the DTD default rather than explicitly in the XML source.  This type of information could be stored as annotations by a similar customized load method and read back by an analogous save method.

Note: This sample requires the May 2006 LINQ CTP to work. In the May CTP, annotations are only supported on XDocument and XElement objects, but in the next public release of LINQ to XML, it will be possible to attach annotations to almost every type of XLinq object, including attributes and text nodes.
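To make the note above concrete, here is a minimal sketch of annotating an attribute. It assumes the later, released System.Xml.Linq API (XObject.AddAnnotation and XObject.Annotation<T>) rather than the May 2006 CTP used in the rest of this post, and the SourcePosition class and the SetPosition/GetPosition helpers are hypothetical names invented for this illustration.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical annotation type for this sketch; it mirrors the LineInfo class in the sample.
public sealed class SourcePosition
{
    private readonly int line;
    private readonly int column;

    public SourcePosition(int line, int column)
    {
        this.line = line;
        this.column = column;
    }

    public int Line { get { return line; } }
    public int Column { get { return column; } }

    public override string ToString()
    {
        return "Line #" + line + ", Char #" + column;
    }
}

public static class AnnotationSketch
{
    // Attaches a position to any XObject (element, attribute, text node, and so on).
    public static void SetPosition(this XObject node, int line, int column)
    {
        node.AddAnnotation(new SourcePosition(line, column));
    }

    // Returns the attached position, or null if no position was attached.
    public static SourcePosition GetPosition(this XObject node)
    {
        return node.Annotation<SourcePosition>();
    }
}
```

With these helpers, new XAttribute("a", "value1") can be given a position with SetPosition(3, 8), and GetPosition() on the same attribute returns an object whose ToString() is "Line #3, Char #8". The position values here are supplied by hand; a loader like the one in this post would take them from the reader.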

Try this out; the sample program itself just loads some XML from a string and prints out the line number information where the elements were found.  You might play around with the XML source and the elements whose line number information can be displayed, or you could tweak the program to read from a file specified on the command line.   Let me know what is confusing and I'll try to clarify.

  
using System;
using System.IO;
using System.Xml;
using System.Xml.XLinq;

namespace System.Xml.XLinq.Extension
{    

    /// <summary>
    /// Sample program to illustrate use of the line number extensions.  It reads an 
    /// XML document from a string, but could be easily modified to open a reader o
    /// </summary>
    public class Program
    {
        static void Main(string[] args) {
            string markup = @"
<root>            
    <e a='value1'/>
    <f b='value2'/>
</root>
";
            XDocument document = new XDocument();
            document.LoadWithLineInfo(XmlReader.Create(new StringReader(markup)));
            Console.WriteLine(document.Element("root").Element("e").GetLineInfo());
            Console.WriteLine(document.Element("root").Element("f").GetLineInfo());
        }
    }
    /// <summary>
    /// The application-defined class to be attached as an annotation.  This particular class 
    /// keeps track of the line number and character position at which an element was found
    /// in the XML source.
    /// </summary>
    public class LineInfo 
    {
        int number;
        int position;
        
        public LineInfo(int number, int position) {
            this.number = number;
            this.position = position;
        }
        
        public int Number { 
            get { return number; }
        }
        
        public int Position {
            get { return position; }
        }
        
        public override string ToString() {
            return "Line #" + number + ", Char #" + position;
        }
    }
    /// <summary>
    /// Some extension methods added to the System.Xml.XLinq namespace to support
    /// line number annotations.
    /// </summary>
    public static class Extension
    {
        public static LineInfo GetLineInfo(this XElement element) {
            return element.GetAnnotation<LineInfo>();
        }
        
        public static void SetLineInfo(this XElement element, LineInfo lineInfo) {
            element.AddAnnotation(lineInfo);
        }
        /// <summary>
        /// A version of the XLinq Load() method that annotates the tree it loads with 
        /// information on where in the XML file an element was found.  
        /// </summary>
        /// <param name="document">An XDocument to populate</param>
        /// <param name="reader">An XmlReader setup to read from a data source</param>
        public static void LoadWithLineInfo(this XDocument document, XmlReader reader) {
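            // Name the offending parameter so callers can see which argument was rejected.
            if (reader == null) throw new ArgumentNullException("reader");
            IXmlLineInfo lineInfo = reader as IXmlLineInfo;
            if (lineInfo == null) throw new ArgumentException("The reader does not implement IXmlLineInfo.", "reader");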
            if (reader.ReadState != ReadState.Interactive) {
                if (!reader.Read()) return;
            }             
            XNode node = null;
            while ((node = LoadNode(reader, lineInfo)) != null) {
                document.Add(node);
                if (!reader.Read()) return;
            }
        }
        /// <summary>
        /// Reads an XLinq node from an XmlReader, annotating it with line number information
        /// </summary>
        static XNode LoadNode(XmlReader reader, IXmlLineInfo lineInfo) {
            XNode node = null;
            XElement parent = null;
            do {
                switch (reader.NodeType) {
                    case XmlNodeType.Element:
                        XElement element = new XElement(XName.Get(reader.LocalName, reader.NamespaceURI));
                        if (reader.MoveToFirstAttribute()) {
                            do {
                                element.Add(new XAttribute(XName.Get(reader.LocalName, reader.NamespaceURI), reader.Value));
                            } while (reader.MoveToNextAttribute());
                            reader.MoveToElement();
                        }
                        if (lineInfo.HasLineInfo()) {
                            element.SetLineInfo(new LineInfo(lineInfo.LineNumber, lineInfo.LinePosition));
                        }                        
                        if (!reader.IsEmptyElement) {
                            if (parent != null) {
                                parent.Add(element);
                            }
                            parent = element;
                            continue;
                        }
                        else {
                            node = element;
                        }
                        break;
                    case XmlNodeType.EndElement:
                        if (parent == null) return null;
                        if (parent.IsEmpty) {
                            parent.Add(string.Empty);
                        }
                        if (parent.Parent == null) return parent;
                        parent = parent.Parent;
                        continue;
                    case XmlNodeType.Text:
                    case XmlNodeType.SignificantWhitespace:
                    case XmlNodeType.Whitespace:
                        node = new XText(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        node = new XText(reader.Value, TextType.CData);
                        break;
                    case XmlNodeType.Comment:
                        node = new XComment(reader.Value);
                        break;                        
                    case XmlNodeType.ProcessingInstruction:
                        node = new XProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.DocumentType:
                        node = new XDocumentType(reader.LocalName, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                        break;
                    case XmlNodeType.EntityReference:
                        reader.ResolveEntity();
                        continue;
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.EndEntity:
                        continue;                        
                    default:
                        throw new InvalidOperationException();
                }
                if (parent == null) return node;
                parent.Add(node);
            } while (reader.Read());
            return null;
        }
    }
       
}
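For comparison, readers on the later, released System.Xml.Linq library (not the May 2006 CTP this sample targets) may not need a hand-written loader for positions alone, since that library can record them at load time through LoadOptions.SetLineInfo and expose them through IXmlLineInfo. The sketch below assumes that released API; the class and method names are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class BuiltInLineInfoSketch
{
    // Parses markup and asks the loader to keep line information on the nodes it creates.
    public static XDocument ParseWithLineInfo(string markup)
    {
        return XDocument.Parse(markup, LoadOptions.SetLineInfo);
    }

    // Formats the recorded position of a node, or returns null if none was recorded.
    public static string Describe(XObject node)
    {
        // XObject implements IXmlLineInfo explicitly, so go through the interface.
        IXmlLineInfo info = node;
        if (!info.HasLineInfo()) return null;
        return "Line #" + info.LineNumber + ", Char #" + info.LinePosition;
    }
}
```

A call such as Describe(ParseWithLineInfo(markup).Root) then plays the role of GetLineInfo() in the sample above. The annotation-based loader remains the pattern to follow for information the library does not track itself, such as the DTD details mentioned earlier.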

Comments

  • Anonymous
    September 19, 2006
    You can put the sample data in a file and load directly from the file with the reader.  For example, if the data is in a file "sample.xml" in the project directory:

    document.LoadWithLineInfo(XmlReader.Create("../../sample.xml"));

    The point of using StringReader in the example was simply to make it self-contained in a single file.
  • Anonymous
    September 19, 2006
    Dear Mike,

    Thanks, it works!  Anyway, I have added an additional case for XmlDeclaration, placed before CDATA in the LoadNode function of the Extension class, so that XML data with an XML declaration can go through.


    Regards

    Hui Ming
  • Anonymous
    September 21, 2006
    Thanks for the clarification, chionhhm!  In re-reading your post I can now see what you meant.