Working with Optional Elements and Attributes in LINQ to XML Queries
Often XML schemas allow for optional elements and attributes. When you write queries on these elements or attributes, you may be tempted to write code that does lots of testing for null. There is a better way to do this, laid out in this post. I covered this idiom in a previous post, but the main purpose of that post wasn’t to explain this idiom.
The following XML document is a simplified variation of markup that you can find in Open XML word processing documents:
<document>
  <body>
    <p>
      <r>
        <t>Text of first para.</t>
      </r>
    </p>
    <p>
      <pPr>
        <pStyle val="Heading1"/>
      </pPr>
      <r>
        <t>Text of second para.</t>
      </r>
    </p>
  </body>
</document>
The first paragraph doesn’t have a <pPr> element, whereas the second does. This is allowable in Open XML word processing documents. The first paragraph has the default style.
Our task is to write a query that returns the style name and the text of each paragraph. A paragraph with no style name has the default style. The code projects a collection of an anonymous type that contains the style name and the text; if the paragraph has the default style, StyleName is set to null.
The approach where the code tests for null looks like this:
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static string GetStyleName(XElement p)
    {
        XElement pPr = p.Element("pPr");
        if (pPr != null)
        {
            XElement pStyle = pPr.Element("pStyle");
            if (pStyle != null)
                return (string)pStyle.Attribute("val");
        }
        return null;
    }

    static void Main(string[] args)
    {
        XElement root = XElement.Parse(
            @"<document>
                <body>
                  <p>
                    <t>Text of first para.</t>
                  </p>
                  <p>
                    <pPr>
                      <pStyle val='Heading1'/>
                    </pPr>
                    <t>Text of second para.</t>
                  </p>
                </body>
              </document>");
        var paragraphs = root
            .Element("body")
            .Elements("p")
            .Select(p => new
            {
                StyleName = GetStyleName(p),
                Text = (string)p.Element("t")
            });
        foreach (var item in paragraphs)
            Console.WriteLine(item);
    }
}
This works just fine, and yields the expected results:
{ StyleName = , Text = Text of first para. }
{ StyleName = Heading1, Text = Text of second para. }
Beyond making the code harder to read, this approach introduces two additional points of possible failure: the two null tests. If I had neglected to write either test, the code would throw a NullReferenceException for any paragraph that lacks a <pPr> or <pStyle> element.
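To make that failure mode concrete, here is a minimal sketch of the same lookup written without the null tests. The helper name GetStyleNameUnsafe and the two one-paragraph documents are assumptions made up for this illustration; they are not part of the code above.

```csharp
using System;
using System.Xml.Linq;

static class UnsafeDemo
{
    // Hypothetical helper: the same lookup as GetStyleName, minus the null tests.
    public static string GetStyleNameUnsafe(XElement p)
    {
        // p.Element("pPr") returns null when there is no <pPr> child,
        // so the chained .Element("pStyle") call dereferences null.
        return (string)p.Element("pPr").Element("pStyle").Attribute("val");
    }

    public static void Main()
    {
        XElement styled = XElement.Parse(
            "<p><pPr><pStyle val='Heading1'/></pPr><t>Styled</t></p>");
        XElement plain = XElement.Parse("<p><t>Plain</t></p>");

        Console.WriteLine(GetStyleNameUnsafe(styled));

        try
        {
            GetStyleNameUnsafe(plain);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException for the paragraph without <pPr>");
        }
    }
}
```

The styled paragraph works, which is exactly why this kind of bug tends to survive until a document with a default-style paragraph shows up.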
There is another way to write this query, which is to use the Elements and Attributes extension methods that operate on IEnumerable<XElement>.
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XElement root = XElement.Parse(
            @"<document>
                <body>
                  <p>
                    <t>Text of first para.</t>
                  </p>
                  <p>
                    <pPr>
                      <pStyle val='Heading1'/>
                    </pPr>
                    <t>Text of second para.</t>
                  </p>
                </body>
              </document>");
        var paragraphs = root
            .Element("body")
            .Elements("p")
            .Select(p => new
            {
                StyleName = (string)p.Elements("pPr").Elements("pStyle")
                    .Attributes("val").FirstOrDefault(),
                Text = (string)p.Element("t")
            });
        foreach (var item in paragraphs)
            Console.WriteLine(item);
    }
}
This also yields the same results, and doesn’t contain the two points of possible failure.
Here’s how this code works. In the snippet below, the first call, p.Elements("pPr"), evaluates to a collection of XElement objects. Notice that I used the Elements method, not the Element method, even though I know that there can only be zero or one <pPr> elements. The call returns a collection of either zero or one items.
StyleName = (string)p.Elements("pPr").Elements("pStyle")
.Attributes("val").FirstOrDefault(),
The Elements extension method yields all child elements with the given name for each and every element in the source collection. In the snippet below, p.Elements("pPr").Elements("pStyle") returns either a collection containing one XElement object (if there is a <pPr> element with a <pStyle> child), or an empty collection if there isn’t:
StyleName = (string)p.Elements("pPr").Elements("pStyle")
.Attributes("val").FirstOrDefault(),
Next, the code ‘dots’ into the Attributes extension method with .Attributes("val"). Like Elements, the Attributes extension method is happy to take a collection of elements as its source. If an empty collection is passed to the Attributes extension method, it also returns an empty collection:
StyleName = (string)p.Elements("pPr").Elements("pStyle")
.Attributes("val").FirstOrDefault(),
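One way to check the stages described above is to count what each one yields for a paragraph with and without a style. This is only a sketch: the StageCounts helper is a hypothetical name introduced here, and the two paragraphs are cut down from the sample document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class StageDemo
{
    // Hypothetical helper: how many items each stage of the chain yields.
    public static int[] StageCounts(XElement p)
    {
        return new[]
        {
            p.Elements("pPr").Count(),
            p.Elements("pPr").Elements("pStyle").Count(),
            p.Elements("pPr").Elements("pStyle").Attributes("val").Count()
        };
    }

    public static void Main()
    {
        XElement plain = XElement.Parse("<p><t>Text of first para.</t></p>");
        XElement styled = XElement.Parse(
            "<p><pPr><pStyle val='Heading1'/></pPr><t>Text of second para.</t></p>");

        Console.WriteLine(string.Join(", ", StageCounts(plain)));   // 0, 0, 0
        Console.WriteLine(string.Join(", ", StageCounts(styled)));  // 1, 1, 1
    }
}
```

No stage ever sees a null: a missing element simply shows up as an empty collection that flows through the rest of the chain.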
The FirstOrDefault extension method either returns the first element in a collection, or it returns the default value for the type of items in the collection. The default value for all reference types (which XAttribute and XElement are) is null. In this case, FirstOrDefault will either return the one “val” attribute, or it will return null.
Finally, we cast this one value (either null or an XAttribute) to string. A string is a reference type, so it can hold null, and the explicit conversion operator (the cast operator for XAttribute or XElement) is happy to take null and return null. If there is no <pPr> element, StyleName will be set to null. If there is a <pPr> element, a <pStyle> element, and a val attribute, then StyleName will be set to the value of the attribute.
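The difference between the cast and the Value property is easy to miss, so here is a small sketch of both applied to a missing attribute. The ReadVal helper is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Xml.Linq;

static class CastDemo
{
    // Hypothetical helper: reads an optional 'val' attribute through the cast.
    public static string ReadVal(XElement e)
    {
        return (string)e.Attribute("val");
    }

    public static void Main()
    {
        XElement p = XElement.Parse("<p><t>Text of first para.</t></p>");

        // The explicit conversion accepts a null XAttribute and returns null.
        Console.WriteLine(ReadVal(p) == null);  // True

        try
        {
            // Value is an ordinary property access, so it fails on a null reference.
            Console.WriteLine(p.Attribute("val").Value);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("Value threw NullReferenceException");
        }
    }
}
```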
This is where the other nullable CLR types (int?, bool?, double?, etc.) come in handy. If we want to get the value of an optional element or attribute, and we know that the value is an integer or double, or whatever, instead of casting to string, we can cast to any of the other nullable types. The same semantics apply.
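As a sketch of the same idiom with other nullable types, the following casts optional attributes to int? and bool?. The <ind> and <keepNext> elements, their attribute names, and the helper names are assumptions invented for this example; they are simplified stand-ins rather than real Open XML markup.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class NullableDemo
{
    // Hypothetical helper: optional integer attribute, null when absent.
    public static int? LeftIndent(XElement p)
    {
        return (int?)p.Elements("pPr").Elements("ind")
            .Attributes("left").FirstOrDefault();
    }

    // Hypothetical helper: optional Boolean attribute, null when absent.
    public static bool? KeepNext(XElement p)
    {
        return (bool?)p.Elements("pPr").Elements("keepNext")
            .Attributes("val").FirstOrDefault();
    }

    public static void Main()
    {
        XElement formatted = XElement.Parse(
            "<p><pPr><ind left='720'/><keepNext val='true'/></pPr><t>Indented</t></p>");
        XElement plain = XElement.Parse("<p><t>Plain</t></p>");

        Console.WriteLine(LeftIndent(formatted));        // 720
        Console.WriteLine(KeepNext(formatted));          // True
        Console.WriteLine(LeftIndent(plain).HasValue);   // False
        Console.WriteLine(KeepNext(plain).HasValue);     // False
    }
}
```

A null result here means "not specified", which the caller can then replace with whatever default applies, for example with the ?? operator.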
Interestingly, this is also efficient. Here’s why:
- The LINQ to XML axes use deferred execution.
- FirstOrDefault starts the process of materialization, requesting the first item in the collection from the Attributes extension method.
- It, in turn, requests the first item from the Elements("pStyle") call.
- It, in turn, requests the first element from the XElement.Elements method, which finally yields up an XElement (or yields nothing if there isn’t one).
- This one element is yielded up to the Elements("pStyle") extension method.
- This one element is yielded up to the Attributes("val") extension method.
- The one attribute is yielded up to the FirstOrDefault extension method, which due to its semantics, “short circuits” the query, and never requests another item from its source.
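The steps above can be observed with a small counting wrapper placed in the middle of the chain. This is a sketch under stated assumptions: the Counted method and the Pulled counter are hypothetical helpers written for this demonstration, and the markup with three <pStyle> elements is deliberately contrived so that there is something left to skip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class DeferredDemo
{
    // Counts how many items have been requested from the wrapped source.
    public static int Pulled;

    // Hypothetical wrapper: passes items through, counting each one requested.
    public static IEnumerable<T> Counted<T>(IEnumerable<T> source)
    {
        foreach (T item in source)
        {
            Pulled++;
            yield return item;
        }
    }

    public static void Main()
    {
        XElement pPr = XElement.Parse(
            "<pPr><pStyle val='A'/><pStyle val='B'/><pStyle val='C'/></pPr>");

        // Building the query requests nothing from the source.
        IEnumerable<XAttribute> query =
            Counted(pPr.Elements("pStyle")).Attributes("val");
        Console.WriteLine(Pulled);         // 0

        // FirstOrDefault asks for one attribute, which needs only one element.
        XAttribute first = query.FirstOrDefault();
        Console.WriteLine((string)first);  // A
        Console.WriteLine(Pulled);         // 1
    }
}
```

The second and third <pStyle> elements are never requested, which is the short circuit described in the last step.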
The net result is that this idiom is efficient, reduces points of possible failure, and is shorter. After one of the LINQ architects described this idiom, I started using it; it soon became quite natural, and I now mostly don't write code in the other style. So I’m curious: do you LINQ users out there use this idiom?
Comments
Anonymous
July 23, 2009
Thank you for this post. This helps me out with my learning curve around LINQ for XML.
Anonymous
January 22, 2010
Very cool. I had not paid attention to the 's', and it makes all the difference. Element("name1").Element("name2") might throw an exception, but Elements("name1").Elements("name2").FirstOrDefault() would never throw an exception. Very cool.
Anonymous
January 27, 2010
I have been working with LINQ for a while now in VB.NET and hadn't come across this method before. I had been unable to work out how to extract optional elements from an XML doc until I found your article. This snippet (in VB.NET) works perfectly:
Module Module1
    Public Class Form1
        Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
            Dim element = <xml>
                              <object>
                                  <word>someword</word>
                                  <number>7</number>
                                  <another_word>hey</another_word>
                              </object>
                              <object>
                                  <word>nothing%20else</word>
                              </object>
                          </xml>
            Dim example = From item In element...<object> _
                          Select _
                              Word = item.Elements("word"), _
                              Number = item.Elements("number"), _
                              AnotherWord = item.Elements("another_word")
            Dim Results As New List(Of Test)
            For Each item In example
                Dim t As New Test
                t.Word = item.Word.Value
                t.Number = CInt(item.Number.Value)
                t.AnotherWord = item.AnotherWord.Value
                Results.Add(t)
            Next
        End Sub
    End Class
    Public Class Test
        Public Word As String
        Public Number As Integer
        Public AnotherWord As String
    End Class
End Module
Thanks again for this valuable and hard-to-obtain information.
Anonymous
January 28, 2010
Hi Craig,
You are right, the same technique works in VB as well as in C#. You can simplify your example somewhat by using the built-in VB axes. Also, I'd suggest doing your projection into the List<Test> directly in the query (you need to parenthesize it to dot into ToList):
Module Module1
    Sub Main()
        Dim element = <xml>
                          <object>
                              <word>someword</word>
                              <number>7</number>
                              <another_word>hey</another_word>
                          </object>
                          <object>
                              <word>nothing%20else</word>
                          </object>
                      </xml>
        Dim example = (From item In element...<object> _
                       Select New Test With { _
                           .Word = item.<word>.Value, _
                           .Number = item.<number>.Value, _
                           .AnotherWord = item.<another_word>.Value}).ToList
        For Each item In example
            Console.WriteLine(item.Number)
        Next
    End Sub
    Public Class Test
        Public Word As String
        Public Number As Integer
        Public AnotherWord As String
    End Class
End Module
If you prefer not to use the parentheses, you can do it like this:
Dim example = From item In element...<object> _
              Select New Test With { _
                  .Word = item.<word>.Value, _
                  .Number = item.<number>.Value, _
                  .AnotherWord = item.<another_word>.Value}
Dim example2 = example.ToList
For Each item In example2
    Console.WriteLine(item.Number)
Next
One modification that also shows the approach using optional elements is that you can filter on optional elements like this:
Dim example = From item In element...<object> _
              Where item.<number>.Value = "7" _
              Select New Test With { _
                  .Word = item.<word>.Value, _
                  .Number = item.<number>.Value, _
                  .AnotherWord = item.<another_word>.Value}
Dim example2 = example.ToList
-Eric