Writing Robust LINQ to XML Code that Performs Well

Article
07/14/2008

In the last three posts, in addition to the information regarding how we want to alter the markup in an Open XML document, I've made a few observations about how to write LINQ to XML code when modifying an XML tree in such a way that it becomes harder to introduce bugs. In addition, I've also mentioned a couple of other points that you can take into consideration - you can write code that performs better if you know a bit about how LINQ to XML operates under the covers.

This blog is inactive.
New blog: EricWhite.com/blog

Blog TOCIn Using LINQ to XML to Accept Revisions, I mention a couple of approaches to modifying/transforming an XML tree, and tell why I picked the approach I did for the particular problem I'm solving.

In Using LINQ to XML to Remove Personal Information, I go through a couple of idioms in detail, showing how to write robust code to modify an XML tree.

In Using LINQ to XML to Remove Comments, I discuss one issue to take into consideration so that you can write queries that perform well.

While my primary motivation in writing these posts is to show how to modify the Open XML markup to acheive the desired goals, there is some good information for LINQ to XML developers. I felt that this might be lost behind the focus on Open XML markup, so I'm calling it out specifically here.

Comments

Anonymous
July 13, 2008
One trick is to never, ever write queries or updates which retrieve or insert siblings before some particular element. In other words, avoid XNode.AddBeforeSelf(), XNode.ElementsBeforeSelf(), XNode.PreviousNode. This is because XNodes in an XContainer form a singly linked list, so only forward traversal is efficient, while backward requires restarting from the beginning of the container. Because of this, IsBefore() and IsAfter() are not particularly efficient, either... Interestingly enough, XPathDocument does not have this problem.
Anonymous
July 13, 2008
@int19h, That is a good point. I think that the design decision was to optimize for the most common scenarios, and keep the working set small. -Eric
Anonymous
July 14, 2008
I understand the reasoning, but it's a real pity that MSDN docs don't say a word on this, and have outright deceiving samples (if you look the one for PreviousNode, it uses it in a while-loop to traverse nodes in reverse document order - that's O(n^2)!). I've created a ticket for that at VS Connect: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=355102
Anonymous
July 15, 2008
Well, that would be my fault, since I wrote the LINQ to XML documentation. :P Actually, I can't remember, but I vaguely recollect that the fact that nodes were forward only linked was an implementation detail that could in theory change in the future, so should not be put in the docs. But what I can tell you is that I wasn't focused on node traversal as a primary scenario. Instead, I wanted to teach people about using queries, and the functional approach. There are a couple of scenarios where a developer will want to do node traversal, but in those cases, the developer will be aware that nodes are forward linked. I'm not sure what a good example would be for previous node. Suggestions? -Eric
Anonymous
July 15, 2008
Eric White on Linq and Open XML. Eric White has been posting some great code samples lately, including
Anonymous
July 22, 2008
> Actually, I can't remember, but I vaguely recollect that the fact that nodes were forward only linked was an implementation detail that could in theory change in the future, so should not be put in the docs. You might not want to put the exact algorithmic complexity in the docs, but I don't see how it would hurt to write something like "the complexity of this operation is not guaranteed to be O(1)" in Remarks section. > But what I can tell you is that I wasn't focused on node traversal as a primary scenario. Instead, I wanted to teach people about using queries, and the functional approach. There are a couple of scenarios where a developer will want to do node traversal, but in those cases, the developer will be aware that nodes are forward linked. XLINQ is indeed mostly about queries, but this particular example I linked to was certainly about node traversal. > I'm not sure what a good example would be for previous node. Suggestions? Ironically enough, reverse traversal is perhaps the only thing it's actually useful for ... so no other ideas here, sorry. In reality, as it stands, any use of PreviousNode in a loop (even an implicit one, such as a LINQ query) is suspect.
Anonymous
July 23, 2008
Hi int19h, Well, FWIW, I absolutely agree with you. It should be mentioned. I'll send an email to the current XML doc person. -Eric
Anonymous
July 24, 2008
The comment has been removed

Share via

Writing Robust LINQ to XML Code that Performs Well

Comments

Additional resources