How to Query Trees Using LINQ

Article
1/17/2024

http://c.statcounter.com/8988981/0/1070c577/1/A (directed) tree is one of the most common data types. But how do you use trees in LINQ? This article introduces a generic LINQ-Extension which solves the problem, gives an example and discusses alternatives.

Motivation

The starting point of this article was a simple question in the LINQ forum which has been unanswered for four years...

How can I select tree nodes independently from their nesting level?

So it might be worth to have a closer look at the task. Let's start with an example. A simple TreeNode class may look like this:

public class  TreeNode
{
    public string  Id { get; set; }
    public int Data  { get; set; }
    public List<TreeNode> Children  { get; private  set; }
 
    public TreeNode(string id, int data)
        : this(id, data, new  TreeNode[0])
    { }
 
    public TreeNode(string node, int data, params TreeNode[] children)
    {
        Id = node;
        Data = data;
        Children = new  List<TreeNode>(children);
    }
}

The test data are:

TreeNode rootNode =
    new TreeNode("node-0", 35,
        new TreeNode("node-1", 17,
            new TreeNode("node-2", 20)),
        new TreeNode("node-3", 19),
        new TreeNode("node-4", 5,
            new TreeNode("node-5", 25),
            new TreeNode("node-6", 40)));

↑ Return to Top

Solution

Implement a generic extension class which can traverse over arbitrary trees. The only precondition is that the node's children are stored as an IEnumerable. The selector lambda expression avoids hard-coding of the access path and makes the extension reusable.

public static  class LinqTreeExtension
{
    public static  IEnumerable<T> SelectNestedChildren<T>
        (this IEnumerable<T> source, Func<T, IEnumerable<T>> selector)
    {
        foreach (T item in source)
        {
            yield return  item;
            foreach (T  subItem in  SelectNestedChildren(selector(item), selector))
            {
                yield return  subItem;
            }
        }
    }
}

The extension method SelectNestedChildren traverses the tree depth-first, left to right.

↑ Return to Top

Usage Example

**Example 1: **Flattening the tree structure (without the root node)

var flattenedTreeWithoutRoot 
    = rootNode.Children
        .SelectNestedChildren(t => t.Children).ToList();

The resulting list enumerates the nodes with IDs "note-1" to "note-6". Please note: The root node is not contained in the resulting list. We'll fix this in the next example.

**Example 2: **Flattening the tree structure (including the root node)

var flattenedTreeIncludingRoot 
    = new  TreeNode[] {rootNode }
        .SelectNestedChildren(t => t.Children).ToList();

The query is executed on an array, which contains only the root node. The resulting list enumerates the nodes with IDs "note-0" to "note-6" as expected.

Example 3: Performing a LINQ query on all nested children

var result
    = new  TreeNode[] {  rootNode }
        .SelectNestedChildren(t => t.Children)
        .Where(t => t.Data > 30).ToList();

The resulting list contains two nodes with the IDs "node-0" and "node-6"

↑ Return to Top

Contradictions and alternatives

The proposed LINQ extension has significant pros:

The extension is applicable to all tree-structures which use an IEnumerable to hold the children.
It fits seamlessly into other LINQ operations.

However, in some cases you may want to consider an alternative:

Cyclic graph instead of a tree

**Problem: **The extension will loop forever if it runs into a cycle of the graph.
Solution: Modify the extension:

Add a HashSet<T> to the extension which keeps track of the already visited notes.
Your class T should override GetHashCode() and Equals()
The extensions have to skip already visited nodes.

Persisted tree in a database

Problem: If the tree is stored in a database (and not in memory), the extension produces unnecessary overhead.
Solution: All tree nodes reside in a database table which can be queried directly. The effect that they represent a logical tree or even a cyclic graph can be ignored. Using the Entity Framework the query looks like this:

var databaseResult = context.TreeNodes
    .Where(t => t.Data > 30).ToList();

However, at least two small changes should be applied to the sample class TreeNode:

The property Children should be changed to a virtual property to enable lazy loading.
The Entity Framework requires an additional parameterless constructor.

XML data structures

An XML data structure is a special case of a directed tree.

Problem: LINQ to XML has specialized on XML Documents.
Solution: Use LINQ to XML. It reduces your coding effort. You stick to a standard. You may also use other benefits from LINQ to XML. Example:

var xmlElements = XDocument.Load("myfile.xml").Descendants();
 
var result = xmlElements.Where(x => ...);

XDocument.Descendents() returns an IEnumerable<XElement> which can be used in a query directly.

Conclusion:
Using trees with LINQ is simple. However, there is no "one size fits all" solution.

↑ Return to Top

Other languages

This article is also available in other languages:

Baumstrukturen mit LINQ abfragen (de-DE)

Share via

How to Query Trees Using LINQ

Motivation

Solution

Usage Example

Contradictions and alternatives

Cyclic graph instead of a tree

Persisted tree in a database

XML data structures

Links

See Also

Other languages

Additional resources