IQueryable vs. IEnumerable

LINQ over DataSet contains two extension methods on DataTable that allow the data in a DataTable to be accessed by LINQ. One exposes the data as IQueryable<DataRow> and the other as IEnumerable<DataRow>. Why is that, and what are these two interfaces?

When you read about LINQ, IEnumerable<T> is the interface you see mentioned most. There is a good reason for this: it is the base that everything else builds on. As long as you have something that implements IEnumerable<T>, you can use LINQ.

IEnumerable<T> has no concept of jumping to an arbitrary item or moving backward; it is a forward-only sequence that hands you one item at a time until it runs out. It's very minimalistic, something almost any data source can provide. Using only this minimal functionality, LINQ can provide all of these great operators. Yet there is no free lunch.
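
To make the forward-only idea concrete, here is a rough Python analogy (not the .NET API). A hypothetical `ForwardOnly` class can only be iterated, and LINQ-style `where`, `select`, and `distinct` operators are written as generators that rely on nothing but asking for the next item. All of these names are illustrative stand-ins for IEnumerable<T> and the Enumerable operators.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ForwardOnly:
    """Hypothetical minimal source: items come out one at a time, in order.

    No indexing, no length, no stepping back; roughly the IEnumerable<T> contract.
    """

    def __init__(self, items: list) -> None:
        self._items = items

    def __iter__(self) -> Iterator:
        # Each call starts a fresh forward-only pass, like calling GetEnumerator().
        for item in self._items:
            yield item


def where(source: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    # Lazily yields only the items that satisfy the predicate.
    for item in source:
        if predicate(item):
            yield item


def select(source: Iterable[T], selector: Callable[[T], U]) -> Iterator[U]:
    # Lazily transforms each item.
    for item in source:
        yield selector(item)


def distinct(source: Iterable[T]) -> Iterator[T]:
    # Remembers items it has already yielded; still needs only forward iteration.
    seen = set()
    for item in source:
        if item not in seen:
            seen.add(item)
            yield item


rows = ForwardOnly([("Alice", 30), ("Bob", 17), ("Carol", 42), ("Carol", 42)])
adult_names = list(select(where(distinct(rows), lambda r: r[1] >= 18), lambda r: r[0]))
print(adult_names)  # ['Alice', 'Carol']
```

Because each operator only pulls the next item from the one before it, the pipeline works over any iterable source. The flip side is that none of these operators can do anything smarter than a full pass over the data.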

Say you want to use LINQ to find the duplicates of a given key (every row that shares that key value) in a huge collection of, say, 20,000 or more items, perhaps in your DataSet. By default, LINQ has to iterate over the entire collection and check every item, which is an O(N) operation. But what if the collection has an index, something like a hashtable? If you can use the hashtable, the lookup suddenly becomes O(1) on average (a tree-based index would give O(log N)). Next, you want to do a join on this same collection. If you are restricted to the forward-only model of IEnumerable, every operation has to walk the data again, and things get expensive in a hurry, especially once you start to combine operations.
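
A small Python sketch of the difference, assuming 20,000 hypothetical rows with an `id` column whose values run from 0 to 9,999 (so each key appears twice), and using a plain dict as the stand-in for the hashtable index. The scan has to examine every row, while the indexed lookup goes straight to the matching bucket.

```python
from collections import defaultdict


def scan_for_key(rows, key):
    """Forward-only approach: inspect every row. Returns (matches, rows_examined)."""
    matches = []
    examined = 0
    for row in rows:
        examined += 1
        if row["id"] == key:
            matches.append(row)
    return matches, examined


def build_index(rows):
    """Build a hash index from id -> list of rows, in one pass up front."""
    index = defaultdict(list)
    for row in rows:
        index[row["id"]].append(row)
    return index


def lookup_key(index, key):
    """Indexed approach: one hash lookup, O(1) on average regardless of table size."""
    return index.get(key, [])


rows = [{"id": i % 10_000, "n": i} for i in range(20_000)]
index = build_index(rows)

scanned, examined = scan_for_key(rows, 42)
indexed = lookup_key(index, 42)
print(len(scanned), examined)  # 2 20000
print(len(indexed))            # 2
```

Building the index is itself a full pass over the data, so it pays off when the data source already maintains the index (as a DataTable can for its primary key) or when the index is reused across many queries.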

IQueryable<T>

With something like a DataSet, which maintains indexes on its tables (for example, for primary keys), you have the capability to make operations much more efficient. But if all you have is IEnumerable<T>, how do you express this capability? In comes IQueryable<T>, which adds two key methods: CreateQuery<T> and Execute<T>. (In the released .NET Framework 3.5, these methods live on IQueryProvider, which an IQueryable exposes through its Provider property.) Both methods accept an Expression, an instance of a new set of classes (in System.Linq.Expressions) that let you represent method calls, operators, lambda functions, and so on as a tree of expressions.
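
Python has no built-in counterpart to System.Linq.Expressions, so this sketch defines a few hypothetical node classes (`Column`, `Constant`, `Equal`, `And`) to show the idea: a lambda along the lines of `row => row.city == "Oslo" && row.age == 30` held as a tree of objects rather than as opaque code. The same tree can be evaluated against rows, or walked and inspected as data, which a compiled function would not allow.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Column:
    name: str  # reads one column of the current row


@dataclass(frozen=True)
class Constant:
    value: Any


@dataclass(frozen=True)
class Equal:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Constant, Equal, And]


def evaluate(expr: Expr, row: dict) -> Any:
    """Interpret the tree against one row, like running the compiled lambda."""
    if isinstance(expr, Column):
        return row[expr.name]
    if isinstance(expr, Constant):
        return expr.value
    if isinstance(expr, Equal):
        return evaluate(expr.left, row) == evaluate(expr.right, row)
    if isinstance(expr, And):
        return evaluate(expr.left, row) and evaluate(expr.right, row)
    raise TypeError(f"unknown node: {expr!r}")


def describe(expr: Expr) -> str:
    """Walk the tree as data and render it as text."""
    if isinstance(expr, Column):
        return f"row.{expr.name}"
    if isinstance(expr, Constant):
        return repr(expr.value)
    if isinstance(expr, Equal):
        return f"({describe(expr.left)} == {describe(expr.right)})"
    if isinstance(expr, And):
        return f"({describe(expr.left)} and {describe(expr.right)})"
    raise TypeError(f"unknown node: {expr!r}")


# Roughly: row => row.city == "Oslo" && row.age == 30
predicate = And(Equal(Column("city"), Constant("Oslo")),
                Equal(Column("age"), Constant(30)))

rows = [{"city": "Oslo", "age": 30}, {"city": "Rome", "age": 30}]
matches = [row for row in rows if evaluate(predicate, row)]

print(describe(predicate))  # ((row.city == 'Oslo') and (row.age == 30))
print(matches)              # [{'city': 'Oslo', 'age': 30}]
```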

This isn't just used for simple expressions; it's used for the entire LINQ query! This means the data source can now look inside the LINQ query and work with the expression tree, perhaps rewriting it so the operation uses the index. The user gets the benefit of the index without even needing to know about it!
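
Here is a hedged Python sketch of that last step. The hypothetical `IndexedTable` receives the whole query as a tree (a `Where` node wrapping a predicate), checks whether the predicate is an equality test on its indexed column, and if so answers from the index; anything else falls back to a forward-only scan. All of these names are made up for illustration. A real LINQ provider would receive a System.Linq.Expressions tree through IQueryable, and this only imitates the idea.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Constant:
    value: Any


@dataclass(frozen=True)
class Equal:
    left: Any
    right: Any


@dataclass(frozen=True)
class Where:
    predicate: Any  # the whole query: "rows where <predicate>"


@dataclass
class IndexedTable:
    rows: List[dict]
    indexed_column: str
    index: Dict[Any, List[dict]] = field(default_factory=dict)
    rows_examined: int = 0  # instrumentation for the demo

    def __post_init__(self) -> None:
        for row in self.rows:
            self.index.setdefault(row[self.indexed_column], []).append(row)

    def execute(self, query: Where) -> List[dict]:
        pred = query.predicate
        # Look inside the tree: is this "indexed_column == constant"?
        if (isinstance(pred, Equal)
                and isinstance(pred.left, Column)
                and pred.left.name == self.indexed_column
                and isinstance(pred.right, Constant)):
            hits = self.index.get(pred.right.value, [])
            self.rows_examined += len(hits)
            return list(hits)
        # Otherwise fall back to a forward-only scan over every row.
        result = []
        for row in self.rows:
            self.rows_examined += 1
            if self._eval(pred, row):
                result.append(row)
        return result

    def _eval(self, expr: Any, row: dict) -> Any:
        if isinstance(expr, Column):
            return row[expr.name]
        if isinstance(expr, Constant):
            return expr.value
        if isinstance(expr, Equal):
            return self._eval(expr.left, row) == self._eval(expr.right, row)
        raise TypeError(f"unsupported node: {expr!r}")


table = IndexedTable([{"id": i, "name": f"row{i}"} for i in range(20_000)], "id")

by_id = table.execute(Where(Equal(Column("id"), Constant(42))))
print(by_id, table.rows_examined)    # [{'id': 42, 'name': 'row42'}] 1

table.rows_examined = 0
by_name = table.execute(Where(Equal(Column("name"), Constant("row42"))))
print(by_name, table.rows_examined)  # [{'id': 42, 'name': 'row42'}] 20000
```

The caller writes the same kind of query either way; only the provider's inspection of the tree decides whether the index is used.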
