How to implement IQueryable (Part 1)
In the Orcas timeframe, Microsoft will be supplying a couple of specialized flavors of Linq to address common data access scenarios. DLinq covers SQL servers and XLinq handles XML documents, but what about the countless other data sources out there that a user might want to interact with using Linq? For some, you can simply gather your data into a CLR collection and make use of the default Linq experience. For example, if you wanted to find new *.exe files on your hard drive, you might use Linq to do something like this:
Dim newExe = From fileName In Directory.GetFiles( _
My.Computer.FileSystem.SpecialDirectories.MyDocuments, _
"*.exe", SearchOption.AllDirectories) _
Where (New FileInfo(fileName)).CreationTime > #6/30/2007# _
Select fileName
This is pretty cool, but it’s also very simple and doesn’t make for a very interesting blog post, so let’s go ahead and complicate things…
For many backend data sources out there in the IT world today, there exist query APIs and object models to represent the various “entities” found in the data. A good example of what I’m talking about can be found in none other than the Windows file system. Beyond using the methods in the .NET Framework to find files, Windows Desktop Search (available in Windows Vista or downloadable from https://www.microsoft.com/windows/desktopsearch/default.mspx) exposes an OLE DB Provider, allowing you to query its index. Wouldn’t it be nice if we could somehow use Linq to query the index instead of having to type out SQL queries as Strings to send along to OLE DB? Well, by creating a custom Linq provider, we can.
When it comes to creating a custom Linq provider, several informative blog posts exist on the Internet (see resources section), but I hope to detail a bit more of a practical “HowTo” on implementing a useful custom Linq provider.
Creating your own provider starts with implementing the IQueryable and IQueryProvider Interfaces. Often, a custom object model exists for representing data in OO form. For files on disk, we can use the FileInfo class. Here’s the code we’ve got so far:
Imports System.IO
Imports System.Linq
Imports System.Linq.Expressions

Public Class WDSQueryObject
    Implements IQueryable(Of FileInfo), IQueryProvider
End Class
If you type the above into VS, you’ll immediately notice that there are several methods on IQueryable and IQueryProvider that you must implement. In the following post(s), I will detail each one.
NOTES:
- The full source code for this project is available under the resources section. It may be useful to download it and step through the code in a debugger as you read along.
- All of my code samples are based on Orcas Beta 2. The IQueryable interface has been refactored since the Beta 1 release.
CreateQuery
There are two CreateQuery methods on the IQueryProvider Interface. One returns a generic IQueryable(Of TElement), and the other returns the non-generic IQueryable. For most implementations, you can probably just have the non-generic one call the generic one:
Public Function CreateQuery(ByVal expression As Expression) As IQueryable Implements IQueryProvider.CreateQuery
    Return CreateQuery1(Of FileInfo)(expression)
End Function
For a simple query, CreateQuery will be called once for the “Where” clause and once for the “Select” clause. For example, if we had a query like:
Dim r = From file In index _
Where file.Name Like "%.exe" _
Select file.FullName
Then CreateQuery will be called once with an expression wrapping ‘file.Name Like "%.exe"’ and once with an expression wrapping ‘file.FullName’. Here’s my skeleton implementation of CreateQuery to handle both cases:
Public Function CreateQuery1(Of TElement)(ByVal expression As Expression) As IQueryable(Of TElement) Implements IQueryProvider.CreateQuery
    Dim querySource As IQueryable(Of TElement) = Nothing
    Dim nodeType = expression.NodeType
    Select Case nodeType
        Case ExpressionType.Call
            ' The query operator (Where, Select, ...) arrives as a method call node
            Dim m As MethodCallExpression = expression
            Dim methodName = m.Method.Name
            Select Case methodName
                Case "Select"
                    ' insert Select processing code
                Case "Where"
                    ' insert Where processing code
                Case Else
                    Throw New NotSupportedException("Queries using '" & methodName & "' are not supported for this collection.")
            End Select
        Case Else
            ' ToString() yields the enum name rather than its numeric value
            Throw New NotSupportedException("Creating a query from an expression of type '" & nodeType.ToString() & "' is not supported.")
    End Select
    Return querySource
End Function
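To make the "one call per clause" behavior a bit more concrete, here is a minimal sketch (not part of the project source) that uses Linq's built-in AsQueryable wrapper over an array instead of WDSQueryObject. The module name QueryChainSketch and the sample data are made up for illustration. It walks the finished query's expression from the outside in and prints the name of each query operator it finds, which should show the Select call wrapping the Where call.

```vb
Option Strict On
Option Infer On
Imports System
Imports System.Linq
Imports System.Linq.Expressions

Module QueryChainSketch
    Sub Main()
        ' A plain array exposed through the built-in IQueryable wrapper
        Dim data() As String = {"setup.exe", "notes.txt"}
        Dim names As IQueryable(Of String) = data.AsQueryable()

        Dim q As IQueryable(Of Integer) = From n In names _
                                          Where n.EndsWith(".exe") _
                                          Select n.Length

        ' Each query operator is a MethodCallExpression whose first
        ' argument is the expression of the query it was applied to.
        Dim e As Expression = q.Expression
        While e.NodeType = ExpressionType.Call
            Dim m = DirectCast(e, MethodCallExpression)
            Console.WriteLine(m.Method.Name)
            e = m.Arguments(0)
        End While
    End Sub
End Module
```

The loop stops when it reaches the node at the root of the chain, which is the original data source.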
You’ll notice that the expression we get in CreateQuery actually contains the information about who is calling us (i.e. Select, Where, etc), and we’ll use that information to process the rest of the query appropriately. For the filesystem example, the “Where” clause is the most interesting, so we’ll discuss that first. The signature for the Where extension method defined on Queryable is:
Public Shared Function Where(Of TSource)( _
ByVal source As IQueryable(Of TSource), _
ByVal predicate As Expression(Of System.Func(Of TSource, Boolean)) _
) As System.Linq.IQueryable(Of TSource)
If you look at the details of the expression tree, you’ll see that all of the information regarding the above signature is encoded (method call to Where(Of FileInfo), value for source, and quoted lambda for predicate).
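If you don't have a debugger handy, a small sketch like the following can dump the same shape to the console. It is only an illustration: it uses the built-in AsQueryable wrapper over a string array rather than WDSQueryObject, and the module name WhereShapeSketch is hypothetical. Note the explicit ToString() calls; without them VB prefers the Integer overload of Console.WriteLine and prints the numeric enum value.

```vb
Option Strict On
Option Infer On
Imports System
Imports System.Linq
Imports System.Linq.Expressions

Module WhereShapeSketch
    Sub Main()
        Dim data() As String = {"setup.exe", "notes.txt"}
        Dim q As IQueryable(Of String) = data.AsQueryable().Where(Function(s) s.Length > 5)

        ' The outermost node is the call to Queryable.Where
        Dim m = DirectCast(q.Expression, MethodCallExpression)
        Console.WriteLine(m.Method.Name)                       ' Where
        Console.WriteLine(m.Arguments(0).NodeType.ToString())  ' Constant (the source)
        Console.WriteLine(m.Arguments(1).NodeType.ToString())  ' Quote (the predicate)

        ' Unwrap the quote to reach the lambda and its body
        Dim quote = DirectCast(m.Arguments(1), UnaryExpression)
        Dim lambda = DirectCast(quote.Operand, LambdaExpression)
        Console.WriteLine(lambda.Body.NodeType.ToString())     ' GreaterThan
    End Sub
End Module
```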
Below is the ‘Where processing code’ I use to process the above expression. Let me explain what’s going on and then we’ll dive into the implementation.
m_query = New StringBuilder()
m_funclets = New List(Of KeyValuePair(Of String, Func(Of String)))()
Dim lambda As LambdaExpression = CType(m.Arguments(1), UnaryExpression).Operand
ExpandExpression(lambda.Body)
m_query.Insert(0, "SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (")
m_query.Append(")")
querySource = Me
You can pretty much ignore the "SELECT" string in the above code for now. It's simply a boilerplate Windows Desktop Search query with an incomplete "WHERE" clause (see the links under the resources section for more on querying WDS). What we'll be doing here is simply filling in the "WHERE" clause. The actual projection (SELECT) for our query will be handled in the second call to CreateQuery. In the expression tree for this call, the first argument to Where (the ConstantExpression) is simply a reference to us (i.e. whatever is returned by the implementor of IQueryable.Expression). Since we are both the IQueryable and IQueryProvider implementor, we don’t need to worry about this argument.
The second argument (the quoted lambda expression) is interesting, because we need to translate it into a SQL string. This is the heart of our IQueryable implementation: translating expressions from a Linq query into a set of instructions that can be used to retrieve data from the underlying data source. In my implementation, this translation is handled by ExpandExpression. It recursively traverses the expression tree and expands its nodes into SQL strings. After it returns, m_query will contain the string we need to plug into the "WHERE" clause above. Here’s the code for ExpandExpression:
Private Sub ExpandExpression(ByVal e As Expression)
    Select Case e.NodeType
        Case ExpressionType.And
            ExpandBinary(e, "AND")
        Case ExpressionType.Equal
            ExpandBinary(e, "=")
        Case ExpressionType.GreaterThan
            ExpandBinary(e, ">")
        Case ExpressionType.GreaterThanOrEqual
            ExpandBinary(e, ">=")
        Case ExpressionType.LessThan
            ExpandBinary(e, "<")
        Case ExpressionType.LessThanOrEqual
            ExpandBinary(e, "<=")
        Case ExpressionType.NotEqual
            ExpandBinary(e, "!=")
        Case ExpressionType.Not
            ExpandUnary(e, "NOT")
        Case ExpressionType.Or
            ExpandBinary(e, "OR")
        Case ExpressionType.Call
            ExpandCall(e)
        Case ExpressionType.MemberAccess
            ExpandMemberAccess(e)
        Case ExpressionType.Constant
            ExpandConstant(e)
        Case Else
            Throw New NotSupportedException("Expressions of type '" & e.NodeType.ToString() & "' are not supported.")
    End Select
End Sub
You’ll see that we simply go through all the different expression tree nodes we want to support and call the appropriate processing method (note also that the implementation is incomplete, but we cover most of the common types of expressions). Recursive processing continues until all the nodes in the expression tree have been evaluated. Let’s have a look at a simple query and walk through the processing methods that will be called. Given the following query:
Dim index As New WDSQueryObject
Dim cutoffDate = #6/28/2007#
Dim r = From file In index _
Where file.CreationTime > cutoffDate And _
file.Name Like "%.exe" _
Select file.FullName
The first method that will get called is ExpandBinary. This, in turn, calls ConcatBinary and combines the left and right hand expressions using the appropriate operator (in this case, “AND”).
Private Sub ExpandBinary(ByVal b As BinaryExpression, ByVal op As String)
    ConcatBinary(b.Left, b.Right, op)
End Sub

Private Sub ConcatBinary(ByVal left As Expression, ByVal right As Expression, ByVal op As String)
    ExpandExpression(left)
    m_query.Append(" ")
    m_query.Append(op)
    m_query.Append(" ")
    ExpandExpression(right)
End Sub
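One thing to watch for: ConcatBinary emits no parentheses, so a filter written as (a Or b) And c would come out as "a OR b AND c", and SQL generally gives AND higher precedence than OR. The sketch below shows one possible way to preserve grouping by wrapping every binary expression in parentheses. It is a stripped-down, hypothetical translator (the names ParenSketch and Expand are mine, and it only handles And, Or and bare parameters), not the code from the project.

```vb
Option Strict On
Option Infer On
Imports System
Imports System.Linq.Expressions
Imports System.Text

Module ParenSketch
    Private m_query As New StringBuilder()

    ' Hypothetical, cut-down stand-in for ExpandExpression
    Private Sub Expand(ByVal e As Expression)
        Select Case e.NodeType
            Case ExpressionType.And
                ConcatBinary(DirectCast(e, BinaryExpression), "AND")
            Case ExpressionType.Or
                ConcatBinary(DirectCast(e, BinaryExpression), "OR")
            Case ExpressionType.Parameter
                m_query.Append(DirectCast(e, ParameterExpression).Name)
            Case Else
                Throw New NotSupportedException(e.NodeType.ToString())
        End Select
    End Sub

    ' Same idea as ConcatBinary, but the result is always parenthesized
    Private Sub ConcatBinary(ByVal b As BinaryExpression, ByVal op As String)
        m_query.Append("(")
        Expand(b.Left)
        m_query.Append(" ").Append(op).Append(" ")
        Expand(b.Right)
        m_query.Append(")")
    End Sub

    Sub Main()
        Dim f As Expression(Of Func(Of Boolean, Boolean, Boolean, Boolean)) = _
            Function(a, b, c) (a Or b) And c
        Expand(f.Body)
        Console.WriteLine(m_query.ToString())   ' ((a OR b) AND c)
    End Sub
End Module
```

The extra parentheses are redundant for simple filters, but they keep the generated string faithful to the structure of the expression tree.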
Processing the left hand side of the expression will end up calling ConcatBinary again (this time with the “>” operator) and will subsequently call ExpandMemberAccess. This is where the interesting processing begins.
Private Sub ExpandMemberAccess(ByVal m As MemberExpression)
    Dim member = m.Member
    Dim e = m.Expression
    Select Case e.NodeType
        Case ExpressionType.Parameter
            ' Parameter processing code
        Case ExpressionType.Constant
            ' Constant processing code
        Case Else
            Throw New NotSupportedException("Accessing member '" & member.Name & "' is not supported in this context.")
    End Select
End Sub
The first block that we’re going to hit is the ‘Parameter processing code’. In this context, ‘parameters’ are going to be the iteration variables of the query (the ‘file’ object). What we need to do with that information is translate the property access on the FileInfo object (file.CreationTime) into a Windows filesystem attribute name. Here’s the code I use to do that:
Private Function GetAttributeName(ByVal m As MemberInfo) As String
    Dim name As String
    Dim memberName = m.Name
    Select Case memberName
        Case "CreationTime"
            name = "System.DateCreated"
        Case "Name"
            name = "System.FileName"
        Case Else
            Throw New NotSupportedException("Using the property '" & memberName & "' in filter expressions is not supported.")
    End Select
    Return name
End Function
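The body of the ‘Parameter processing code’ branch isn't shown here; presumably it just appends the translated name to the query string. The sketch below illustrates that idea together with a table-driven variant of GetAttributeName. Treat it as an assumption-laden illustration: the module name, the lookup table, and the "LastWriteTime" and "Length" entries are my own additions and should be checked against the WDS attribute list before use.

```vb
Option Strict On
Option Infer On
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Reflection
Imports System.Text

Module AttributeMapSketch
    ' Hypothetical lookup table; only the first two entries come from the post
    Private ReadOnly s_attributeNames As Dictionary(Of String, String) = CreateMap()

    Private Function CreateMap() As Dictionary(Of String, String)
        Dim map As New Dictionary(Of String, String)
        map.Add("CreationTime", "System.DateCreated")
        map.Add("Name", "System.FileName")
        ' Assumed additions; verify the attribute names before relying on them
        map.Add("LastWriteTime", "System.DateModified")
        map.Add("Length", "System.Size")
        Return map
    End Function

    Function GetAttributeName(ByVal m As MemberInfo) As String
        Dim name As String = Nothing
        If Not s_attributeNames.TryGetValue(m.Name, name) Then
            Throw New NotSupportedException("Using the property '" & m.Name & "' in filter expressions is not supported.")
        End If
        Return name
    End Function

    Sub Main()
        Dim query As New StringBuilder()
        ' Stand-in for the 'Parameter processing code' branch:
        ' append the WDS attribute name for the accessed property.
        query.Append(GetAttributeName(GetType(FileInfo).GetProperty("CreationTime")))
        Console.WriteLine(query.ToString())   ' System.DateCreated
    End Sub
End Module
```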
As before, the implementation is incomplete, but adding translations for more properties should be very straightforward. A complete list of supported filesystem attributes can be found in the links under the resources section. The next block of code we’re going to hit is the ‘Constant processing code’. Here, we will need to interpret the access to the variable cutoffDate. The code I use is as follows:
' Placeholder token that is swapped for the real value at execution time
Dim valueName = "[value" & m_funclets.Count & "]"
Dim memberType = member.MemberType
If m.Type Is GetType(String) OrElse m.Type Is GetType(Date) Then
    ' Strings and dates are quoted in the generated query
    m_query.Append("'")
    m_query.Append(valueName)
    m_query.Append("'")
Else
    m_query.Append(valueName)
End If
Dim funclet As Func(Of String) = Nothing
Select Case memberType
    Case MemberTypes.Field
        Dim f As FieldInfo = member
        Dim c As ConstantExpression = e
        If m.Type Is GetType(Date) Then
            funclet = Function() CDate(f.GetValue(c.Value)).ToString("yyyy-MM-dd")
        Else
            funclet = Function() CStr(f.GetValue(c.Value))
        End If
    Case Else
        Throw New NotSupportedException("Accessing member of type '" & memberType.ToString() & "' is not supported.")
End Select
m_funclets.Add(New KeyValuePair(Of String, Func(Of String))(valueName, funclet))
So what’s all this ‘funclet’ nonsense? Well, the Linq architecture revolves around the concept of delayed execution. In other words, I create the query at one point, but I don’t actually evaluate it (capture input values and query the underlying data source) until I start to use the query results. Because of this, we want to capture the information about how to access the contents of cutoffDate, but we don’t want to store the value away just yet. What I’m doing is placing a token ([value*]) in the query string and then creating a function that I can use to get the value of cutoffDate when the results of the query are accessed.
I create the function using a lambda expression. This is basically a convenient way to create an inline, anonymous delegate in my code. It also has the benefit of automatically creating a closure class to store all of the information about the variables I access in the current block. For example, when I enter the ‘Case’ block, a new instance of the closure class will be created, and the values for ‘f’ and ‘c’ will be stored in it. The compiler automatically translates these local variable accesses into field accesses on the appropriate members of the closure class. When the query is executed, and I execute the funclets to replace the [value*] tokens, I will get the value of the variables at that point in the program’s execution (rather than at the point when the query is created).
You’ll notice that the MemberExpression for cutoffDate also represents a lifted local variable. This is why the member type is ‘Field’. Since cutoffDate is being used in a query, the value is actually being stored in a field on a closure class.
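As a rough illustration of the delayed evaluation described above, here is a minimal, self-contained sketch of how the [value*] tokens might be resolved at execution time. The helper name ResolveTokens is hypothetical (the project's actual execution code may look quite different), and the funclet is written by hand rather than built from an expression tree.

```vb
Option Strict On
Option Infer On
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text

Module FuncletSketch
    ' Hypothetical helper: swaps each token for the value its funclet returns now
    Function ResolveTokens(ByVal template As String, _
                           ByVal funclets As List(Of KeyValuePair(Of String, Func(Of String)))) As String
        Dim sb As New StringBuilder(template)
        For Each pair In funclets
            sb.Replace(pair.Key, pair.Value.Invoke())
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim cutoffDate As Date = #6/28/2007#
        Dim funclets As New List(Of KeyValuePair(Of String, Func(Of String)))

        ' The lambda captures the variable itself, not its current value
        funclets.Add(New KeyValuePair(Of String, Func(Of String))( _
            "[value0]", _
            Function() cutoffDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)))

        ' Changing the variable after the "query" was built...
        cutoffDate = #7/15/2007#

        ' ...is reflected when the tokens are finally resolved
        Console.WriteLine(ResolveTokens("System.DateCreated > '[value0]'", funclets))
        ' System.DateCreated > '2007-07-15'
    End Sub
End Module
```

The sketch passes CultureInfo.InvariantCulture so the formatted date does not depend on the machine's regional settings.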
The next expression that we’ll end up expanding is the one representing ‘file.Name Like "%.exe"’. You might be surprised to find out that we process this node in ExpandCall rather than ExpandBinary. As it turns out, the VB compiler translates several of the common binary operators into calls to VB runtime functions. This allows VB to add extra functionality that is not supported by the CLR. LikeString (generated for the VB ‘Like’ operator) and CompareString (generated for string comparison expressions like “a” = “A”) are examples of this behavior. Here’s my implementation of ExpandCall that takes LikeString into account:
Private Sub ExpandCall(ByVal m As MethodCallExpression, Optional ByVal op As String = "")
    Dim methodName = m.Method.Name
    Select Case methodName
        Case "LikeString"
            ConcatBinary(m.Arguments(0), m.Arguments(1), "LIKE")
        Case Else
            Throw New NotSupportedException("Using method '" & methodName & "' in a filter expression is not supported.")
    End Select
End Sub
The last thing we need to do in order to wrap up processing of the “Where” is process the constant string value "%.exe". This translation is straightforward, and the only thing worth paying attention to is that the default conversion for some data types may not work for your data source. In this case, WDS requires dates to be in a specific format.
Private Sub ExpandConstant(ByVal c As ConstantExpression)
    Dim value = c.Value
    If value.GetType() Is GetType(String) Then
        m_query.Append("'")
        m_query.Append(CStr(value))
        m_query.Append("'")
    ElseIf value.GetType() Is GetType(Date) Then
        m_query.Append("'")
        m_query.Append(CDate(value).ToString("yyyy-MM-dd"))
        m_query.Append("'")
    Else
        m_query.Append(value.ToString())
    End If
End Sub
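One gap worth noting in ExpandConstant is that a string constant containing a single quote would end the SQL literal early. A minimal sketch of one way to guard against that follows, assuming WDS follows the usual SQL convention of doubling embedded single quotes (I have not verified this against the WDS documentation). The helper name QuoteLiteral is hypothetical.

```vb
Option Strict On
Imports System

Module LiteralSketch
    ' Hypothetical helper: wraps a value in quotes, doubling any embedded quote
    Function QuoteLiteral(ByVal value As String) As String
        Return "'" & value.Replace("'", "''") & "'"
    End Function

    Sub Main()
        Console.WriteLine(QuoteLiteral("%.exe"))    ' '%.exe'
        Console.WriteLine(QuoteLiteral("Bob's%"))   ' 'Bob''s%'
    End Sub
End Module
```

The same helper could be applied to the values returned by string funclets, since those are substituted into quoted positions as well.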
This concludes the processing of the “Where” clause and the CreateQuery method. We have built up the following query to pass along to WDS:
"SELECT System.ItemPathDisplay FROM SystemIndex WHERE NOT CONTAINS(System.ItemType, 'folder') AND (System.DateCreated > '[value0]' AND System.FileName LIKE '%.exe')"
In my next post, I’ll finish up by covering GetEnumerator and Select.
Resources
Full source code for this project:
https://hresult.members.winisp.net/FileSystemQuery.zip
Bart De Smet’s excellent blog on Implementing IQueryable for Linq to LDAP:
Fabrice Marguerie’s blog on implementing Linq to Amazon:
https://weblogs.asp.net/fmarguerie/archive/2006/06/26/Introducing-Linq-to-Amazon.aspx
Catherine Heller’s blog on Windows Desktop (Vista) Search:
https://blogs.msdn.com/cheller/archive/2006/06/21/642220.aspx
List of query attributes supported by the Windows filesystem