The LINQ Farm Part II: Query Expressions

In this post I'm going to explain the query expression found in my previous article on LINQ. This series of posts is designed to get first-time users of LINQ up and running. Subsequent entries will continue to explore query expressions and other basic building blocks useful to LINQ developers.

Before getting started, it might be useful to review the primary goals that motivate the LINQ technology. In the first post, I said that LINQ allows developers to write SQL, XML and other queries in fully type-checked C# code. LINQ removes the need to call XPath or to embed SQL strings in your code. Querying is now a first-class citizen of the C# language, which means that the code you write is fully type checked and follows strict rules that can be confirmed at compile time. You don't have to wait for run time to see if your code is fundamentally sound. Not all problems can be caught at compile time, but LINQ greatly increases the number of problems that the compiler can catch for you before your program is executed.

While reading this post, it is important to remember that LINQ is still under development. The code I'm showing here is not final, nor has the team finalized many of the terms used to describe the LINQ syntax. Nevertheless, a LINQ CTP is available, and more preview code is in the pipe. We want developers to begin using these bits so they can provide feedback to the C# team, and so they can understand the huge innovation that LINQ provides for C# developers.

Query Expressions

To get started, let's take a look at the query expression described in the previous post:

    1:  string[] EinsteinQuote = new string[] {"space", 
    2:    "detached", "from", "any", "physical", 
    3:    "content", "does", "not", "exist" }; 
    4:   
    5:  IEnumerable<string> selectedWords =
    6:    from p in EinsteinQuote
    7:    where p.Equals("any") != true
    8:    select p;

For context, I've shown the string array consumed by our LINQ code, but the query expression itself is found in lines 5 through 8.

The from clause on line 6 introduces a variable p. Variables like p in a query expression are usually referred to as range variables. In general, the from, let, join or into keywords allow developers to introduce a range variable.

The variable p is never formally declared in the sense that we traditionally declare a variable by explicitly stating its type:

string p;

Instead, p is introduced by the from clause.

The type of p is determined by the context in which it appears. In this case we know p is a string since it is an element in the array of strings called EinsteinQuote. It is not correct to think of p as not being declared at all. The compiler knows the type of p, but rather than being stated explicitly, the type is inferred from the context in which p is used.

The variable p goes out of scope when the query expression ends. Query expressions always end with a select or group clause.
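To make the idea of range variables a little more concrete, here is a small sketch that introduces them with from, let and into. The class name RangeVariableDemo and its two methods are my own inventions for illustration, and the code assumes a compiler and framework that include System.Linq, so treat it as a sketch rather than final syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names here are illustrative only.
public static class RangeVariableDemo
{
    public static readonly string[] EinsteinQuote = new string[] {"space",
        "detached", "from", "any", "physical",
        "content", "does", "not", "exist" };

    // "from" introduces p (inferred to be a string) and
    // "let" introduces length (inferred to be an int).
    public static List<string> DescribeLongWords()
    {
        IEnumerable<string> query =
            from p in EinsteinQuote
            let length = p.Length
            where length > 5
            select p + ":" + length;
        return query.ToList();
    }

    // "into" introduces g, a group of words that share a length.
    // After the into keyword, p is no longer in scope.
    public static List<int> RepeatedLengths()
    {
        IEnumerable<int> query =
            from p in EinsteinQuote
            group p by p.Length into g
            where g.Count() > 1
            select g.Key;
        return query.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", DescribeLongWords()));
        Console.WriteLine(string.Join(", ", RepeatedLengths()));
    }
}
```

None of the range variables are declared with an explicit type, yet the compiler still checks calls such as p.Length and g.Count() at compile time.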

Query Rewrites

The developers of LINQ had two problems to solve:

  1. They had to find a way to make queries a first-class citizen of the C# language.
  2. They had to find an easy, intuitive way to present this technology to the user.

The solution the team created has two layers:

  • A low-level set of method calls that uses generics, expression trees, extension methods and lambda expressions. For reasons that I will make clear over time, these method calls are often referred to as a chain of query operators.
  • On top of these method calls the team developed a query expression syntax that makes it easy for developers to express their questions in a simple, intuitive syntax.

Query expressions are easy to understand. The underlying method calls can, in some circumstances, be quite complex. Query expressions put a friendly face on a technology that can be difficult to understand.

In this particular case, the chain of query operators for our query is quite simple:

selectedWords = EinsteinQuote.Where(p => p.Equals("any") != true);

Here you can see that there is an extension method called Where() on our source collection. This extension method takes a lambda expression as a parameter. For now, let's not go any further than this. You don't need to understand extension methods or lambda expressions in order to use LINQ. I'm just showing you this chain of query operators so you can get a feeling for what is going on behind the scenes when you write a query expression. I will revisit this subject in a future post, but for now, let's just set all this aside and focus on what we need to know in order to be productive.
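If you would like to confirm for yourself that the two forms behave the same way, the following sketch places them side by side. The class QueryRewriteDemo and its method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class comparing the two ways of writing the query.
public static class QueryRewriteDemo
{
    static readonly string[] EinsteinQuote = new string[] {"space",
        "detached", "from", "any", "physical",
        "content", "does", "not", "exist" };

    // The query expression from the listing above.
    public static IEnumerable<string> WithQueryExpression()
    {
        return from p in EinsteinQuote
               where p.Equals("any") != true
               select p;
    }

    // The same query written directly as a call to the Where() query operator.
    public static IEnumerable<string> WithQueryOperators()
    {
        return EinsteinQuote.Where(p => p.Equals("any") != true);
    }

    public static void Main()
    {
        // Both lines print every word in the quote except "any".
        Console.WriteLine(string.Join(" ", WithQueryExpression()));
        Console.WriteLine(string.Join(" ", WithQueryOperators()));

        // SequenceEqual compares the two results element by element.
        Console.WriteLine(WithQueryExpression().SequenceEqual(WithQueryOperators()));
    }
}
```

Both methods return the same eight words in the same order, which is what we would expect if the query expression is simply a friendlier way of writing the method call.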

Query Expressions support IEnumerable and IQueryable

Query expressions are designed to work with source variables that support either the IEnumerable<T> or IQueryable<T> interface. If the compiler cannot find suitable query operators for the input to a query expression, as is typically the case for a type that supports neither interface, then your code won't compile.

Consider the code shown on line 6: from p in EinsteinQuote. Many query expressions take on this form: from p in X, where X is called the source collection or source variable. It is X that must support either IEnumerable or IQueryable.

In the example we've been working with, the array of string called EinsteinQuote supports IEnumerable. You don't have to do anything special to make EinsteinQuote support this interface. Support for IEnumerable is built into an array of string. If you declare an array of string, then it will support IEnumerable automatically, with no further work on your part.

Query expressions return either IQueryable or IEnumerable. A query expression returns IQueryable when the source collection implements IQueryable; otherwise it returns IEnumerable. In our case, EinsteinQuote supports IEnumerable but not IQueryable, so the query expression returns IEnumerable, as shown in line 5.
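The sketch below shows the same query written against both kinds of source. To obtain an IQueryable source without a database, it assumes the AsQueryable() method from System.Linq, which does not appear elsewhere in this post. The class ReturnTypeDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; only the declared return types differ.
public static class ReturnTypeDemo
{
    // The source is an array, so the query expression yields IEnumerable<string>.
    public static IEnumerable<string> QueryArray(string[] words)
    {
        return from p in words
               where p.Length > 4
               select p;
    }

    // The source is IQueryable<string>, so the same query yields IQueryable<string>.
    public static IQueryable<string> QueryQueryable(IQueryable<string> words)
    {
        return from p in words
               where p.Length > 4
               select p;
    }

    public static void Main()
    {
        string[] einsteinQuote = new string[] {"space",
            "detached", "from", "any", "physical",
            "content", "does", "not", "exist" };

        IEnumerable<string> fromArray = QueryArray(einsteinQuote);

        // AsQueryable() wraps the array so that it can act as an IQueryable source.
        IQueryable<string> fromQueryable = QueryQueryable(einsteinQuote.AsQueryable());

        Console.WriteLine(string.Join(" ", fromArray));
        Console.WriteLine(string.Join(" ", fromQueryable));
    }
}
```

Assigning the result of QueryArray() to an IQueryable<string> variable would be rejected at compile time, which is the kind of early feedback described at the start of this post.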

You can always tell whether a particular variable supports an interface by evaluating code like the following in a watch window (Ctrl + Alt + W + 1):

EinsteinQuote is IEnumerable<string>

In our case, the is operator will return true because an array of string supports IEnumerable. The following code will return false:

EinsteinQuote is IQueryable<string>
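If you prefer to run these checks in code rather than in the watch window, a sketch along the following lines should do the job. The class InterfaceCheckDemo and its helper methods are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class wrapping the two "is" checks shown above.
public static class InterfaceCheckDemo
{
    public static bool IsEnumerableOfString(object source)
    {
        return source is IEnumerable<string>;
    }

    public static bool IsQueryableOfString(object source)
    {
        return source is IQueryable<string>;
    }

    public static void Main()
    {
        string[] einsteinQuote = new string[] {"space",
            "detached", "from", "any", "physical",
            "content", "does", "not", "exist" };

        // An array of string supports IEnumerable<string> ...
        Console.WriteLine(IsEnumerableOfString(einsteinQuote));

        // ... but it does not support IQueryable<string>.
        Console.WriteLine(IsQueryableOfString(einsteinQuote));
    }
}
```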

Summary

This post introduced the concept of a query expression. This text covered two problem domains:

  • The query expression syntax itself.
  • The underlying method calls that are created after a query expression is translated. These method calls are referred to as a chain of query operators.

For the next few posts in this series I plan to focus solely on query expressions. I don't believe it is necessary for many LINQ developers to understand lambda expressions and extension methods. I want to start talking about them later on, but for now, my theory is that they will represent a distraction to us until we get a little further along in our understanding of this technology.

Compiler developers and some C# gurus are naturally, and rightfully, going to be fascinated by all the complex plumbing that makes LINQ possible. Nevertheless, I want to lay out what is necessary to become productive with LINQ first, and then turn around and look at the deeper issues that make this technology possible. This is the way the team wants to present the material to our users, and the plan seems eminently sensible to me.

Now that we have an understanding of query expressions, the next post in this series will cover more of the details of how to write query expressions. The focus at first will be on querying the data in a typical C# program. Future posts will cover working with SQL and XML queries. Eventually we will circle back around and look at some of the underlying technology such as lambda expressions, extension methods, and so on.
