Query transformations are syntactic

As you probably know, there are two ways to write a LINQ query in C#. The way I personally prefer is to use the “query comprehension” syntax:

from customer in customerList
where customer.City == "London"
select customer.Name

Or you can, equivalently, use the “fluent method call” syntax:

customerList
.Where(customer=>customer.City == "London")
.Select(customer=>customer.Name)

These are guaranteed to be equivalent because the compiler simply transforms the former syntax into the latter syntax before it compiles it. An interesting aspect of this transformation is that it is (almost) entirely syntactic. (The "transparent identifiers" generated for certain queries require a small amount of semantic analysis to determine the corresponding anonymous type, but for the most part, all we do is just pull the raw hunks of code out of each clause and reorganize the program into the method call form.) Once it is in a form that the rest of the compiler can understand, then semantic analysis proceeds as usual.

This means that it is perfectly legal to do dumb things. For example, suppose we decide that by “where” we mean don’t mean “filter”, we actually mean “multiply”. And by “select” we actually mean “add”, not “project”. No problem!

static class ExtInt
{
public static int Where(this int c, Func<int, int> f)
{
return f(0) * c;
}
public static int Select(this int c, Func<int, int> f)
{
return f(0) + c;
}
}

int ten = 10;
int twenty = 20;
int thirty = 30;
int result = from c in ten where twenty select thirty;
Console.WriteLine(result);

And sure enough, ten where/times twenty select/plus thirty is 230. This is a deeply strange way to write a mathematical expression, but legal.

The semantics of the Where method are supposed to be “takes a collection of T and a predicate mapping T to bool, and returns a filtered collection of those T items which match the predicate”. This method does not take a collection or a predicate, it does not return a collection, and it certainly does not have the semantics of filtering, but the compiler neither knows nor cares about any of those facts. All the compiler does is syntactically turn that line into

int result = ten.Where(c=>twenty).Select(c=>thirty);

and compile it; that line compiles just fine, so, no problem as far as the compiler is concerned.

The C# specification has a section which describes the pattern we expect a query provider to implement: that Where takes a predicate, and so on. But we perform no checks whatsoever that you successfully implemented a query provider that implements either the form or the semantics of our recommended pattern. If you do something crazy, like redefine “where” to take something other than a predicate and you get crazy results, then, well, what can I tell you? If it hurts when you do that then don’t do that!

Comments

  • Anonymous
    December 07, 2009
    Although it's certainly not advisable to redefine query operators to perform mathematical operations, there could be some useful applications... for instance, check out this post by Marc Gravell : http://marcgravell.blogspot.com/2009/11/selectmany-combining-idisposable-and.html

  • Anonymous
    December 07, 2009
    "This means that it is perfectly legal to do dumb things." Since when has it been illegal to do dumb things as a developer?

  • Anonymous
    December 07, 2009
    Bart de Smet also explores such things in "Who ever said LINQ predicates need to be Boolean-valued?" (http://bartdesmet.net/blogs/bart/archive/2008/09/14/who-ever-said-linq-predicates-need-to-be-boolean-valued.aspx)

  • Anonymous
    December 07, 2009
    Interesting timing--I just had to implement this functionality in a compiler for a side project.  Getting the transparent identifiers and scoping right was tricky, and one of the spots where the language spec could have been a bit more thorough.  Thankfully, Wes Dyer had a great blog post on this that helped get me on the right track. On the whole, though, you guys have done a great job with the language spec.  It's been immensely helpful in helping me get type inference and implicit lambdas implemented correctly.  I really appreciate the effort.

  • Anonymous
    December 07, 2009
    I wish this will be not a dark magic (something that is hard coded in the compiler), but just one usage of public extension point. I wish there will be a public way to define rules to map custom contextual keywords to some set of methods. And LINQ will be just one set of such rules. And I wish I were able to define my own mapping for my methods.

  • Anonymous
    December 07, 2009
    Andrey, try Googling (or Binging) for a .NET language called "Nemerle".  It implements macros in the way you're describing, and in fact that is how they implemented LINQ support.  The language supports a syntax similar to C# as well as a "shorthand" syntax similar to functional languages.  It's pretty slick, and while you may not be able to use it for your day job, you could use it for personal projects.

  • Anonymous
    December 07, 2009
    <blockquote>I wish this will be not a dark magic (something that is hard coded in the compiler), but just one usage of public extension point. I wish there will be a public way to define rules to map custom contextual keywords to some set of methods. And LINQ will be just one set of such rules. And I wish I were able to define my own mapping for my methods.</blockquote> There is such a thing: templates in C++. (ducks!)

  • Anonymous
    December 07, 2009
    > The way I personally prefer is to use the “query comprehension” syntax Eric, have you written anything (here or somewhere like stackoverflow) talking about why this is your preference? Personally I'm still struggling to internalise an intuition about which way is better when...

  • Anonymous
    December 07, 2009
    This ( http://blogs.msdn.com/mattwar/archive/2008/07/14/linq-building-an-iqueryable-provider-part-xi.aspx ) is the 11th part of the "LINQ: Building an IQueryable Provider" series by Matt Warren ( http://blogs.msdn.com/user/Profile.aspx?UserID=3134 ). I have only used the early stages of it, a year ago, to implement a very cut-down LINQ provider for the SOAP web services-based API of the CA's Unicenter Service Desk, but even that limited experience of mine is enough to show me that the expression trees LINQ is based upon can ultimately only be converted into the the "fluent method call" syntax, with the compiled lambda expression becoming the entry point of the whole thing. And on top of that, the "query comprehension" syntax is implemented via extension methods, to make it all look and feel more like the "traditional" SQL, to standardize the syntax... not so?

  • Anonymous
    December 07, 2009
    The comment has been removed

  • Anonymous
    December 07, 2009
    The queries in the post are relatively conventional - after all, they're still calling extension methods. The compiler isn't limited to those... we can use properties or fields, and even call static query operators on types... http://msmvps.com/blogs/jon_skeet/archive/2008/02/29/odd-query-expressions.aspx Evil, evil stuff... but fun.

  • Anonymous
    December 07, 2009
    I had been led to believe that e.g. the Where method for some classes will translate an 'expression tree' to some query language specific to the object (e.g. sql for linq to sql, DataTable.Select(String) for linq to datasets). How does it do that if it has already been syntactically transformed by the compiler to a delegate?

  • Anonymous
    December 08, 2009
    @Random832: It's not syntactically transformed into a delegate. It's transformed into a lambda expression. That lambda expression may then be converted into a delegate - or it may be converted into an expression tree - it depends on what the method parameter type is. That's another way in which the transformation is incredibly neat :)

  • Anonymous
    December 08, 2009
    Denis, are you saying that, unless you have a really big time budget, converting LINQ expressions to anything else than compiled code is not feasable? If you use LINQ to connect so some SOAP API, the problem is usually that LINQ is so much more powerful than that API, so you would usually end up just taking a few things out of the primary "where" query that you put into the API, and perform the rest of the LINQ statement in memory. There's really not much more you can do, and that's not really LINQ's fault, is it? But on the other hend, if you're targeting some system with a more powerful query language, of course you'll notice that LINQ is not really helping a lot here, because it represents the transformed query. And that query has suffered really bad transformations in order to be executable in a static language like C#. E.g., transparent identifiers work around the fact that a delegate cannot take an arbitrary number of input arguments etc. If that's what you're concerned with, take a look at re-linq @ http://relinq.codeplex.com. It will give you a nice AST that resembles the LINQ query expressions (from...where...select etc) with transparent identifiers removed. Much easier to go from there! @Eric: "The "transparent identifiers" generated for certain queries require a small amount of semantic analysis to determine the corresponding anonymous type..." Oh well, that's one way to have it. Guess how much fun it is to trace those transparent identifiers back to the original identifiers they're hiding! Because that's really what you want if your LINQ provider targets anything other than C#...

  • Anonymous
    December 10, 2009
    The comment has been removed

  • Anonymous
    December 10, 2009
    @Mike: thanks, I just pointed to that (it's set up on codeplex now). Actually, the guys who wrote it work just across the hall from me. Good to hear it's catching on!