So what's the deal with this whole C# 3.0 / Linq thingy? (Part 2)

In the last post i discussed a little bit of background on why we wanted to introduce Linq, as well as a bit of info on what some basic C# Linq looked like.  In this post i'm going to dive a little deeper into some other interesting things we're introducing as well.

Here's the current example we've been using to drive the discussion along:

    Customer[] customers = GetCustomers();
    var custs = customers.Where(c => c.City == "Seattle").Select(c => c.Name);

Now, so far that's a very C#-centric way to do queries over data.  However, it's still a little bit heavyweight.  What about a more query-like syntax that does the same thing and is far more convenient?  Well, it turns out that we have that as well:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select c.Name;

This new query syntax is in fact just syntactic sugar that uses patterns to transform itself into the *exact* same C# query that i listed above.  In fact, this is the same way that we handle foreach (specifically by transforming it into a loop with calls to MoveNext, Current, Dispose).
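To make the "it's just sugar" claim concrete, here is a minimal sketch, assuming the System.Linq namespace that eventually shipped (the preview described in this post used different namespace names).  The Customer class, the sample data and the QuerySugarDemo helpers are all hypothetical stand-ins, not code from the post.  It runs the query form and the method-call form side by side, and then spells out roughly what a foreach over the result turns into.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Customer type used in the post.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class QuerySugarDemo
{
    // Hypothetical replacement for GetCustomers(): a tiny in-memory data set.
    public static Customer[] GetCustomers()
    {
        return new[]
        {
            new Customer { Name = "Ann", City = "Seattle" },
            new Customer { Name = "Bob", City = "Portland" },
            new Customer { Name = "Cyd", City = "Seattle" },
        };
    }

    // The query-expression form.
    public static List<string> QueryForm(Customer[] customers)
    {
        var q = from c in customers
                where c.City == "Seattle"
                select c.Name;
        return q.ToList();
    }

    // The method-call form that the query expression is translated into.
    public static List<string> MethodForm(Customer[] customers)
    {
        return customers.Where(c => c.City == "Seattle")
                        .Select(c => c.Name)
                        .ToList();
    }

    // Roughly what "foreach (string name in names) result.Add(name);" becomes:
    // a loop built from MoveNext, Current and Dispose.
    public static List<string> ManualForeach(IEnumerable<string> names)
    {
        var result = new List<string>();
        IEnumerator<string> e = names.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                string name = e.Current;
                result.Add(name);
            }
        }
        finally
        {
            e.Dispose();
        }
        return result;
    }

    public static void Main()
    {
        Customer[] customers = GetCustomers();
        Console.WriteLine(string.Join(", ", QueryForm(customers)));   // Ann, Cyd
        Console.WriteLine(string.Join(", ", MethodForm(customers)));  // Ann, Cyd
        Console.WriteLine(string.Join(", ", ManualForeach(MethodForm(customers)))); // Ann, Cyd
    }
}
```

Both forms bind to the same Where and Select methods, so they produce the same sequence in the same order.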

Now, when looking at this you'll almost certainly notice how it looks *almost*, but not quite, like SQL.  And you'll probably be asking: "Can't you just make it look like SQL if it's that close?  Is this just MS wanting to be a pain just for the heck of it??"  In this case, the answer is "No".  One of the problems with the straight SQL-like approach is that we'd have to put the "select" first.  "Ok... what's wrong with that?" you say.  Well, let's take a look:

    var q = 
      select c<dot>

Now, at this point, you're constructing the final shape for this query.  You know you want to write "c.Name" and you'd like to use handy features like IntelliSense to speed up typing it.  But you can't!  Because you haven't even stated where your data is coming from, there's no way to know what "c" is this early in the expression.  This is because in SQL the scope of a variable actually flows backwards, i.e. you use variables before you've even declared them.  However, in C# you can only use something after it's been declared.  So in order to better fit within this model (which has some very nice benefits), we made it so that "from" has to come first.  Beyond statement completion there are also issues of being able to construct large hierarchical queries in an understandable way.  Having the scope flow from left to right, top to bottom, makes that much simpler and brings a lot of clarity to your expressions.

Now what about projections?  They're incredibly common operations in SQL.  You're always doing things like "select a, b, c" and in essence projecting out the information you care about into these columns.  So how would we go about doing this sort of thing in C# 3.0?  Well, you could do this:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select new NameAndAge(c.Name, c.Age);

but that's a real pain.  Any time i want to project any information out, i need to generate a new type and fill all its gunk in.  That means writing the class somewhere.  Creating a constructor for it.  Creating fields and properties.  Implementing .Equals and .GetHashCode.  etc. etc.  yech.  Far too much work, error prone, and it causes API clutter.  So what can we do to alleviate that?  Well, in C# 3.0 a new feature called "Anonymous Types" comes to the rescue.  We can now write the following:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select new { c.Name, c.Age };

What this is doing is projecting the customer out into a new structural type with two properties "Name" and "Age", both of which are strongly typed and which have been assigned the values of their corresponding properties in "c".  What's the type of "q" at this point?  Well, it's an IEnumerable<???> where ??? is some anonymous type with those two properties on it.  BTW, it should now seem somewhat more obvious why the "var" keyword was added to the language.  In this case you cannot actually write down the type of "q", but you need some way to declare it.  "var" comes to the rescue here.

So i could now write:

    foreach (var c in q) {
      Console.WriteLine(c.Name);
    }

and that would compile and run just fine.

Now "wait a minute!" you're saying.  "Is this some sort of late-binding thang where we're using reflection to pull out this data?"  No sir-ee.  In fact, if you were to try and write:

    foreach (var c in q) {
      Console.WriteLine(c.Company);
    }

then you would get a compiler error immediately.  Why?  Well, the compiler knows that the anonymous type which you've instantiated only has two members on it (Name and Age), and it's able to flow that information into the type signature of 'q'.  Then when foreach'ing over 'q', it knows that the type of 'c' is the same structural anonymous type we created earlier in the 'select'.  So it will know that it has no "Company" property and appropriately inform you that your code is bogus.  All the strong, static typing of C# is there.  You are just allowed to eschew writing the type now and instead allow inference to take care of all of it for you.  Users of languages like OCaml will find this immediately familiar and comfortable.
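Here is a small sketch of what that buys you, written against C# 3.0 as it eventually shipped (an assumption, since this post describes a preview).  The AnonymousTypeDemo class and its sample data are hypothetical.  It shows that two anonymous objects with the same property names and types share one compiler-generated type with value-based Equals and GetHashCode, and that the projected properties are ordinary statically typed members.

```csharp
using System;
using System.Linq;

public static class AnonymousTypeDemo
{
    // Two anonymous objects with the same property names, types and order
    // share one compiler-generated type, and that type gets value-based
    // Equals and GetHashCode without you writing any of it.
    public static bool StructuralEquality()
    {
        var a = new { Name = "Ann", Age = 30 };
        var b = new { Name = "Ann", Age = 30 };
        return a.Equals(b)
            && a.GetHashCode() == b.GetHashCode()
            && a.GetType() == b.GetType();
    }

    // The projected type is fully static: p.Name is a string, p.Age is an int.
    public static string[] NamesOfAdults()
    {
        var people = new[]
        {
            new { Name = "Ann", Age = 30 },
            new { Name = "Bob", Age = 12 },
        };

        var q = from p in people
                where p.Age >= 18
                select new { p.Name, p.Age };

        // Writing p.Company in the lambda below would be a compile-time error,
        // because the anonymous type only has Name and Age.
        return q.Select(p => p.Name).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(StructuralEquality());                 // True
        Console.WriteLine(string.Join(", ", NamesOfAdults()));   // Ann
    }
}
```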

Now, one thing that's quite common in the object world is the usage and manipulation of hierarchical data, i.e. objects formed by collections of other objects formed by collections of... you get the idea.  Now, say you wanted to query your customers to get not only the customer name, but information about the orders they've been creating.  You could write the following very SQL-esque query:

    var q = 
      from c in customers
      where c.City == "Seattle"
      from o in c.Orders 
      where o.Cost > 1000
      select new { c.Name, o.Cost, o.Date };

We've now joined the customer with their own orders.  This would get the job done, but maybe it's not really returning the information in the structure you want.  For one thing, the data isn't grouped by customer.  So for every order made by the same customer you're going to get a new element.  So let's take it a little further:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

Voila.  We've now created a hierarchical result.  Now, per customer you'll only get one item returned.  And that item will have information about all the different orders they've made that fit your criteria.  Now you can trivially create queries that get you the results you want in the exact shape you want.
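To see the difference in shape, here is a minimal sketch that runs both queries over a small in-memory data set.  It assumes the System.Linq namespace that eventually shipped, and the Customer and Order classes, the sample data and the ShapeDemo helpers are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the types used in the post.
public class Order
{
    public decimal Cost { get; set; }
    public DateTime Date { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public List<Order> Orders { get; set; }
}

public static class ShapeDemo
{
    public static Customer[] SampleCustomers()
    {
        var day = new DateTime(2005, 9, 16);
        return new[]
        {
            new Customer { Name = "Ann", City = "Seattle", Orders = new List<Order> {
                new Order { Cost = 1500m, Date = day },
                new Order { Cost = 2500m, Date = day },
                new Order { Cost = 200m,  Date = day } } },
            new Customer { Name = "Bob", City = "Portland", Orders = new List<Order> {
                new Order { Cost = 5000m, Date = day } } },
            new Customer { Name = "Cyd", City = "Seattle", Orders = new List<Order> {
                new Order { Cost = 50m, Date = day } } },
        };
    }

    // Flat shape: one element per (customer, matching order) pair.
    public static int FlatCount(Customer[] customers)
    {
        var flat = from c in customers
                   where c.City == "Seattle"
                   from o in c.Orders
                   where o.Cost > 1000
                   select new { c.Name, o.Cost, o.Date };
        return flat.Count();
    }

    // Hierarchical shape: one element per customer, each carrying its own
    // sequence of matching orders. Returns "Name:count" for each element.
    public static List<string> NestedSummary(Customer[] customers)
    {
        var nested = from c in customers
                     where c.City == "Seattle"
                     select new
                     {
                         c.Name,
                         Orders = from o in c.Orders
                                  where o.Cost > 1000
                                  select new { o.Cost, o.Date }
                     };
        return nested.Select(x => x.Name + ":" + x.Orders.Count()).ToList();
    }

    public static void Main()
    {
        var customers = SampleCustomers();
        Console.WriteLine(FlatCount(customers));                         // 2
        Console.WriteLine(string.Join(", ", NestedSummary(customers)));  // Ann:2, Cyd:0
    }
}
```

One thing the sample data makes visible: the flat query drops Cyd entirely because she has no matching orders, while the hierarchical query still returns her with an empty Orders sequence.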

Next up!  Drill downs into many of the specific new features that we're bringing to the table.

But first: a teaser!  Say you have the following code:

    var customers = GetCustomersFromDataBase();
    var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

    foreach (var c in q) {
        //Do something with c
    }

Did you know that you will be able to write that code in C# 3.0 and DLinq will make sure that the query executes on the DB using SQL?  It will then only pull down the results that matched the query, and only when you foreach over them.  That's right.  That entire "from ..." expression will execute server side.  And it didn't even need to be in "from" form.  If you'd written it as "customers.Where(c => c.City == "Seattle").Select(c => c.Name)" then the same would be true.  How's that for cool?  Stay tuned and a later post will tell you how that all works!
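Until that post arrives, here is a hedged sketch of the two ingredients involved, using plain in-memory collections and the System.Linq and System.Linq.Expressions namespaces that eventually shipped.  This is not DLinq itself, and the DeferredDemo class is hypothetical.  The first method shows that a query does no work until you enumerate it.  The second shows that a lambda can be captured as data (an expression tree) rather than as compiled code, which is the kind of structure a database provider could inspect and translate into SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class DeferredDemo
{
    // Returns how many times the filter had run before and after enumeration.
    public static int[] EvaluationCounts()
    {
        int evaluated = 0;
        var cities = new List<string> { "Seattle", "Portland" };

        // Building the query runs nothing yet.
        var q = cities.Where(c => { evaluated++; return c == "Seattle"; });
        int before = evaluated;

        // Only the foreach pulls items through the filter.
        foreach (var c in q) { }
        int after = evaluated;

        return new[] { before, after };
    }

    // The same kind of lambda, captured as an expression tree instead of a delegate.
    public static string FilterNodeType()
    {
        Expression<Func<string, bool>> filter = city => city == "Seattle";
        return filter.Body.NodeType.ToString();
    }

    // An expression tree can still be compiled and run locally when needed.
    public static bool RunFilterLocally(string city)
    {
        Expression<Func<string, bool>> filter = c => c == "Seattle";
        Func<string, bool> compiled = filter.Compile();
        return compiled(city);
    }

    public static void Main()
    {
        int[] counts = EvaluationCounts();
        Console.WriteLine(counts[0] + " " + counts[1]);   // 0 2
        Console.WriteLine(FilterNodeType());              // Equal
        Console.WriteLine(RunFilterLocally("Seattle"));   // True
    }
}
```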

Comments

  • Anonymous
    September 16, 2005
    All nice and dandy, but you most of the time will have to define types anyway, for the simple reason that the place where you READ data isn't always (most of the time) the place where you consume data. This means that where you consume data you call a method which retrieves the data for you (example: gui calls BL method).

    Though in the BL method, you can use 'var' all you want, but you can't return it from the method, the method has to have a strong type. Which means that I have to define the type anyway.

    I've been developing O/R mappers for more than 3 years now and I know the necessity of 'var' seems to come from the fact that a dynamic list has to be stored in a statically defined type, and 'var' 'solves' that. But only partly, as I described above.

    I more and more get the feeling this 'var' construct is only useful when you directly read the data from the DB in your GUI tier for example. I don't know if Microsoft actually investigated use-cases for this, but I know from my long experience with O/R mapper code and the people who use these things, they absolutely want to separate the data usage tier from the data producing tier, i.e.: they want to avoid at all costs that a GUI developer is able to make shortcuts and read/write to the db directly, thus bypassing BL code.

    We can discuss this deeper at the summit in a few weeks but I'd like to point out that it looks great at first, this 'var', but the more I think about it, it starts to look like a nice thing to demo and to use in small petprojects but is completely useless in multi-tier applications.

  • Anonymous
    September 17, 2005
    I absolutely agree about var not being nearly as useful as it can be.

    Beyond that, about DLinq, are the queries being remoted & executed there, or is an SQL statement prepared from the expression tree and sent to the database?

    My understanding was that it is the latter case, not the first one, but your post seems to say otherwise.

  • Anonymous
    September 17, 2005
    In your last two code samples, I take it there is a comma missing between c.Name and Orders?

  • Anonymous
    September 17, 2005
    It really looks complicated to me. Given a choice, I would prefer not to learn a third language to integrate two languages. SQL can really be complex. Also, how do we represent OUTER JOINS or CROSSJOINS in LINQ?

    Looking at this, I will prefer the traditional SQL in Stored Procedures and access it via ADO.NET. Much easier to work on. Also, I think OR Mappers are easier to understand and more usable. Not sure about others.

  • Anonymous
    September 18, 2005
    Here's a question about DLinq:

    given a query, such as:

    var q =
    from c in customers
    where c.City == "Seattle"
    select new NameAndAge(c.Name, c.Age);

    Now, normally this is converted into an expression using extension methods from the System.Query namespace.

    What would happen if a completely different implementation of those extension methods was imported instead of System.Query. Is DLinq pluggable to that extent?

  • Anonymous
    September 19, 2005
    "This new query syntax is in fact just syntactic sugar that uses patterns to transform itself into the exact same C# query that i listed above."

    Will that pattern-transforming capability be exposed to the programmer, or is that an implementation detail? If you are here revealing that, in C#, metaprogramming will finally be possible, this particular SQL example is only a tiny glimpse of the many new possibilities.

  • Anonymous
    September 19, 2005
    Cyrus, this is very cool stuff for sure! Honestly I share the same concern as Frans Bouma who has a really great O/R mapper product (LLBLGen Pro) that I've been using over the last several months for a client's project. He has nailed an important issue: that most of the time there is either a logical or physical boundary between the data reader and data consumer. For example, imagine a common assembly to hold classes like Customer, Order and the UI calls a BL layer which both reference common.dll to use those entities. LLBLGen Pro uses an adapter to do this. I would absolutely love to see DLINQ provide this kind of model. Of course there is a market for the "unlayered" model that's being demo'd today just like there is a market for people using the SqlDataSource control in ASP.NET with literal T-SQL values embedded right in the page...but that kind of approach is almost useless to me in most cases. Probably a better value for mort. :-)

    Great post!

  • Anonymous
    September 19, 2005
    kfarmer: I take your point, but this is an example of a philosophical mismatch between a LINQ query and an SQL query.

    As a C# developer, I am comfortable with explicitly declaring all the properties of interest, however, SQL allows for those properties to be dealt with in bulk. Tutorial-D (Date's alternative to SQL) allows more sophisticated operations on those bulk properties, such as S{ALL BUT CITY, NAME}.

    Operations for dealing with properties in bulk constitute a kind of dictionary algebra, with support for renaming, merging, and so forth. A general solution in this domain is messy, however you cut it.

    I'm not sure this is a problem, but if it is, it's a messy one.

  • Anonymous
    September 19, 2005
    Jared:

    "Will that pattern-transforming capability be exposed to the programmer...?"

    Yes it will - if you assign a lambda expression to a variable of type Expression<T> you can then manipulate the expression tree to your heart's content.

    Not sure how select expressions are handled.

  • Anonymous
    September 19, 2005
    In your code snippet:

    var q =
    from c in customers
    where c.City == "Seattle"
    select new {
    c.Name,
    Orders =
    from o in c.Orders
    where o.Cost > 1000
    select new { o.Cost, o.Date }
    };

    Why is there no 'var ' needed before 'Orders = ' ?

  • Anonymous
    September 20, 2005
    duncan: Because it is inferred from the "select new { o.Cost, o.Date }" portion of the 'subselect'.

    That is my understanding of it.

  • Anonymous
    September 20, 2005
    damien:

    I think that's because the philosophy of LINQ isn't about SQL databases: those are a side-attraction.

    I think Microsoft made a mistake in focussing so much on making the SQL folks comfortable that they've now got a very hard time ahead in explaining the differences between the two. Part of that mistake was in publishing C-omega: everyone fell in love with it, even though it had some serious troubles in dealing with embedded SQL. Now everybody wants C-omega, despite the lower degree of applicability.

    I don't have my LINQ machine in front of me, but have you tried just selecting the item? Something like:

    from c in Customers
    where c.Name == "Bob"
    select new { Customer = c, IsBob = true }

  • Anonymous
    September 20, 2005
    That query does work, I just tested it, but won't work in a regular database.

    I think that you're right about LINQ not being about SQL databases.

    A great part of our jobs as programmers is writing queries over our data structures, and the LINQ framework does a lot of that work for us.

  • Anonymous
    September 20, 2005
    Oh cool!

    You can define your own relational operators, such as Where() specialised on types derived from IEnumerable<T>.

    This is a somewhat inane example, but compare Take() defined for IEnumerable<T> and an equivalent Take() defined for List<T>. All you need to do is ensure your new extension method is available and overload resolution will select the most specific overload.

    public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int count)
    {
        if (count > 0) {
            foreach (T element in source) {
                yield return element;
                if (--count == 0) break;
            }
        }
    }

    public static IEnumerable<T> Take<T>(this List<T> source, int count) {
        // Index directly into the list, stopping early if count exceeds its size.
        for (int i = 0; i < count && i < source.Count; i++)
            yield return source[i];
    }

  • Anonymous
    September 20, 2005
    Firedancer: "It really looks complicated to me. Given a choice, I would prefer not to learn a third language to integrate two languages. SQL can really be complex. Also, how do we represent OUTER JOINS or CROSSJOINS in LINQ? "

    That's why we gave you a choice. You don't have to use this if you don't want to.

    "Looking at this, I will prefer the traditional SQL in Stored Procedures and access it via ADO.NET. Much easier to work on. Also, I think OR Mappers are easier to understand and more usable. Not sure about others. "

    Again, that's why there's a choice.

  • Anonymous
    September 20, 2005
    Damien: "Heres a question about DLinq:

    given a query, such as:

    var q =
    from c in customers
    where c.City == "Seattle"
    select new NameAndAge(c.Name, c.Age);

    Now, normally this is converted into an expression using extension methods from the System.Query namespace.

    What would happen if a completely different implementation of those extension methods was imported instead of System.Query. Is DLinq pluggable to that extent? "

    The "from" syntax has nothing to do with System.Sequence. It is just syntactic sugar that converts it to a pattern. i.e. .Where, .Select.

    So if you're using some other types that define their own Where/Select methods, then you'll be fine. the "from" comprehension will just end up binding to those.

  • Anonymous
    September 20, 2005
    Damien: "a question on anonymous types...

    I kinda like this new notation, but only as a way of bringing together various query results. Clearly, these anonymous types aren't useful beyond the scope in which they are created, except as a data-holding class that can be reflected over.

    It would be useful to be able to constrain these anonymous types to be immutable and/or structs."

    Right now we're just showing a preview of what we're working on. There are currently limitations in place in that preview (like not being able to pass "var" out of a method). We're actively investigating different approaches to this problem (such as being able to define anonymous types easily so that you can pass them around).

  • Anonymous
    September 22, 2005
    Is it just me, or would the XQuery-like syntax

    var q = for c in customers
    ....... where c.City == "Seattle"
    ....... select c.Name;

    not be more C#-ish? I could even go for

    var q = for c in customers
    ....... where c.City == "Seattle"
    ....... return c.Name;

    But that would probably be a little confusing. Replacing from with for saves you a keyword, and could silence all the complaints about wanting SELECT FROM instead of FROM SELECT.

  • Anonymous
    September 24, 2005
    Ruben:

    "Is it just me, or would the XQuery-like syntax

    var q = for c in customers
    ....... where c.City == "Seattle"
    ....... select c.Name;

    not be more C#-ish? I could even go for"

    That would be fine.

    "var q = for c in customers
    ....... where c.City == "Seattle"
    ....... return c.Name;"

    I like that less. The return would be extremely confusing.

  • Anonymous
    September 26, 2005
    How about

    "for c in customers
    where c.City == "Seattle"
    yield c.Name;"

    Yield fits in with the generator idea that this is making use of.

    One might also use foreach instead of for, if an unambiguous syntax can be had in that case.

  • Anonymous
    October 22, 2005
    The scope of 'var' obviously needs to be greater than just within a method.

    Some queries will get to be very long. How can you refactor this if your scope is limited?
