What's the Difference, Part Four: into vs into
The keyword "into" in a query comprehension means two different things, depending on whether it follows a join or select/group. If it follows a join, it turns a join into a group join. If it follows a select or group then it introduces a query continuation. These two features are quite different, but easily confused.
First, the group join. Suppose you've got a key -- a customer id number -- that is used as the primary key of a collection of customers, and as a foreign key of a collection of credit card numbers. That is, you have a class Customer with fields Id, Name, Address, and so on, and a class CreditCard with fields CustomerId, CardType, Number, and so on. Suppose customer Bob has a Visa and a Discover, and customer Alice has a Visa and a Mastercard. So we have the customer data:
101, Bob
102, Alice
and the credit card data:
101, Visa
101, Discover
102, Visa
102, Mastercard
If we create a query
from customer in customers
join card in cards on customer.Id equals card.CustomerId
select new { customer.Name, card.CardType }
The results of the query would be
Bob, Visa
Bob, Discover
Alice, Visa
Alice, Mastercard
right? This is just a straightforward join. We end up with a list with four items on it. But this is probably not what you actually want in this case. Suppose you wanted a list of customers, and with each customer on the list, a list of their credit cards. You can use a group join:
from customer in customers
join card in cards on customer.Id equals card.CustomerId into cardList
select new {customer.Name, Cards = cardList}
The results of this query would be two records, not four:
Bob, { Visa, Discover }
Alice, { Visa, Mastercard }
Basically, the "into" in a group join is the portion of the query expression which, for each customer, logically gathers up all the matching joined records, makes them into a sequence, and stuffs that sequence into the temporary variable cardList.
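To make the difference between the two joins concrete, here is a minimal runnable sketch of both queries over the sample data above. The Customer and CreditCard classes are pared down to the fields the queries use, and the names JoinDemo, FlatJoinRows and GroupJoinRows are made up for illustration; they are assumptions, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, pared-down versions of the classes described above.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class CreditCard
{
    public int CustomerId { get; set; }
    public string CardType { get; set; }
}

public static class JoinDemo
{
    public static readonly Customer[] Customers =
    {
        new Customer { Id = 101, Name = "Bob" },
        new Customer { Id = 102, Name = "Alice" },
    };

    public static readonly CreditCard[] Cards =
    {
        new CreditCard { CustomerId = 101, CardType = "Visa" },
        new CreditCard { CustomerId = 101, CardType = "Discover" },
        new CreditCard { CustomerId = 102, CardType = "Visa" },
        new CreditCard { CustomerId = 102, CardType = "Mastercard" },
    };

    // Plain join: one result per matching (customer, card) pair.
    public static List<string> FlatJoinRows()
    {
        var query =
            from customer in Customers
            join card in Cards on customer.Id equals card.CustomerId
            select customer.Name + ", " + card.CardType;
        return query.ToList();
    }

    // Group join: one result per customer; cardList is the sequence
    // of that customer's matching cards.
    public static List<string> GroupJoinRows()
    {
        var query =
            from customer in Customers
            join card in Cards on customer.Id equals card.CustomerId into cardList
            select customer.Name + ", { "
                + string.Join(", ", cardList.Select(c => c.CardType)) + " }";
        return query.ToList();
    }

    public static void Main()
    {
        FlatJoinRows().ForEach(Console.WriteLine);   // four rows
        GroupJoinRows().ForEach(Console.WriteLine);  // two rows
    }
}
```

The only textual difference between the two queries is the "into cardList" clause, yet inside the second select the identifier cardList stands for a whole sequence of cards rather than a single card.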
A query continuation means something rather different. The point of a query continuation is to make it easy to "pipe" the results of one query into the next. For example, suppose you want to find all the brown-eyed children who have at least one blue-eyed sibling. The obvious way to do that would be something like
from parent in parents
from child in parent.Children
where child.EyeColor == Brown
where parent.Children.Any(c=>c.EyeColor == Blue)
select child
but let's suppose that we don't want to do that. Suppose there are a lot of large families of brown-eyed children with no blue-eyed siblings; a naive search could be quite inefficient. You might think to narrow it down the other way first. That is, it might be faster to find all the parents who have a blue-eyed child, and then extract from that small set of parents all the brown-eyed children. That's easiest to do in two queries. First find the parents, then from that build a second query which projects out the children:
var parentsWithABlueEyedChild =
from parent in parents
where parent.Children.Any(c=>c.EyeColor == Blue)
select parent;
var brownEyedChildren =
from p in parentsWithABlueEyedChild
from child in p.Children
where child.EyeColor == Brown
select child;
Now, we could combine this into one big query easily enough:
var brownEyedChildren =
from p in (
from parent in parents
where parent.Children.Any(c=>c.EyeColor == Blue)
select parent)
from child in p.Children
where child.EyeColor == Brown
select child;
But... yuck. Imagine this nesting a few levels deeper. It makes a total mess. Notice how we introduce range variable "p" first, and then we have to get through a whole other query before it becomes relevant again. We're introducing range variables in the "inside out" order here. A query continuation simply allows you to put this order "right-side in" again, by moving the "p" range variable to after the initial query:
var brownEyedChildren =
from parent in parents
where parent.Children.Any(c=>c.EyeColor == Blue)
select parent into p
from child in p.Children
where child.EyeColor == Brown
select child;
Notice that in the group join case, you can think of the "into" identifier as being something logically representing the sequence that results from the group of matching joined items. But in the query continuation case, the "into" identifier does not represent the sequence from the first query -- the query itself is an object which represents the first sequence! Rather, "p" represents a range variable that picks one member at a time out of the sequence. Remember, the "into" in a query continuation is just a fancy way of saying from p in (blah); the "p" is a range variable that ranges over the sequence of elements of (blah), but it is not the sequence of elements of (blah) itself.
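As a quick check that the continuation really is just the nested query turned right-side in, here is a small runnable sketch that runs both forms over the same data. The Parent and Child classes, the EyeColor enum, the sample families, and the names ContinuationDemo, Nested and Continued are all assumptions invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types; the queries above only assume Children and EyeColor.
public enum EyeColor { Blue, Brown }

public class Child
{
    public string Name { get; set; }
    public EyeColor EyeColor { get; set; }
}

public class Parent
{
    public string Name { get; set; }
    public List<Child> Children { get; set; }
}

public static class ContinuationDemo
{
    // Made-up sample data: only the first family mixes blue and brown eyes.
    public static List<Parent> SampleParents()
    {
        return new List<Parent>
        {
            new Parent { Name = "Smith", Children = new List<Child> {
                new Child { Name = "Ann", EyeColor = EyeColor.Blue },
                new Child { Name = "Ben", EyeColor = EyeColor.Brown },
                new Child { Name = "Cal", EyeColor = EyeColor.Brown } } },
            new Parent { Name = "Jones", Children = new List<Child> {
                new Child { Name = "Dee", EyeColor = EyeColor.Brown },
                new Child { Name = "Eli", EyeColor = EyeColor.Brown } } },
            new Parent { Name = "Lee", Children = new List<Child> {
                new Child { Name = "Fay", EyeColor = EyeColor.Blue } } },
        };
    }

    // Nested form: p is introduced before the inner query that feeds it.
    public static List<string> Nested(IEnumerable<Parent> parents)
    {
        var query =
            from p in (
                from parent in parents
                where parent.Children.Any(c => c.EyeColor == EyeColor.Blue)
                select parent)
            from child in p.Children
            where child.EyeColor == EyeColor.Brown
            select child.Name;
        return query.ToList();
    }

    // Continuation form: p is a single Parent, not the sequence of parents.
    public static List<string> Continued(IEnumerable<Parent> parents)
    {
        var query =
            from parent in parents
            where parent.Children.Any(c => c.EyeColor == EyeColor.Blue)
            select parent into p
            from child in p.Children
            where child.EyeColor == EyeColor.Brown
            select child.Name;
        return query.ToList();
    }

    public static void Main()
    {
        var parents = SampleParents();
        Console.WriteLine(string.Join(", ", Nested(parents)));
        Console.WriteLine(string.Join(", ", Continued(parents)));
    }
}
```

Note that p.Children compiles in the continuation form precisely because p is one parent at a time; if p were the whole sequence of parents, it would have no Children property.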
Comments
Anonymous
August 31, 2009
Wow, I have never seen the second usage of into before. When I think of all the times that I broke a query up into multiple parts or nested it to achieve a similar effect...well, as you said, yuck.
Anonymous
August 31, 2009
I think the keyword "into" does more or less the same thing in both cases: it creates a certain sub-set of the original set. The difference is in where this sub-set is situated (so to speak) and how it's used afterwards: in the first case, it's a part of a record of the resulting set; in the second case, it's used as an interim to narrow the original set down. But in both cases, "into" means "sub-set".
I think the earlier example of "fixed" as in "fixed in memory" vs. "fixed" as in "fixed in size" is more vivid: the same keyword is used to say that something cannot be moved, or that something cannot grow or shrink, depending on the context. With that in mind, the idea of using, say, "pinned" for the former, while sticking to "fixed" for the latter seems very plausible to me; but can anyone come up with an idea of a synonymous keyword that would express the difference between "into" vs. "into"?
Anonymous
August 31, 2009
I came up with 'as' instead of 'into' in the last case, as in "select parent as p" ... but since "parent as SomeClass" is already an existing expression, that would not be a good idea.
Anonymous
August 31, 2009
The comment has been removed
Anonymous
August 31, 2009
@Blake, Thanks, "name binding" certainly is a better term for it, but still - it's name binding, in both cases, so once again: the meaning is more or less the same. And, speaking of the name binding, an alternative keyword for the latter (second) "into" could be "called". How's this:
var brownEyedChildren =
from parent in parents
where parent.Children.Any(c=>c.EyeColor == Blue)
select parent called p
from child in p.Children
where child.EyeColor == Brown
select child;
Anonymous
September 01, 2009
Two major problems with "called":
- It's the wrong part of speech. From, in, for, to, where, into, etc. are all prepositions. "Called" is a past-tense verb and implies an action; there should only be one action in a query or continuation ("select").
- It incorrectly describes what's really happening. As Eric explains, it creates a range variable, not an alias. It actually is putting something "into" that variable; it's not merely taking the "something" and giving it a new name.
I found the continuation form of "into" intuitive enough, and have used it several times. It bears at least a superficial resemblance to T-SQL INTO which dumps the results into a new table, although it's obviously not quite the same (the Linq version isn't necessarily storing the temporary results anywhere). Still, it makes sense as a metaphor and is pretty hard to confuse with the group join form.
Anonymous
September 01, 2009
My own question for Eric above was bothering me, so I went to the C# Language Spec and ran through the query re-writing rules manually till I sorted it out. The difference is subtle but important:
from x in e ... select v into y ...
from x in e ... let y = v ...
Both leave you with the same range variable y, but the select form hides the range variable x from any following query expressions whereas the let form propagates it along.
Anonymous
September 01, 2009
@Aaron G, Just for the record, this comes from a Wikipedia article on participles: As noun-modifiers, participles usually precede the noun (like adjectives), but in many cases they can or must follow it: The visiting dignitaries devoured the BAKED apples. Please bring all the documents REQUIRED. The difficulties ENCOUNTERED were nearly insurmountable. That's how I meant to use the word "called". But Ok, I'm convinced: "into" is better. You can't have a query loaded with participles, relative clauses, and all the rest of scary grammar :-)