Loving LINQ
I have to say, I love how easy it is to do SQL-like expressions over collections of objects with LINQ...
If we have two simple classes that define our data;
class Person
{
public int Id { get; set; }
public string Name { get; set; }
public int AddressId { get; set; }
}
class Address
{
public int Id { get; set; }
public string Line1 { get; set; }
public string PostCode { get; set; }
}
Then we can “join” them in memory with a quick LINQ query;
static void JoiningCollections()
{
// create some test data
List<Person> people = new List<Person>()
{
new Person() { Id=1, Name="Simon Ince", AddressId=1},
new Person() { Id=2, Name="Someone Else", AddressId=1},
new Person() { Id=3, Name="My Friend", AddressId=2},
new Person() { Id=4, Name="My Other Friend"}
};
// create some more test data
List<Address> addresses = new List<Address>()
{
new Address() { Id=1,
Line1="1 Aroad",
PostCode="A1 B23" },
new Address() { Id=2,
Line1="3 Another Road",
PostCode="Z1 ZZZ" }
};
// define a really simple query
var results =
from p in people
join a in addresses
on p.AddressId equals a.Id
select new { p.Name, a.Line1, a.PostCode };
// and enjoy the simplicity of the results
foreach (var result in results)
{
Console.WriteLine(String.Format("{0} lives at {1}, {2}.",
result.Name,
result.Line1,
result.PostCode));
}
}
Imagine how many more lines of code you would have to write to achieve that before LINQ existed.
The one thing I don’t like about this is that you may need to avoid the “var” keyword when returning result sets to a calling function. No problem... just add a class that can pair up your names and addresses;
class PersonsAddress
{
public Person Person { get; set; }
public Address Address { get; set; }
}
... and change your query and output routine to use it;
IEnumerable<PersonsAddress> results =
from p in people
join a in addresses
on p.AddressId equals a.Id
select new PersonsAddress { Person=p, Address=a };
foreach (PersonsAddress result in results)
{
Console.WriteLine(String.Format("{0} lives at {1}, {2}.",
result.Person.Name,
result.Address.Line1,
result.Address.PostCode));
}
Hey presto – you have just joined together two sets of objects in memory, and could now return them as a strongly typed collection of data. What is also interesting is that by using the IEnumerable<PersonsAddress> type this query won’t be executed until you start to iterate over the results.
The only gotcha with this is that I dread the day when I see two massive collections of data retrieved from a database and joined in memory... so be careful! Bear in mind that in memory joins like this will almost certainly be slower than when done in an engine designed to do it all the time – SQL Server. I say almost as you could point out there is overhead to hit a database, and if you have two collections containing only a few items each I might be persuaded it is quicker to do it in memory... but you see my point, non?!
Still, you’ve got to love LINQ.
Comments
- Anonymous
October 31, 2007
The comment has been removed - Anonymous
November 01, 2007
Yes, I should probably have qualified that comment a bit further! I take your points - and I have to say the indexing ideas are very interesting, and potentially fantastically useful in some situations - but I'd still advise caution. My main concern is in multi-user scenarios - e.g. ASP.NET. The idea of many users pulling back 200,000 objects each, joining them in memory concurrently, etc, etc... watch that scalability plunge! I'm keen that people understand what LINQ is doing - when it is optimising a SQL query, and when it is just operating in memory, as misunderstanding this could leave you with a headache. On the other hand, database servers (although incredibly advanced) are not magical - the fact is they have to do a lot of processing that can be simulated in .NET code... so if you write something approaching the same algorithms, but don't have the overhead of network or cross process communication, serialization, and so on, I can completely believe that in some scenarios you can beat a SQL server. We also shouldn't forget that there is no like-for-like comparison here. SQL can balance reads with writes, can scale up massively, has availability and recovery options, etc, etc... so as usual I suppose it is a case of picking your approach according to the best fit at the time!