C# 3.0: LINQ, I'm not sure I like it that much
This is my sixth post in the series I am writing on the new features of C#3.0. See the previous posts on var, extension methods, lambda expressions, object-collection initializers and anonymous types.
I think the single biggest thing in C#3.0 is Language INtegrated Query, or LINQ. Looking at all the other features of 3.0 (listed above), I somehow get the feeling that they all came into the picture because Linq needs them to work well. This does not mean that these features have no use elsewhere (they definitely do), but they look as if they are part of the grand plan of Linq. CyrusN has some great blogs on Linq.
I generally give strong opinions about whether I like a feature or not straightaway (no IMHO). But I am kind of divided on Linq. Let's first see what Linq is and how it works, and then I'll go into why I like it and why I don't.
Using LINQ
Traditionally data has always been disjoint from code. A programming language would provide statements and expressions to work with data types, and each programmer would write specialized code in his own style to filter and manipulate data. Let's consider an array of employees, defined as below, as our data source. Note that the definition uses some of the new C#3.0 features, including implicitly-typed variables, anonymous types, object-collection initializers and anonymous array declaration.
var employees = new []
{
    new { Name = "Arthur Dent", JobGrade = 3, JobTitle = "SDE", Salary = new { Base = 2000, Allowance = 1000 }},
    new { Name = "Ford Prefect", JobGrade = 2, JobTitle = "SDE", Salary = new { Base = 5000, Allowance = 500 }},
    new { Name = "Slartibartfast", JobGrade = 2, JobTitle = "SDET", Salary = new { Base = 3000, Allowance = 1000 }},
    new { Name = "Zaphod Beeblebrox", JobGrade = 1, JobTitle = "SDE", Salary = new { Base = 6000, Allowance = 1000 }},
    new { Name = "Trillian", JobGrade = 3, JobTitle = "SDET", Salary = new { Base = 12000, Allowance = 1000 }},
};
In the old world you'd use custom functions to work on this array (data) and filter it based on some criteria. In C# 3.0 you can use extension methods and lambda expressions to do this as
var highlyPaid = employees.Where(e => e.Salary.Base > 5000).Select(e => e.Name);
Effectively this returns the names of all employees whose base salary is over 5000. However, in LINQ you can convert this into query syntax, which is similar to SQL, as in
// Query-1
var highlyPaid = from e in employees
                 where e.Salary.Base > 5000
                 select e.Name;
You can use other features like anonymous types to group data as well. In case you are interested in the name of the person as well as his/her salary, you'd write something like
// Query-2
var highlyPaid = from e in employees
                 where e.Salary.Base > 5000
                 select new { e.Name, e.Salary.Base };
There is something interesting here. A classic anonymous type declaration is of the form new { name = value }. However, in the above case we have not specified the names, and yet you can do
foreach(var v in highlyPaid) Console.WriteLine(v.Name);
Here e.Name and e.Salary.Base are available as v.Name and v.Base. This works because the compiler knows the names of the fields in employee and generates the anonymous type with fields/properties of the same names.
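A minimal sketch of this naming rule, using a cut-down version of the employees array (the variable names implicitNames and explicitNames are only for illustration). It shows that leaving the names out produces the same anonymous type as spelling them out.

```csharp
using System;

class Program
{
    static void Main()
    {
        // Cut-down copy of the employees data source from above.
        var employees = new[]
        {
            new { Name = "Ford Prefect", Salary = new { Base = 5000, Allowance = 500 } },
            new { Name = "Trillian", Salary = new { Base = 12000, Allowance = 1000 } },
        };

        var e = employees[1];

        // Names inferred from the member being read: Name and Base.
        var implicitNames = new { e.Name, e.Salary.Base };

        // The same thing with the names written out by hand.
        var explicitNames = new { Name = e.Name, Base = e.Salary.Base };

        // Same property names, types and order, so the compiler reuses one type.
        Console.WriteLine(implicitNames.GetType() == explicitNames.GetType()); // True
        Console.WriteLine(implicitNames.Name + " " + implicitNames.Base);      // Trillian 12000
    }
}
```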
How LINQ works
C#3.0 does not put any restriction on the semantics of query expressions. The language defines translation rules that map each part of a query expression into a method invocation. So when the Linq expression Query-1 given above is compiled, the compiler emits code to execute the following
var highlyPaid = employees.Where(e => e.Salary.Base > 5000).Select(e => e.Name);
The language defines that for the where clause the following will be called

delegate R Func<A,R>(A arg);

class C<T> // This is the data type on which the query is run
{
    public C<T> Where(Func<T,bool> predicate);
    ....
}
Since this call is made by syntactic mapping, the type on which the query is run is free to implement Where as an instance method or an extension method, or to use the implementation of Where in System.Query. If you open the assembly with a tool like Reflector to see the generated code, you'll see that the whole query is just syntactic sugar for calls to these methods.
The formal translation rules and the recommended shape of a generic type that supports the query pattern are documented in the C#3.0 spec.
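To make the syntactic mapping concrete, here is a rough sketch of a type that supplies its own Where and Select as instance methods. Roster<T> and its Items property are hypothetical names made up for this example; they are not part of any library. The query below compiles against these methods purely because their names and shapes match what the translation asks for.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data source type, invented for this illustration.
class Roster<T>
{
    private readonly List<T> items;

    public Roster(IEnumerable<T> source) { items = new List<T>(source); }

    // The 'where' clause of a query over Roster<T> is translated into a call to this.
    public Roster<T> Where(Func<T, bool> predicate)
    {
        Console.WriteLine("Roster.Where called");
        var result = new List<T>();
        foreach (var item in items)
        {
            if (predicate(item)) result.Add(item);
        }
        return new Roster<T>(result);
    }

    // The 'select' clause is translated into a call to this.
    public Roster<R> Select<R>(Func<T, R> selector)
    {
        Console.WriteLine("Roster.Select called");
        var result = new List<R>();
        foreach (var item in items) result.Add(selector(item));
        return new Roster<R>(result);
    }

    public IEnumerable<T> Items { get { return items; } }
}

class Program
{
    static void Main()
    {
        var salaries = new Roster<int>(new[] { 2000, 5000, 3000, 6000, 12000 });

        // Translated to salaries.Where(s => s > 5000).Select(s => s * 2).
        var doubled = from s in salaries
                      where s > 5000
                      select s * 2;

        foreach (var d in doubled.Items) Console.WriteLine(d);
    }
}
```

Running this prints the two "called" lines followed by 12000 and 24000, which shows the query going through the hand-written methods rather than any built-in implementation.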
Why I like it
There are a lot of reasons to like LINQ.
- First of all it introduces a consistent and general way of querying for data, be it for databases, in-memory or XML. This will go a long way in increasing maintainability of code.
- Since there are no specified semantics, the user is free to implement the query pattern for his own types. This gives a lot of flexibility.
- If the data source is a database, DLinq will ensure that the query is executed remotely on the DB using SQL. This means the data is filtered on the server side before it arrives, instead of the whole data set being pulled in and then filtered on the client.
Why I do not like it
- This is another new way of doing things and will add to the burden of C#. I keep saying this over and over again, as I strongly believe that the surface area of a language should be minimal and that too much change made for specific usages leads to trouble down the line. Soon the language becomes capable of doing everything in totally different ways, and it becomes less discoverable and full of surprises.
- The flexibility comes with a price. The same thing that can happen with operator overloading may happen with the query syntax as well. Someone can implement a custom Where for his data type which is non-standard and can take the code maintainer or a client of that code by surprise.
- I think that this might be used in small projects, but in large data-driven applications it'll rarely be used. People traditionally have a separate data tier with stored procedures, and that works out really well in terms of performance, maintainability and security.
- I have a little doubt about security. In some blog I read that, depending on the DB vendor, the SQL statement might be generated and sent to the DB. Can this lead to security holes? I am not too sure about this.
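The worry about non-standard Where implementations can be sketched with a deliberately badly behaved type. Inverted<T> is a hypothetical class invented for this example; its Where keeps the items that fail the predicate, yet the query reads exactly like a normal one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type with a deliberately non-standard Where:
// it keeps the items that FAIL the predicate.
class Inverted<T>
{
    private readonly List<T> items;

    public Inverted(IEnumerable<T> source) { items = new List<T>(source); }

    public List<T> Where(Func<T, bool> predicate)
    {
        return items.FindAll(x => !predicate(x));
    }
}

class Program
{
    static void Main()
    {
        var salaries = new Inverted<int>(new[] { 2000, 5000, 6000, 12000 });

        // Reads as "salaries over 5000", but binds to Inverted<T>.Where.
        var highlyPaid = from s in salaries
                         where s > 5000
                         select s;

        foreach (var s in highlyPaid) Console.WriteLine(s); // prints 2000 then 5000
    }
}
```

Nothing at the call site hints that the result is the opposite of what the query says, which is the same kind of surprise a badly overloaded operator can cause.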
Comments
- Anonymous
September 22, 2005
I've done plenty of projects involving OR mapping. You have to go through the pain of doing it without LINQ to see the beauty of it. Embedding SQL and generating it manually can be a nightmare. The type-safety LINQ provides is invaluable. The other features are also very useful in allowing refactoring in dimensions not possible before, enabling more cohesive, decoupled code. With lambda expressions, you can finally do true functional programming. For example, before you could not remove duplication resulting from similar method calls.
- Anonymous
September 25, 2005
It sounds like you're arguing for stored procedures versus ad hoc SQL. That's fine for some applications, and still supported by DLINQ according to the docs I read. However, most people don't even use stored procedures, not all databases support them, and sometimes the stored procs can't even do queries where the expressions needed to query aren't known until runtime.
Besides that, LINQ is also invaluable for manipulating data after it comes back from the DB, or data that may not have come from the DB in the first place.
- Anonymous
September 26, 2005
I am completely in favor of keeping the tier separation and concentrating all your database access code in your DAL. I just think it will be great to do it using LINQ instead of whatever you use now. It will also be great to be able to use the same principles and syntax for data manipulation in layers that are far from the database.