Method Type Inference Changes, Part One
I want to start this by discussing the purpose of method type inference, and clearing up some potential misunderstandings about type inference errors.
First off though, a brief note on nomenclature. Throughout this series when I say "type inference" I mean "method type inference", not any of the other forms of type inference we have in C# 3.0. (Implicitly typed locals, implicitly typed arrays, and so on.)
The purpose of type inference is to allow generic methods to be called without explicitly specifying the generic types. For example, if you have a generic method
static T Largest<T>(IEnumerable<T> list) where T : IComparable<T> { ... }
then it would be nice to be able to say
decimal maxPrice = Largest(prices);
without having to say redundant information:
decimal maxPrice = Largest<decimal>(prices);
This was useful in C# 2.0, but with the advent of extension methods and query comprehensions in C# 3.0, it is downright vital. If we made Largest into an extension method then clearly you want to be able to say
from customer in customers
from invoices in customers.invoices
select invoices.prices.Largest();
and not have to insert this ugly mechanism information into your lovely semantic query:
from customer in customers
from invoices in customers.invoices
select invoices.prices.Largest<decimal>();
When you call a method without an explicit argument list, overload resolution must figure out which method you meant to call. The way type inference works with overload resolution is that all methods of that name are put into a pool called the candidate set. If any of them are generic, we run a type inference algorithm to see if the generic type arguments can be inferred for the method. If they can, then the constructed generic method stays in the candidate set; if inference fails then it is removed. Then all methods which are inapplicable -- that is, the arguments cannot be converted to the parameter types -- are removed. Of the remaining applicable candidates, either a unique best candidate is chosen. If a unique best candidate cannot be identified then overload resolution fails.
Note that type inference failure is not actually an error in of itself. However, type inference failure (or success!) might cause real errors downstream in a number of ways. For example:
- if type inference fails and the candidate set becomes empty as a result, then overload resolution fails
- if type inference fails when the user intended the generic method to be the chosen one, and the remaining candidates have no unique best member, then overload resolution fails
- if type inference succeeds when it wasn't expected to then there might be more methods in the candidate set than expected. If the inferred method ties for "bestness" with another candidate then overload resolution fails. (The tiebreaker rules in overload resolution are designed to avoid this scenario, but it is still possible in contrived situations.)
This then brings up the question of what error message to display when overload resolution fails. In C# 2.0 we have a heuristic which tries to detect when overload resolution failed because of the first case (which is the most common of the three cases.) In those cases it gives an error message that says "type inference failed" rather than "overload resolution failed". Even though the actual error as far as the language specification is concerned is that the candidate set contained no unique best applicable member, the user typically thinks of the cause of the error as being type inference failure in many cases, so we try to do a good job of reporting that.
It works pretty well in C# 2.0, but we struggled with this a lot in C# 3.0. Suppose you end up in a situation where you have a query:
IEnumerable<Customer> customers = ...;
var firstNames = from c in customers select c.FristName;
Oops. That is going to be translated into customers.Select(c=>c.FristName). There are no methods named "Select" on IEnumerable<Customer>, we look for extension methods and find Enumerable.Select<T, R>(this IEnumerable<T> list, Func<T, R> selector). Type inference infers T as Customer, but cannot figure out what R is because Customer does not have a property FristName. Therefore type inference fails, and therefore overload resolution fails, and therefore our heuristic takes over and gives the error that type inference failed on Select.
I think you would agree that the user does not think of the error in this program as being a failure of overload resolution or type inference; they think that the failure is that there's a typo in one clause of the query. Wes did quite a bit of work on the heuristics to go back one step further, and report why type inference failed; because the body of the lambda could not be successfully bound.
So that's one change to type inference between C# 2.0 and C# 3.0, but I seem to have gotten ahead of myself somewhat. Next time we'll take a closer look at how the mechanism of type inference works in C# 2.0.
Comments
Anonymous
June 17, 2008
An implementation question: Quoting you: "Type inference infers T as Customer, but cannot figure out what R is because Customer does not have a property FristName. Therefore type inference fails, ...". It seems that at this point you can say: "Compilation fails". Why do you need to see it as an inference failure and then go back to find the "user friendly" cause? It looks correct to stop processing in a case a member was not found in a type and report an error. In a more general case you will probably try to infer T as other type and if it's possible check whether it has "FristName" property. So the generic error I would expect here is something like: "Neither of types <Customer,'other inferred types if exist'> has a property named 'FirstName'".Anonymous
June 17, 2008
At that point, the method has not been selected yet, so the type T as Customer. i.e., there might be another Select in the candidate pool, which would make that code valid - The contents of the lambda cannot be compiled (or even generate an error) until the method is selected; The method cannot be selected because the contents cannot compile. This recursion is exactly the problem that Eric is refering to.Anonymous
June 18, 2008
The comment has been removedAnonymous
June 18, 2008
Indeed, I am glossing over a number of important steps here. Also, as a result of a bug that I think was one of yours, we also check to see whether any of the inferred types would be illegal because they are not accessible in the caller's context.Anonymous
June 18, 2008
The user would be correct to expect as clear an error message as possible, in that example. Because missing property on the type is the real problem. I appreciate the work gone/going into improving error messages. Now, only if I could convince my ridiculous management to move to C#3.0.Anonymous
June 23, 2008
To configurator: Basically the inference algorithm can be seen as a series of: <hypothesis> -> <test> -> … As the implementation is described, it assumes some <hypothesis> (T is Customer), then makes a <test> (does Customer has a property named 'FristName'?). The <test> fails and invalidates <hypothesis>. Since the user is interested which <test> failed, the implementation then uses some heuristic to find the <test>. My point is that it is always known. See again Eric's explanation: "if inference fails then it [the candidate] is removed". In fact what fails is the <test>! So there can be a situation where we need to remove unneeded <test>s in our error message, but never - to find them again.Anonymous
June 26, 2008
The comment has been removedAnonymous
June 27, 2008
Reread part zero of this series. Your question is the whole reason I'm doing this series in the first place.Anonymous
June 29, 2008
Thanks for pointing that out, i'll get back to part zero at once :-)