Type inference woes, part four
Last time in this series I discussed how we are probably going to identify a "best" type from a set of expressions. Clearly we need to solve this problem for features such as implicitly typed arrays. We are also considering extending this algorithm to affect type inference of generic methods.
As always, the reason we're considering this is emminently practical; we try hard to not add new language semantics just for the heck of it, but rather in response to specific user problems. Before I get into the motivating problem, let me describe a simpler case that might look familiar:
void M<T>(T t1, T t2){}
...
M(myShort, 0);
In our currently shipped C# compiler, the type inference algorithm insists that all inferences be 100% completely consistent. T cannot be both int and short, so inference fails in this case.
It might be kind of nice to say, you know what, the "best type" algorithm would tell us that short is best, so why don't we use it?
Of course, there are some situations where that stops working. We'd still need this to fail, for instance:
void M<T>(List<T> t1, List<T> t2){}
...
M(myListOfShorts, myListOfInts);
Generic types are neither covariant nor contravariant, so neither choice works in this case. However, it does work in this case:
void M<T>(Func<T> t1, Func<T> t2){}
...
M(()=>myInt, ()=>myShort);
because a lambda which returns a short is convertible to a delegate which returns an int, so inference to int could succeed.
The motivating example for all of this is the join scenario. For those of you who are not database wonks, joining is one of the fundamental operations on tabular data. You typically have two tables with some relationship between them. For instance, you might have a list of customers, where each customer has a customer id, and a list of orders where each order has an associated customer id. You "join" the tables by producing a new table of records associating each customer with their orders. Maybe the new table is the customer name and the amount of each of their orders.
To represent a generic join operation as a method we need two tables of records, a key extractor for each table, and a projector from the joined records to the output records. The caller might look something like:
var results = Join(
myCustomers,
myOrders,
customer=>customer.Id,
order=>order.CustomerId,
(customer, order)=>new {customer.Name, order.Price});
The fully generic method might look something like:
IEnumerable<RES> Join<REC1, REC2, KEY, RES>(
IEnumerable<REC1> rec1s,
IEnumerable<REC2> rec2s,
Func<REC1, KEY> rec1KeyExtractor,
Func<REC2, KEY> rec2KeyExtractor,
Func<REC1, REC2, RES> projector) { ... }
So now perhaps you see the problem. What if the customer key type is int but the order table's customer key type is Nullable<int> ? We would like to infer that Nullable<int> is the better choice for KEY but today the type inference algorithm insists on 100% consistency.
As you can see, this is rapidly becoming complex. We would like to come up with an inference algorithm that solves this problem, but at the same time can be clearly described in the standard and understood by mortals. (Where by "understood by mortals" I mean "such that Eric has some chance of implementing the algorithm correctly"!) Any thoughts or comments you might have would be appreciated; the language design committee will be debating these issues this week and so this is the time to give feedback about these things.
Comments
- Anonymous
July 18, 2006
PingBack from http://microsoft.wagalulu.com/2006/07/18/type-inference-woes-part-four/ - Anonymous
July 19, 2006
I'm probably missing something important here, but isn't an int implicitly convertible to a Nullable<int>? In which case, couldn't the KEY in this example be inferred using one of the algorithms in part 3?
I think I remember this example coming up earlier and you asked if the algorithm should infer a Nullable<int>, and I'd say that makes perfect sense. If you were trying to join a Nullable<short> and an int, then it would have to produce an error, but in the case of a Nullable<int> and anything implicitly convertible to an int, it makes sense that Nullable<int> should be the inferred type.
Further to that, since this example really applies primarily (I'd assume) to LINQ and is really a DB-oriented concept, it's probably going to end up getting used most by the database wonks like myself, who will expect it to work the same as a SQL join, which could include null values if it were an outer join. I think the C# team has also made a few other decisions which favour SQL behaviour although I can't remember offhand what they are.
Does inferring a Nullable<int> from the example here conflict with any of the inferrence algorithms you put forth before? - Anonymous
July 19, 2006
No, there's no conflict. The problem lies in specifying exactly what the inference algorithm should be.
In my next post I'll describe some of the weird issues you run into when you start applying unification algorithms during type inference on lambdas. - Anonymous
July 20, 2006
The comment has been removed - Anonymous
July 20, 2006
The comment has been removed - Anonymous
July 20, 2006
The comment has been removed - Anonymous
July 21, 2006
The comment has been removed - Anonymous
July 21, 2006
My version isn't a filter on the cross product if the type of the parameter in question is Expression<Func<Customer,Order,bool>> (or whatever the precise syntax is)...