Type inference woes, part one

Article
05/24/2006

The C# compiler has a subtle violation of the specification which raises an interesting question for some of the new LINQ-related features. The specification for the ?: operator states the following:

The second and third operands of the ?: operator control the type of the conditional expression. Let X and Y be the types of the second and third operands. Then,

If X and Y are the same type, then this is the type of the conditional expression.
Otherwise, if an implicit conversion exists from X to Y, but not from Y to X, then Y is the type of the conditional expression.
Otherwise, if an implicit conversion exists from Y to X, but not from X to Y, then X is the type of the conditional expression.
Otherwise, no expression type can be determined, and a compile-time error occurs.

This makes a lot of sense. If you have

Giraffe g = (a > b) ? new Giraffe() : new Mammal();

then you ought to get a type error. The type of the ternary expression should be Mammal, not Giraffe, because Giraffe goes to Mammal but Mammal does not go to Giraffe.

If both types go to the other (which can happen if both types define an implicit conversion operator for the other) then we can't decide which one is better and give up.

If neither type goes to the other, then notice that we do not attempt to find the "nearest encompassing type". For example, if we had

Mammal m = (a > b) ? new Dog() : new Cat();

then we would again get a type error. We want to be able to have a unique type as the type of the ternary expression. We do not want to get into the business of saying "well, Dog and Cat are both subclasses of Mammal, but they are also both implementors of IHousepet, so which one is the "real" nearest encompassing type?" It's just too big a can of worms. We like the principle that the type of the expression must be the type of something in the expression.

So where's the spec violation? Well, according to the specification above, is this legal?

short s = 123;
int i = (a > b) ? 0 : s;

X is int, Y is short. Y goes to X, X does not go to Y, therefore X is the type of the expression, so this should work. But in fact, C# reports this as an error, because the rule C# actually implements is:

Let B and C be the second and third operands. Let X and Y be the types of the second and third operands. Then,

If X and Y are the same type, then this is the type of the conditional expression.
Otherwise, if an implicit conversion exists from B to Y, but not from C to X, then Y is the type of the conditional expression.
Otherwise, if an implicit conversion exists from C to X, but not from B to Y, then X is the type of the conditional expression.
Otherwise, no expression type can be determined, and a compile-time error occurs.

Since the types are not the same, and literal zero goes to short, and s goes to int, no expression type can be determined and we report an error.

One could make the argument that this is the better behaviour. It certainly does seem that in this case it's arguably ambiguous what the best type for the expression is since both halves can be converted to both int and short.

Now, since this is an error case, surely we can simply fix the bug. We'd be turning a case which is presently an error into something which works. Turning broken code into working code is by definition not a breaking change -- a breaking change is turning working code into broken code, or working code with different semantics.

Unfortunately, fixing this would be a breaking change:

public delegate int D();
...
D d = (a > b) ? (D)(delegate() { return 1; }) : delegate() { return 2; };

Notice that the latter expression has no type at all. Yes, anonymous methods expressions are expressions without a type. If we used the "expression" version of the algorithm then the second expression is convertible to the type of the first, but the first is not convertible to the non-existant type of the second, so the type of the expression is the type of the first. If we used the spec-compliant "type" version then the second expression doesn't even have a type, so there's no type we can infer!

Next time I'll talk about an algorithm that might save the day here, and also discuss how we'll need to generalize this to make some of the LINQ-related language features work.

Comments

Anonymous
May 24, 2006
<blockquote>Now, since this is an error case, surely we can simply fix the bug. We'd be turning a case which is presently an error into something which works. Turning broken code into working code is by definition not a breaking change -- a breaking change is turning working code into broken code, or working code with different semantics.</blockquote>

Someone should tell who-ever closed this (still lurking) bug:

http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=9810731d-021a-4de1-90aa-24fdc0d259b4
Anonymous
May 24, 2006
There's more than one bug closed as "Won't fix" despite not being a breaking change, based on that criteria.
Anonymous
May 24, 2006
I have no opinion whatsoever on the merits or drawbacks of "wont fix"ing the specific bug you mention. However I will note that making a change in the runtime behaviour of a framework function is fundamentally riskier than making a change in the compile time behaviour of the compiler. It is reasonably safe to say that no person is relying upon the compiler failing to compile code which under the specification really ought to compile. It's not like if we figure out a way to fix this that suddenly a fortune 500 company's web site will stop working. With runtime behaviour changes it is extremely difficult to know who is relying on "incorrect" behaviour continuing to be incorrect.
Anonymous
May 25, 2006
So why do anonymous method expressions have no type? Why not give them a type like "() -> int", which could be implicitly convertible to D?
Anonymous
May 25, 2006
First off, we don't know the return type. I'll be discussing problems with inferring the return type in future posts. Second, how would you represent it in metadata? Third, how would you name this type so that you could use it in an "is" expression? Fourth, what Type object would typeof return?

Giving a thing a type is surprisingly tricky. We want the C# type system to map reasonably well to the .NET CLR type system, so things like anonymous methods, lambdas, and method groups are considered to have no type.
Anonymous
May 26, 2006
Surely since the specification describes only the behavior for things where asking about "the type of this expression" makes sense, what you do with these untyped expressions is already outside the scope of the spec?

It seems like it would be perfectly possible to implement the specced behavior in any case where both expressions have types, but do what you do now in the case where one or both don't. And then in the next release of the spec work to get it amended to cover this case too.

Stuart.
Anonymous
May 26, 2006
The comment has been removed

Partager via

Type inference woes, part one

Comments

Ressources supplémentaires