The Root of All Evil, Part Two

There were a number of interesting comments to yesterday's posting about premature optimization.

Several readers pointed out that there are various options we could take:

  1. Do nothing. Maintain backwards compatibility and spec incompatibility.
  2. Change the spec to exactly match the current implementation.
  3. Change the spec and the implementation.
  4. Make the weird behaviour a warning now and an error later.
  5. Fix the implementation to match the specification, but add another compiler switch to go back to the previous behaviour.

 

These all have various pros and cons, and even for a relatively trivial spec incompatibility such as this one, we think very hard about what the right thing to do is.

Option #2 sounded good, and I actually was an advocate of it for a few hours yesterday, right up until I dug into the optimization and discovered exactly what it was doing. If we were to change the specification to exactly match the behaviour, then the specification would read like this:

Implicit Enumeration Conversions

There is an implicit conversion from any "magic zero" to any enumerated type. "Magic zero" is recursively defined as follows:

  • the integer literal zero, or
  • any magic zero + - | ^ any compile-time integer constant expression equal to zero, or
  • any compile-time integer expression * & any magic zero, or
  • any magic zero / % any non-zero compile-time constant integer expression, or
  • any magic zero * & any non-zero compile-time constant integer expression

 

This means for example, that (8*(0+(7-7))) is a magic zero, but (8*((7-7)+0) is not.

I think we can all agree that we don't want this nonsense in the specification. Changing the spec to exactly match the buggy behaviour leads to a crazy spec.

We could change both the specification and the implementation. We could, for instance, say that any compile-time constant integer zero is implicitly convertible to any enumerated type. This means that nonsense like (8*(0+(7-7))) is assignable to enum, but at least it also means that (8*((7-7)+0) is too.

Adding a new warning is a tempting idea, and we might end up doing it. Doing so means that it is easier to turn it into an error in a future release. New warnings, however, are expensive. They add new code paths to test, we have to test all the scenarios to make sure the warning always appears when it should, warnings have to be translated into a dozen languages, they have to be documented, the documentation has to be translated... it's not cheap.

Adding a compiler switch is also a possibility, but also expensive. The more switches there are, again, more code paths, more tests, bigger testing matrix, the switch has to be documented, and so on.

So now you see why premature optimization is the root of all evil. I've spent the better part of two days wrestling with this thing. Why? Because suppose one of these guys is in a lambda expression that needs to be translated into an expression tree. I want to make for darn sure that the translation is correct, and the more "optimizations" that happen before that transformation, the more the transformation is likely to be not as the user intended! But I can't remove the optimizations without breaking the type system!

UPDATE: The thrilling conclusion is that we have made all constant-according-to-the-spec zero integers convertible to any enum. This is a spec violation, but at least it is a consistent spec violation. In doing so I also introduced a bug, which I believe did ship to customers but which we have since fixed, whereby some "non-integer zeros" (like default instances of structs, or null references) were convertible to any enum. I apologize for the error.

I also challenged you folks to come up with a situation where this premature optimization can screw up definite assignment checking. Reader Carlos found two.

int aa; int bb = 12; int cc;
int dd = (aa * 0) + 12;
if (bb * 0 == 0) cc = 123;
Console.WriteLine(cc);

According to the spec, aa should be flagged as "used before definitely assigned" (even though really it doesn't matter), but because the premature optimization throws away the aa, the flow checker never sees it. Also according to the spec, cc should be flagged as "used before definitely assigned" (even though it always is assigned), but because the optimization throws away the bb, the flow checker does not realize that the conditional is not a compile-time-constant.

The important thing here is not that the code gets it wrong. Indeed, one could make the argument that it gets it right. The problem is that it gets it right for the wrong reasons – it gets it right by accident, in violation of the specification. The definite assignment specification isn't supposed to be perfect, it's supposed to be practical and understandable and predictable and implementable. Undocumented extensions to it make it hard to know whether any two implementations of the spec will agree.

Comments

  • Anonymous
    March 30, 2006
    Could you explain why it's so nonsensical to allow every compile time constant integer to be implicitly convertable to any enumerated type?

    Thanks and welcome back :)

  • Anonymous
    March 30, 2006
    Given what you've said, I'd say changing the spec and the implementation makes the most sense to me. When it comes to a matter of constants, I don't see any good reason why the order of operations should matter (given commutative operations, of course). Maybe it would matter with function calls, but with constants?

    After all, why does it make sense that 0 | E.X | 0 is convertible to an enumeration, but (8*(0+(7-7))) isn't? Sure, it looks silly, but that's not really for a spec to judge.

  • Anonymous
    March 31, 2006
    I think that an important question to ask when considering a change like this is "how many people have code that this will break?"

    My guess is that very few people rely on this behavior. I would correct it and put a notice in the "breaking changes" of the README.

  • Anonymous
    April 01, 2006
    Well, at the very least, the thousands of people who compile the base class library. How many "real" customers are affected, of course we do not know.

    The problem is that this is just one example of a large class of premature optimization bugs which have turned spec errors into sensible behaviour. This wouldn't be a problem if Expression Trees weren't coming down the pike -- it's going to be a major piece of work to maintain all these non-error cases and do correct expression tree construction.

  • Anonymous
    December 14, 2007
    You never answered Kurtiss' question. I thought the same thing while I was reading the post; doesn't that actually make sense?

  • Anonymous
    December 14, 2007
    The choice we eventually went with was to violate the specification in a sensible manner. All constant zeroes now go to any enum. This is a breaking change, since overload resolution on void M(int x) {} vs void M(MyEnum x) {} is now ambiguous on M(ConstantZero) when it did not used to be.  This breaking change has caused at least one customer's code to break, which caused them to complain bitterly to Microsoft because we made them lose money. (And who knows how many people are broken who haven't complained?) Changing the rule to "any integer constant goes to any enum" would be a MASSIVE breaking change that would affect thousands of customers, so we will not do that. If the question is why the rule is like this in the first place -- well, the fact that enumerated types are just fancy integers is in my opinion deeply unfortunate. It's a leaky abstraction. I would rather have seen two kinds of types -- enumerated types, which are strictly restricted to their set of values and have no relationship whatsoever to integers, and flags, which are fancy dress for integers with some added type safety. Clearly the only reason to make zero go to any enum is so that a flag enum can always be initialized to empty -- there's no reason why zero ought to go to a non-flag enum.

  • Anonymous
    December 14, 2007
    That makes sense. I wasn't clear on what the final resolution was from the post itself; thanks for clearing that up. And I absolutely agree with you about enumerated types. In a static type system it makes no sense that I have to check whether a variable of a particular type is actually belongs in that type! "Deeply unfortunate" is an understatement. Unfortunately I suppose there's no going back now.

  • Anonymous
    February 02, 2012
    It seems that something wrong with the parenthesis in the (8*((7-7)+0)  expression.