Three Umpires

Three baseball umpires are having lunch together. The first umpire says "Well, a lot of them are balls, and a lot of them are strikes, but I always calls 'em as I sees 'em."

The second umpire says "Hmph. I calls 'em as they are."

The third umpire slowly looks at his two colleagues and declares "They ain't nothin' until I calls 'em."


Those of you unfamiliar with the bizarre rules of baseball might need a brief primer. Suppose the pitcher throws a pitch and the batter swings and misses. Such a failure to strike the ball is, bizarrely enough, called a "strike", and counts against the batter; three strikes and the batter is out. But what if the batter fails to swing at all? If the umpire decides that the pitch was "inside the strike zone" then the pitch counts as a strike against the batter. If the pitch was outside the strike zone then the pitch counts as a "ball" (another bizarre and confusing name; obviously the object pitched is also called a ball). If the pitcher pitches four "balls" before the batter accumulates three "strikes" then the batter gets to "walk" to first base for free. (At least the walk is sensibly named.)

Formally, the strike zone is reasonably well-defined by the rules (though as the Wikipedia article linked to above indicates, there are some subtle points left out of the definition). But the formal definition is actually irrelevant; the rules of baseball also state that a strike is any pitch that the umpire says is a strike. Umpires are given wide latitude to declare what is a strike, and there are no appeals allowed.

And hence the fundamental disagreement between the three umpires. The first umpire believes that whether a pitch was in the strike zone or not is a fact about an objective reality, and that the call is a sometimes-imperfect subjective judgment about that reality. The second umpire seems to be basically agreeing with the objective, materialist stance of the first umpire, and simply bragging about having 100% accuracy in judgment. The third umpire's position is radically different from the first two: that the rules of baseball say that regardless of the objective reality of the path of the baseball, what makes a pitch into a ball or a strike is the umpire's call, no more, no less.

I think about the three umpires a lot. The C# language has a clear and mostly unambiguous definition of "the strike zone"; the specification should in theory allow us to classify any finite set of finite strings of text into either "a legal C# program" or "not a legal C# program". As a spec author, I take the third umpire's position: what the spec says is the definition of what is legal, end of story. But as an all-too fallible compiler writer, I take the first umpire's position: the compiler calls 'em as it sees 'em. Sometimes an illegal program accidentally (or deliberately; we implement a small number of extensions to the formal C# language) makes it through the compiler. And sometimes a legal program is incorrectly flagged as an error, or cannot be successfully compiled because it causes the compiler to run out of stack space or some other resource. (Also, though I believe that the compiler always in theory terminates, there are ways to build short C# programs that take exponentially long to analyze, making the compiler a sometimes impractical tool for deciding correctness.) But we calls 'em as we sees 'em, and if we get it wrong, then that's a bug.

But as a practical matter for our customers, the compiler is more like the third umpire: the arbiter of correctness, with no appeal. And of course I haven't even begun to consider the runtime aspects of correctness! Not only should the compiler decide what programs are legal, it should also generate correct code for every legal program. And again, the code generator plus the CLR's verifier and jitter act like our third umpire; the de facto arbiter of what "the right behaviour" actually is.

Comments

  • Anonymous
    November 09, 2009
    I'm sure I speak for others (as well as myself) when I ask to see examples of the Microsoft C# compiler's extensions to the language, as well as one or two small programs that take exponentially long to analyze.

    I gave an example of a program that is NP-hard to analyze a while back. The idea is that if you have M(x=>M(y=>M(z=>M... that is n deep, and there are m overloads of M, then we do m^n lambda bindings. There are a number of features that are arguably extensions to the language. Most of them are bugs that have become enshrined, like the fact that constant, not literal, zero goes to any enum, or that type analysis of conditional expressions is richer than what the spec describes. But we also have some bona fide extensions to the language that almost no one knows about. (For good reason; they are designed to be for extremely obscure scenarios.) It's a little-known fact that C# supports C-style variadic methods and first-class strongly-typed references. -- Eric
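The nested-lambda shape described in that reply can be sketched with m = 2 overloads and a nesting depth of n = 3. This is only an illustration, not the example from the earlier post: the class name OverloadDemo and the lambda bodies are made up here. The bodies are chosen so that exactly one combination of parameter types binds without error, which is what forces a compiler to consider the combinations in the first place.

```csharp
using System;

static class OverloadDemo
{
    // Two overloads of M (m = 2). Each returns a tag naming the overload chosen.
    public static string M(Func<int, string> f) => "int:" + f(0);
    public static string M(Func<string, string> f) => "string:" + f("");

    public static string Run()
    {
        // Nesting depth n = 3. Each lambda parameter could be int or string,
        // so there are up to 2^3 ways to bind the innermost body.
        // Only x:int, y:int, z:string type-checks: '*' needs ints,
        // and only string has a Length property.
        return M(x => M(y => M(z => (x * y * z.Length).ToString())));
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "int:int:string:0"
    }
}
```

Adding one more level of nesting doubles the number of candidate bindings for the innermost body, which is where the m^n figure comes from.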

  • Anonymous
    November 09, 2009
    There's an annotated version of Ecma-335 (CLI) detailing implementation-defined characteristics, unimplemented features, and extensions of .NET related to the base CLI standard (http://msdn.microsoft.com/en-us/netframework/aa569283.aspx, look under "Microsoft Implementation Specific Versions"), but there's none for Ecma-334 (C#). It would be interesting to have that as well.

  • Anonymous
    November 09, 2009
    That link you mentioned speaks about TypedReference. I've seen it before, but I don't get the point. Isn't it the same as using object for the reference and object.GetType() to get its type?

    No. A TypedReference is a reference to a variable, not to an object of reference type. It's the difference between "M(x)" and "M(ref x)". Basically, a TypedReference can be thought of as a void* -- a pointer to some storage location of unknown type -- plus a Type object describing the type that can be stored there. (That's not exactly how it is implemented, since a void* would require pinning the location. But conceptually, that's what it is; a reference to storage plus a type.) As I'm sure you can imagine based on that description, TypedReference is a Very Special Type. While a TypedReference object is "in flight" in a variadic method call, clearly the storage that it refers to cannot be garbage collected safely. Therefore the GC needs to have special knowledge of when typed references to locations are valid. The C# compiler has a great many special rules in it to prevent misuse of TypedReference. Some day I'll blog about them. -- Eric
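To make the "reference to storage plus a type" idea concrete, here is a minimal sketch. It assumes the Microsoft C# compiler, whose undocumented keywords __makeref, __reftype and __refvalue are not part of the C# specification and are not named in the discussion above; other compilers may reject them. The class name TypedRefDemo is made up for the illustration.

```csharp
using System;

static class TypedRefDemo
{
    public static string Run()
    {
        int x = 10;

        // Undocumented Microsoft extension: take a typed reference to the
        // variable x itself, not to a boxed copy of its value.
        TypedReference tr = __makeref(x);

        // The reference carries the type of the storage location.
        Type t = __reftype(tr);

        // Writing through the reference changes the original variable.
        __refvalue(tr, int) = 42;

        return t.Name + ":" + x;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "Int32:42"
    }
}
```

Using object and GetType() instead would box a copy of x, so assigning through it could never change x; that is the "M(x)" versus "M(ref x)" distinction.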

  • Anonymous
    November 09, 2009
    So is the C# TypedReference the same as the C++/CLI "tracking reference" denoted by T%? If not, how do they differ?

    No idea. I've never used C++/CLI. -- Eric

  • Anonymous
    November 09, 2009
    Are there any extensions to the spec other than TypedReference and varargs?

    Yep. -- Eric

  • Anonymous
    November 09, 2009
    Hi, there, Thanks for the time that you put into keeping the blog going. It is very informative and I really appreciate it! Thanks. J

  • Anonymous
    November 10, 2009
    There's one problem here: situations where the MS compiler has taken a particular decision, but it would be entirely correct to take another path. For example, take anonymous types. If you have two expressions: new { Name="fred", Age=10 } and new { Name="george", Age=10.5 } they will (under the MS compiler) use the same generic type, with different type arguments (<string, int> and <string,double>). Is that deemed to be the correct behaviour, or merely a correct behaviour? If a different compiler were to create two completely separate types, would that be "wrong"? I realise this is beyond the legality of the program code but I think it's important to recognise grey areas. Fortunately most of these are pretty well hidden, but I'm not looking forward to finding a situation where some library author has made an assumption based on what the MS compiler generates... (The expression tree stuff is another good example of this - the specification refers to another document, but I'm not sure that that document exists publicly.)

    Jon

    Indeed. In general, implementors have latitude to innovate, so long as the language semantics are preserved. In fact, the spec specifically calls out a few areas where implementations are given wide latitude. For example, the bit about iterator blocks merely suggests that a state machine built out of switch statements is a reasonable approach; nothing in the spec requires that the implementation use a state machine implementation if, say, coroutines are available on an implementation's runtime. -- Eric
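The anonymous-type question can be probed with reflection. The sketch below only observes what a given compiler happened to emit; the class and method names are made up for the illustration. Per the comment above, the Microsoft compiler is expected to report a shared generic definition, but a program that depends on that answer is relying on an implementation detail.

```csharp
using System;

static class AnonTypeDemo
{
    // True when both anonymous types are instantiations of one generic type.
    public static bool SharesGenericDefinition()
    {
        Type ta = new { Name = "fred", Age = 10 }.GetType();
        Type tb = new { Name = "george", Age = 10.5 }.GetType();

        return ta.IsGenericType && tb.IsGenericType
            && ta.GetGenericTypeDefinition() == tb.GetGenericTypeDefinition();
    }

    // The two runtime types themselves differ either way, because the
    // Age properties have different types (int versus double).
    public static bool TypesDiffer()
    {
        return new { Name = "fred", Age = 10 }.GetType()
            != new { Name = "george", Age = 10.5 }.GetType();
    }

    public static void Main()
    {
        Console.WriteLine("Types differ: " + TypesDiffer());
        Console.WriteLine("Shared generic definition: " + SharesGenericDefinition());
    }
}
```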

  • Anonymous
    November 10, 2009
    One could argue that the spec is a contract and the MS C# compiler is "a" implementation of that contract on the .NET platform; another implementor (the Mono guys maybe) could take the same contract and implement it differently. As long as the client (some program) adheres to the contract, the program would compile just fine, but the runtime behavior is up to the implementation/platform on which the bits are running.

  • Anonymous
    November 12, 2009
    Now that I understand what TypedReference is, why does it exist if it's so well hidden? The compiler has special rules for this type, but almost nobody uses it. Why bother?

    Because we need to be able to interoperate with C-style variadic methods in order for the framework to have a decent interop story with legacy code. -- Eric


Could you tell us more about those other extensions to the spec?

Sure. Keep reading; I'm sure I'll get to the ones I haven't already discussed some day. -- Eric

"if, say, coroutines are available on an implementation's runtime." Wouldn't they need to be available in the CLR for that?

The CLR is our implementation's runtime. Suppose you were implementing a C# compiler targeting a runtime you'd written yourself, and you had implemented coroutines in that runtime. In that case, you could implement iterator blocks in your C# compiler to use coroutines instead of generating a state machine. -- Eric
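For readers who have not seen the state-machine approach, here is a rough sketch of the idea. The hand-written class below is not the code any particular compiler generates; it is a simplified illustration, with made-up names, of how an iterator block can be expressed as a switch over a saved state.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class IteratorDemo
{
    // What the programmer writes: an iterator block.
    public static IEnumerable<int> CountTo3()
    {
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // A hand-written equivalent: each MoveNext call resumes from the
    // saved state, much as a coroutine would resume after a yield.
    public sealed class CountTo3Machine : IEnumerator<int>
    {
        private int state;

        public int Current { get; private set; }
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            switch (state)
            {
                case 0: Current = 1; state = 1; return true;
                case 1: Current = 2; state = 2; return true;
                case 2: Current = 3; state = 3; return true;
                default: return false;
            }
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() { }
    }

    // Drains the hand-written machine into a list for comparison.
    public static List<int> RunMachine()
    {
        var result = new List<int>();
        using (var m = new CountTo3Machine())
        {
            while (m.MoveNext()) result.Add(m.Current);
        }
        return result;
    }
}
```

Both forms produce the same sequence to a caller, which is all the language semantics require; how the suspension is implemented is left to the implementation.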

  • Anonymous
    November 13, 2009
    But the C# compiler - 'mine' or yours - produces IL. IL that should, according to the spec, be able to run on your runtime. So I can't both rely on my runtime's coroutines and comply with the spec, can I? Or does the spec not specify IL at all and say simply what the program should do when run?