Absence of evidence is not evidence of absence

Today, two more subtly incorrect myths about C#.

As you probably know, C# requires all local variables to be explicitly assigned before they are read, but assumes that all instance fields of a class are initially assigned their default values. An explanation I sometimes hear for this is: "the compiler can easily prove that a local variable is not assigned, but it is much harder to prove that an instance field is not assigned. And since the class's default constructor automatically assigns all instance fields to default values, you don't need to do the analysis for fields."

Both statements are subtly incorrect.

The first statement is incorrect because the compiler in fact cannot and does not prove that a local variable is not assigned. Such a proof (1) is impossible in general, and (2) would not give us any useful information we could act upon. It's impossible because proving that a given variable is assigned a value is equivalent to solving the Halting Problem:

int x;
if (/*condition requiring solution of the halting problem here*/) x = 10;
print(x);

If what we wanted to do was prove that x was unassigned then we would have to at compile time prove that the condition was false. Our compiler is not that sophisticated!

But the deeper point here is that we're not interested in proving for certain that x is unassigned. We're interested in proving for certain that x is assigned! If we can prove that for certain, then x is "definitely assigned". If we cannot prove that for certain then x is "not definitely assigned". We're only interested in "definitely unassigned" insofar as "definitely unassigned" is a stronger version of "not definitely assigned". If x is read from when it is "not definitely assigned", that's a bug.
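
As a rough illustration of that distinction, here is a minimal sketch with one method where the compiler can prove assignment on every path and, in a comment, a variant where it cannot. The class and method names are hypothetical, invented for this example.

```csharp
using System;

public static class DefiniteAssignmentSketch
{
    // Both branches assign x, so x is "definitely assigned" at the read.
    public static int BothBranches(bool flag)
    {
        int x;
        if (flag) x = 10;
        else x = 20;
        return x; // accepted by the compiler
    }

    // Only one branch assigns x, so x is "not definitely assigned" at the read.
    // Uncommenting this method produces error CS0165
    // (use of unassigned local variable 'x').
    //
    // public static int OneBranch(bool flag)
    // {
    //     int x;
    //     if (flag) x = 10;
    //     return x;
    // }

    public static void Main()
    {
        Console.WriteLine(BothBranches(true));  // prints 10
        Console.WriteLine(BothBranches(false)); // prints 20
    }
}
```

Note that the compiler never decides whether OneBranch is actually called with flag set to true; it only observes that it has no proof of assignment on the path where flag is false.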

That is, we're attempting to prove that x is assigned, and our failure to prove that at every point where it is read is what motivates the error. That failure could be because of a bona fide bug in your program, or it could be because our flow analyzer is extremely conservative. For example:

int x, y = 0;
if (0 * y == 0) x = 10;
print(x);

You and I know that x is definitely assigned, but in C# 3 the compiler is deliberately not smart enough to prove that. (Interestingly enough, it was smart enough in C# 2. I broke that to bring the compiler into line with the spec; being smarter but in violation of the spec is not necessarily a good thing.) 

This example again shows that we do not prove that x is unassigned; if we did prove that, then clearly our prover would contain an error, since you and I both know that x is definitely assigned. Rather, we fail to prove that x is assigned.
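
For contrast, here is a sketch of a variant the compiler does accept. It assumes y is declared const, which the example above deliberately does not do; that makes the condition a constant expression, so the flow analyzer is allowed to use its value. The class and method names are hypothetical.

```csharp
using System;

public static class ConstantConditionSketch
{
    public static int WithConstant()
    {
        const int y = 0; // assumption: y is a compile-time constant
        int x;
        // 0 * y == 0 is now a constant expression with the value true,
        // so the analyzer may treat the assignment as always executed.
        if (0 * y == 0) x = 10;
        return x; // accepted: x is definitely assigned here
    }

    // With an ordinary local, the same condition is not a constant expression,
    // and the read of x is rejected with error CS0165:
    //
    // public static int WithVariable()
    // {
    //     int x, y = 0;
    //     if (0 * y == 0) x = 10;
    //     return x;
    // }

    public static void Main()
    {
        Console.WriteLine(WithConstant()); // prints 10
    }
}
```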

This is an interesting twist on the believers vs skeptics argument that goes like this: the skeptic says "there's no reliable evidence that bigfoot exists, therefore, bigfoot does not exist". The believer says "absence of reliable evidence is not itself evidence of absence; and yes, bigfoot does exist". In both cases, reasoning from a position of lacking reliable evidence is seldom good reasoning! But in our case, it is precisely because we lack reliable evidence that we are coming to the conclusion that we do not know enough to allow you to read from x.

(The relevant principle for tentatively concluding that bigfoot is mythical based on a lack of reliable evidence is "extraordinary claims require extraordinary evidence". It is reasonable to assume that an extraordinary claim is false until reliable evidence is produced. When overwhelmingly reliable evidence is produced of an extraordinary claim -- say, the extraordinary claim that time itself slows down when you move faster -- then it makes sense to believe the extraordinary claim. Overwhelming evidence has been provided for the theory of relativity, but not for the theory of bigfoot.)

The second myth is that the default constructor of a class initializes the fields to their default values. This can be shown to be false by several arguments.

First, a class need not have a default constructor, and yet its fields are always observed to be initially assigned. If there is no default constructor, then something else must be initializing the fields.

Second, even if a class does have a default constructor, there's no guarantee that it will be called. Some other constructor could be called.

Third, the field initializers of a class run before any constructor body runs, therefore it cannot be the constructor body that does the initialization; that would be wiping out the results of the field initializers.

Fourth, constructors can call other constructors; if each of those constructors were initializing the fields to zero, that would be wasteful; we'd be unnecessarily re-initializing already-wiped-out fields.

What actually happens is that the CLI memory allocator guarantees that the memory allocated for a given class instance will be initialized to all zeros before the constructor is called. By the time the constructors run the object is already freshly zeroed out and ready to go.
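
One way to observe this is sketched below. The Widget class and its members are hypothetical, and the sketch assumes a runtime that provides RuntimeHelpers.GetUninitializedObject, which creates an instance without running any constructor or field initializer.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Widget
{
    public int Count;       // no initializer, never assigned by this class
    public string Name;     // no initializer, never assigned by this class
    public int Seeded = 42; // field initializer, runs before the constructor body
    public readonly string Seen;

    public Widget(string label)
    {
        // Count and Name are readable here even though no code has assigned them.
        Seen = Count + "," + (Name ?? "null") + "," + Seeded + "," + label;
    }
}

public static class ZeroInitSketch
{
    public static void Main()
    {
        var w = new Widget("a");
        Console.WriteLine(w.Seen); // prints 0,null,42,a

        // No constructor and no field initializer runs for this instance,
        // yet its fields still hold their default values.
        var raw = (Widget)RuntimeHelpers.GetUninitializedObject(typeof(Widget));
        Console.WriteLine(raw.Seeded);       // prints 0
        Console.WriteLine(raw.Seen == null); // prints True
    }
}
```

If the constructor were responsible for the default values, the second instance would have no defined contents at all; instead even Seeded reads as zero, because its initializer never ran.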

Comments

  • Anonymous
    October 12, 2009
    Fifth, local variables are also always initialized by the CLI to default values, so nothing in particular follows from the statement "since the [something] automatically assigns all instance fields to default values".

    I seem to recall that the local initialization behaviour is configurable; you can turn it off if you don't want the perf hit. -- Eric

  • Anonymous
    October 12, 2009
    @Random832 I've always wondered: when you write .locals init in MSIL, the CLI guarantees that the locals have been initialized upon method entry, so why does C# still require users to definitely assign a value to the locals? More often than not, you might want to omit the initialization because the default values are exactly what you want.

  • Anonymous
    October 12, 2009
    @raven I seem to recall reading that this was a decision made because the use of an uninitialized local variable is most often associated with a logic error, rather than a desire to use the default value. I wonder though if that means the same case can be made about fields in a class?

  • Anonymous
    October 12, 2009
    I think so, Greg. But field initialization is so much more complicated that I'm not sure it would be reasonably possible for the compiler to determine that a field is definitely assigned in all of the common usage cases. The most common cases of local variable initialization are understood by the compiler, but fields are assigned by field initializers and constructors, and the constructors may call other constructors in the same class or in other classes. It's way simpler just to rely on the CLR requirement that heap memory is zeroed before use.

  • Anonymous
    October 12, 2009
    @Eric:
    > I seem to recall that the local initialization behaviour is configurable; you can turn it off if you don't want the perf hit
    If you drop "init" from ".locals init", then locals won't be initialized, but then the resulting code will be non-verifiable (or at least the ECMA CLI spec says so; I'm not sure if .NET will actually treat it as such, since it relaxes a few other overly stringent ECMA rules when it comes to verification).

  • Anonymous
    October 12, 2009
    @Denis - additionally, zero is always a valid enum value: TEST t =0; - compiles fine without any cast.

  • Anonymous
    October 12, 2009
    Re the ctor topic - additionally (although it is outside of the language) it is possible for no constructor to be invoked, for example via FormatterServices.GetUninitializedObject (which is used by DataContractSerializer in WCF, among other things).

  • Anonymous
    October 13, 2009
    @Eric
    > I seem to recall that the local initialization behaviour is configurable; you can turn it off if you don't want the perf hit.
    But currently the C# compiler always produces ".locals init", doesn't it? I couldn't find any switch in csc to configure the behavior for ignoring init. It won't cause much overhead to init the locals anyway, because the CLR's JIT would treat zeroing out a local as dead code if the local is assigned a new value before its first use; the new definition "kills" the old one. Dead code gets eliminated.

  • Anonymous
    October 13, 2009
    Personally, I care much more for the behavior to be well-defined; otherwise it may be as complex or as simple as is reasonable (which is of course a matter for discussion, but that's another discussion). The reason is obvious: if I write some code, I want to be able to validate it against the spec and know that it compiles on any C# compiler out there. When the compiler is allowed to be arbitrarily clever with no restrictions nor a definite spec, you run into a situation where the only code that's guaranteed to compile everywhere is the one that assumes that compiler is as dumb as possible (in our example, it would be requiring all local variables to be initialized, period). Which is quite useless.

  • Anonymous
    October 19, 2009
    Just a small point, but "absence of evidence is not evidence of absence" is usually the claim the believer gives to the skeptic! Skeptics make the point that absence of evidence IS a form of evidence of absence, at least when active attempts to find evidence have occurred. Your argument as used is correct, but that's not exactly what the phrase is usually used to mean. Usually the believer is arguing "well, just because we don't have evidence doesn't mean we are wrong", to which the response is of course "well, of course not, but it does mean you are more likely to be wrong!" Sorry about being picky there.

  • Anonymous
    October 20, 2009
    What I am wondering is why local variables are not initialized automatically with their default value, once the compiler determines that they are not definitely assigned. Would there be a performance penalty for such automatic assignment?

    The reason we require definite assignment is because failure to definitely assign a local is probably a bug. We do not want to detect and then silently ignore your bug! We tell you so that you can fix it. -- Eric