null is not false

The way you typically represent a "missing" or "invalid" value in C# is to use the "null" value of the type. Every reference type has a "null" value; that is, the reference that does not actually refer to anything. And every "normal" value type has a corresponding "nullable" value type which has a null value.

The way these concepts are implemented is completely different. A reference is typically implemented behind the scenes as a 32- or 64-bit number. As we've discussed previously in this space, that number should logically be treated as an "opaque" handle that only the garbage collector knows about, but in practice that number is the address, within the virtual memory space of the process, at which the referred-to object lives inside the managed heap. The number zero is reserved as the representation of null because the operating system always reserves the first few pages of virtual memory as invalid. There is no chance that, by some accident, the zero address is going to be a valid address in the heap.

By contrast, a nullable value type is simply an instance of the value type plus a Boolean that indicates whether the value is to be treated as a value, or as null. It's just syntactic sugar for passing around a flag. This is because value types need not have any "special" value that has no other meaning; a byte has 256 possible values and every one of them is valid, so a nullable byte has to have some additional storage.
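
As a rough sketch of that "value plus flag" layout, and not the actual definition of System.Nullable<T>, the idea can be modeled with a small struct. The names Maybe<T> and MaybeDemo are hypothetical, chosen only for illustration:

```csharp
using System;

// Hypothetical illustration only; the real type is System.Nullable<T>.
public struct Maybe<T> where T : struct
{
    private readonly bool hasValue; // false means "treat this as null"
    private readonly T value;       // meaningful only when hasValue is true

    public Maybe(T value)
    {
        this.hasValue = true;
        this.value = value;
    }

    public bool HasValue
    {
        get { return hasValue; }
    }

    public T Value
    {
        get
        {
            if (!hasValue)
                throw new InvalidOperationException("No value present.");
            return value;
        }
    }
}

public static class MaybeDemo
{
    public static void Main()
    {
        Maybe<byte> missing = default(Maybe<byte>); // flag is false: the "null" state
        Maybe<byte> zero = new Maybe<byte>(0);      // flag is true: a real byte that happens to be 0

        Console.WriteLine(missing.HasValue); // False
        Console.WriteLine(zero.HasValue);    // True

        // The built-in nullable type exposes the same kind of flag.
        byte? builtIn = null;
        Console.WriteLine(builtIn.HasValue); // False
    }
}
```

Note that the zero byte and the missing byte differ only in the flag, which is why the extra storage is unavoidable.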

Some languages allow null values of value types or reference types, or both, to be implicitly treated as Booleans. In C, you can say:

int* x = whatever();
if (x) ...

and that is treated as if you'd said "if (x != NULL)". The same goes for nullable value types: in some languages a null value of a value type is implicitly treated as "false".

The designers of C# considered those features and rejected them. First, because treating references or nullable value types as Booleans is a confusing idiom and a potentially rich source of bugs. And second, because semantically it seems presumptuous to automatically translate null -- which should mean "this value is missing" or "this value is unknown" -- to "this value is logically false".
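
A minimal sketch of what that decision means in practice: the check has to be spelled out. The class and method names here are made up for illustration.

```csharp
using System;

public static class ExplicitNullChecks
{
    public static string Describe(string s, int? n)
    {
        // "if (s)" and "if (n)" do not compile in C#: neither a string
        // reference nor an int? converts implicitly to bool.
        string first = s != null ? "s has a value" : "s is null";
        string second = n.HasValue ? "n has a value" : "n is null";
        return first + "; " + second;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(null, 0));  // s is null; n has a value
        Console.WriteLine(Describe("", null)); // s has a value; n is null
    }
}
```

The empty string and the integer zero both count as "has a value" here; only an actual null is reported as missing.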

In particular, we want to treat nullable bools as having three states: true, false and null, and not as having three states: true, false and different-kind-of-false. Treating null nullable Booleans as false leads to a number of oddities. Suppose we did, and suppose x is a nullable bool that is equal to null:

if (x)
    Foo();
if (!x)
    Bar();

Neither Foo nor Bar is executed because "not null" is of course also null. (The answer to "what is the opposite of this unknown value?" is "an unknown value".) Does it not seem strange that x and !x are both treated as false? Similarly, if (x | !x) would also be treated as false, which also seems bizarre.

The solution to the problem of these oddities is to avoid the problem in the first place, and not make nulls behave as though they were false.
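
For contrast, here is a small sketch of how C# treats a null bool? today, using the lifted !, | and & operators. The class name is hypothetical; the operators are the built-in ones.

```csharp
using System;

public static class NullableBoolDemo
{
    public static void Main()
    {
        bool? x = null;

        bool? notX = !x;       // lifted "!": null stays null
        bool? either = x | !x; // lifted "|": null | null is null

        Console.WriteLine(notX.HasValue);   // False
        Console.WriteLine(either.HasValue); // False

        // A known operand can still decide the result.
        Console.WriteLine((x | true) == true);   // True
        Console.WriteLine((x & false) == false); // True

        // "if (x)" does not compile for bool?; say which case you mean.
        if (x == true)
            Console.WriteLine("definitely true");
        else if (x == false)
            Console.WriteLine("definitely false");
        else
            Console.WriteLine("unknown"); // this branch runs
    }
}
```

Because the null never silently collapses to false, the caller is forced to decide what "unknown" should mean at each use.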

Next time we'll look at a different aspect of truth-determining: just what is up with those "true" and "false" user-defined operators?

Comments

  • Anonymous
    March 26, 2012
    Doesn't a byte have 256 possible values (with 255 being the highest of them)? Or have I been doing it wrong?

  • Anonymous
    March 26, 2012
    Enumerable.All has a similar issue, except its tri-state is true, false, and did-not-check-the-predicate-true, which leads to this interesting contradiction:

        var numbers = new int[0];
        bool all = numbers.All(i => false);
        if (all)
            throw new Exception("All items match an unmatchable predicate.");

    Empty sequences always matching the predicate has bitten me on more than one occasion.

  • Anonymous
    March 26, 2012
    @Chris B That's expected logically though. If you have an empty sequence, it's true that all the items in it match any predicate. In mathematics, this is called a "vacuous statement" (e.g., all members of the empty set are equal to 3). I wouldn't call that three-valued logic, as it's the expected behavior of regular boolean logic.

  • Anonymous
    March 26, 2012
    Chris B: That's not the same at all. In fact, it's completely correct: all elements in the array match the predicate. All items are also even, and odd.

  • Anonymous
    March 26, 2012
    Mr Lippert - On the subject of nulls, why the 'struct' constraint on Nullable? I appreciate references already have a null representation, but would there be any harm in additionally allowing Nullable<ClassName> as an alternative? I ask because it seems it might be useful sometimes to return a Nullable value to indicate a potential return type of 'not available'. However, this does not generalise to classes. What were the factors in making the design decision? For example, I would like to do something like this, except allow class values as well as structs:

        interface ICollection<TKey, TValue> where TValue : struct
        {
            TValue? this[TKey key]
            {
                get;
            }
        }

    Thank you.

  • Anonymous
    March 26, 2012
    @Crosbie or just simply allow T? for non-constrained generic parameters [which will become int? for int, int? for int?, and string for string]

  • Anonymous
    March 26, 2012
    So, you've explained why implicit boolean treatment of T? is a bad idea.  But you haven't really given any rationale against implicit boolean treatment of T*.  It might also be worth mentioning that C uses the integer literal 0 to mean "null", and so the longhand of int* p; if (p) is if (p != 0).  So it seems more likely that rejection of if (p) in C# is related to if (i) for integral i, which likewise in C means if (i != 0) and is rejected in C#, rather than related to System.Nullable, which didn't even exist yet.

  • Anonymous
    March 26, 2012
    @Ben: C doesn't have the concept of NULL, or BOOL even. An if statement just means "Is the evaluation of my expression non-zero?" So, it makes sense that

        //C
        if((int*)0) { ... }

    would be legal, since 0 is 0, no matter what form it takes. In C#, boolean is a distinct type from integer. If doesn't mean "is my expression non-zero", it means "is my expression non-false". As int* is not convertible to boolean, we cannot adequately answer that question. (It could be argued that int* /should/ be convertible to boolean, but as I try to avoid using pointers as much as possible, I have no opinion)

  • Anonymous
    March 26, 2012
    @Crosbie: Consider this:

        int?? myInt = null;
        Console.WriteLine(myInt == null); //???

    What should that output? true, because the inner int? is null, or false because the outer int? has a value?

  • Anonymous
    March 26, 2012
    I actually enjoy using this in JavaScript and I think it's really handy! I tend to think about it as a contextual operation that checks whether a value is null or not, rather than a direct mapping between true-false/null-not null and unfortunately 0-1. In a JS context, I no longer find it awkward to think that !(null) != null, and in that realm I don't have to worry about reference vs. value types either. That being said, I don't see myself comfortable using this when I put on my C# hat, and I strongly agree that this does not match the style of C# or the culture of its audience. The nutshell of my argument is that although language features are transferable, what might look awkward (or wrong, if there is such a thing) in one language could be justifiable in another.

  • Anonymous
    March 26, 2012
    Good post, looking forward to the next one about "true" and "false"; I always wanted to know what useful cases they can enable. In the meantime, there's a typo: "[...] In particular, we want to treat nullable bools has having three states [...]" That "has" should be "as".

  • Anonymous
    March 26, 2012
    '(The answer to "what is the opposite of this unknown value?" is "an unknown value".) Does it not seem strange that x and !x are both treated as false?' Your decision in parentheses of how to define '!' leads to the apparent contradiction in the second sentence. However, if we take an actual language which actually implicitly treats null as false (e.g. Python), we actually find a different definition. When x is null, then !x is treated as !false, which is true. (Or, equivalently in Python syntax: not None == True)

  • Anonymous
    March 26, 2012
    The comment has been removed

  • Anonymous
    March 26, 2012
    "treating references or nullable value types as null is a confusing idiom" Shouldn't this be: "treating references or nullable value types as Booleans is a confusing idiom"

  • Anonymous
    March 26, 2012
    The comment has been removed

  • Anonymous
    March 26, 2012
    Slightly off-topic, but still relevant: I'd like to see a NotNullable<> keyword (perhaps with a ! suffix shorthand?) You would apply this to references to indicate that they can never be null. You can never assign null to a NotNullable<>, and you can only assign a nullable value by using a cast (which would throw if the source value was null). This would save a lot of null checking, and reduce the scope for any desire to treat null as false in the first place. It might need a bit of route analysis to cover some cases - but no more than you already do for other things, I think.

  • Anonymous
    March 27, 2012
    Phil Nash: What would you get when you create a "new NotNullable<string>[100]"? It seems hard to imagine how you wouldn't end up with 100 null NotNullable<string> objects, which sort of defeats the purpose of having a NotNullable type to begin with.

  • Anonymous
    March 27, 2012
    The comment has been removed

  • Anonymous
    March 27, 2012
    @AG: C# is not C/++. Read my post (on page 1) for more details.

  • Anonymous
    March 27, 2012
    @Gabe: you can't create that if you don't assign values to it immediately, just as you can't create an int without a value. If you don't want that restriction, you can default strings to the empty string, for example.

  • Anonymous
    March 27, 2012
    @Julian I agree, I also had to read the paragraph several times to understand what Eric was going for - presumably because of my familiarity with Python (well, and Scheme to some lesser degree). Personally, I find that treating empty collections and null as implicitly false is generally the behavior one would expect, and it makes for nicer code. I don't like it for integers though - imho if x % 10 == 0 is clearer than if not (x % 10) (also I've no idea whether I actually need the parens there). @Gabe Well, C++ solves that problem by demanding a default constructor.

  • Anonymous
    March 27, 2012
    @Mike Caron: My point was that since it could have been implemented in C/C++ without contradictions (like x | !x evaluating to false), it could as well be implemented in C#. I have the feeling that this was the suggestion of the whole post (by E. Lippert).

  • Anonymous
    March 27, 2012
    @AG: I think the point is that NULL and false become ambiguous. In C++ you have two states, true and false (which could be NULL). Eric states 'In particular, we want to treat nullable bools as having three states: true, false and null', and in C++ there are only two states for a bool. What Eric's last example is outlining are the disadvantages of a full tri-state bool system where any expression involving null is always false. (Which I believe is similar to (or the same as?) SQL, which, as a beginner in SQL, I sometimes forget and get burned by.)

  • Anonymous
    March 28, 2012
    The comment has been removed

  • Anonymous
    March 28, 2012
    if (!(!x)) makes for some fun double negatives... if x was null, would !(!x) be null or false? how would you tell that this is any different to !!x

  • Anonymous
    March 29, 2012
    The comment has been removed

  • Anonymous
    March 30, 2012
    @Ben Voight - but when Nullable was added, a specific decision to support if(p) for bool? p could have been made, as was done for Visual Basic. Why that was not done is what he is explaining, rather than the general T? p case (which is rejected for the same reason as the general T p case). @Mike Caron - How about having T? be "Nullable<T> if T is a non-Nullable`1 value type, otherwise T"? And suitable CLR magic for generics.

  • Anonymous
    March 30, 2012
    @Mike Caron "int?? myInt = null; Console.WriteLine(myInt == null); //???" SomeStruct? maps to Nullable<SomeStruct>, which is also a struct. Therefore, logically, int?? would map to Nullable<Nullable<int>>. Nullable<T> has an implicit conversion from null. In this conversion it creates a nullable object, sets HasValue to false and leaves the Value as the default. This means that myInt will be an instance of Nullable with HasValue false. Value will be a default nullable, and the default is for HasValue to be false and for Value to be the default. Value is an int, so it will be 0. Comparing it to null is essentially the same as myInt.HasValue. That will be false, so that's what it would output. That said, the whole point is moot because the C# lexer doesn't consider ?? a valid suffix to a variable name declaration. Writing out Nullable<Nullable<int>> explicitly also fails: the compiler says that Nullable<int> isn't a non-nullable value type, even though it's a struct. I would assume that this special case is explicitly checked for.

    Anyways... it seems that the underlying issue here is not so much treating null as false as treating ints as bools, in my opinion. A pointer is just an int (in all practical implementations) and a null pointer just happens to be 0 (in all practical implementations). If an int is convertible to a boolean and a pointer is convertible to int, then null is convertible to false through int, not through some magic null-to-boolean conversion. This frame of mind also removes the inconsistencies with things like if(!null). null is 0, and NOT 0 is 1, so if(!null) is true. Now having said that, I don't think that ints should be treated as bools like they are in C/C++. Anytime programmers rely on this fact it almost always leads to confusing code, and confusing code leads to bugs, all because you're too lazy to add an ==0 onto the end of your if statement (which of course converts any int into a boolean using the appropriate logic).

  • Anonymous
    March 30, 2012
    @servy42: There are actually reasonable implementations of C and C++ where a null pointer is not all bits zero, and even some where not all pointers are of equal size. And converting to boolean is consistently a literal != 0 check which, because of how null pointer constants are written, compares against a null pointer constant if you feed in a pointer; no integers in sight, really. I don't think there is a way in C# without unsafe code to find out about the implementation of null pointers, nor does the standard give any such guarantee. Might be wrong there. BTW: As long as there are no nullables, there's no issue. Thus no problem for C/C++ and the like. And about converting from pointer to int: in unmanaged code/languages you can force the compiler to do whatever you say, so what? That's your lookout as the programmer.

  • Anonymous
    March 31, 2012
    Chris B. wrote:

        var numbers = new int[0];
        bool all = numbers.All(i => false);

    This is not confusing at all. This is logically the same as:

        var numbers = new int[0];
        bool all = !numbers.Any(i => true);

    Are there any numbers that match this predicate? No, because there are no numbers at all. So “Any()” must return false, and thus, “All()” must return true, otherwise empty collections would present a confusing (and contradictory) special case.

  • Anonymous
    March 31, 2012
    The comment has been removed

  • Anonymous
    March 31, 2012
    @Timwi That's a general MS blog bug, which has existed for.. a long, long time*. Eric can't do much more about it than we. *generally caused by taking too long to post the message. General procedure is to copy your post before hitting "post" ;)

  • Anonymous
    April 02, 2012
    @Deduplicator: ECMA-372 (C++/CLI Language Specification) §12.3.3 stipulates: "The representation of a handle with value nullptr shall be all-bits-zero."  I think this requirement would make it difficult to use a nonzero representation for C# null references, on CLI implementations that support C++/CLI.

  • Anonymous
    April 03, 2012
    Since you brought up Nullables, I would like to ask why they are implemented as a special struct, instead of the much simpler approach of using boxed structures. I understand that C++/CLI allows you to access members of a boxed struct. It seems to me that it would have been much simpler if "int?" simply meant "boxed int, possibly null". It would have made the compiler simpler, it would have made the type system more harmonious (fewer special cases), and it would have allowed generic methods like the following (with no constraints):

        void f<T>(T? x) { ... } // accepts any reference type, including boxed structs.

    But, on the topic of this post, I find the arguments entirely unpersuasive. If the question is whether pointers should be implicitly convertible to boolean, well, maybe the answer is no. But I don't see how "if (p)" is a source of bugs, and allowing this form does not actually require that pointers are convertible to boolean (although that is the simplest approach). Allowing if (!p) does imply the existence of an operator! that returns bool (but did you consider the alternative--the "unless"/"if not" and "until" statements?). For me the arguments against "if (p)" are clearly outweighed by the single argument in favor: it saves time. I have written "!= null" about 1300 times in my current solution, with 1000 of those cases in "if" statements. I have written "== null" 800 times. I'm just plain tired of typing it.

  • Anonymous
    April 03, 2012
    The comment has been removed

  • Anonymous
    April 04, 2012
    The comment has been removed

  • Anonymous
    April 05, 2012
    @Kalle Olavi Niemitalo: Thanks for the quote. Actually, I see one and only one good reason for not providing a standard conversion from (null, nonnull) to (false, true), for everything but nullable<bool>. It's a bit curious that nobody mentioned that you can create custom conversions in C#. Because C# refers to all objects by managed pointer, there's no way to decide whether you wanted to convert the pointer or the referenced object. C/C++/others don't have that problem, because either they don't support custom conversions using standard syntax or they provide references and pointers with different syntax, making that trivial.

  • Anonymous
    April 19, 2012
    If null means Unknown then why is it legal to use the equality operator to check for null? If a reference is an Unknown value how can you check if that is equal to another Unknown value? Shouldn't the answer to such a comparison be Unknown?