Null Is Not Empty

Back when I started this blog in 2003, one of the first topics I posted on was the difference between Null, Empty and Nothing in VBScript. An excerpt:

Suppose you have a database of sales reports, and you ask the database "what was the total of all sales in August?" but one of the sales staff has not reported their sales for August yet. What's the correct answer? You could design the database to ignore the fact that data is missing and give the sum of the known sales, but that would be answering a different question. The question was not "what was the total of all known sales in August, excluding any missing data?" The question was "what was the total of all sales in August?" The answer to that question is "I don't know -- there is data missing", so the database returns Null.

This principle underlies the design of nullable value types in C#. The reason we have nullable value types at all is that there is a semantic difference between the null integer/decimal/double/whatever and the zeroes of those types. A zero means “I know that the quantity is zero”; a null means “I don’t know what the quantity is”.

This also explains why nulls propagate; if you add two nullable ints and one of them is null then the answer is null. Clearly ten plus “I don’t know” equals “I don’t know”, not ten.
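
The propagation rule can be seen directly with C#'s lifted operators on nullable value types. What follows is only a minimal sketch; the names SalesTotals and Add are made up for illustration and do not come from the original post.

```csharp
using System;

public static class SalesTotals
{
    // Adds two possibly-unknown quantities using the lifted + operator.
    // If either operand is null ("I don't know"), the sum is null as well.
    public static int? Add(int? first, int? second)
    {
        return first + second;
    }
}
```

Here SalesTotals.Add(10, 5) is 15, while SalesTotals.Add(10, null) is null rather than 10. A caller who really does want "unknown counts as zero" has to say so explicitly, for example with SalesTotals.Add(10, null) ?? 0.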

The concept of “null as missing information” also applies to reference types, which are of course always nullable. I am occasionally asked why C# does not simply treat null references passed to “foreach” as empty collections, or treat null strings as empty strings (*). It’s for the same reason that we don’t treat null integers as zeroes. There is a semantic difference between “the collection of results is known to be empty” and “the collection of results could not even be determined in the first place”, and we want to allow you to preserve that distinction, not blur the line between them. By treating null as empty, we would diminish the value of being able to strongly distinguish between a missing or invalid collection and a present, valid, empty collection.

Now, if for some odd reason you do wish to treat null collections the same as empty collections, that’s easy enough to do. You can simply use the null coalescing operator; that’s what it’s for:

foreach(Customer customer in customers ?? Enumerable.Empty<Customer>())

The ?? operator means “use the left-hand side unless it is null, in which case use the right-hand side.” Handy, that.
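
To make the opt-in concrete, here is a small self-contained sketch. The class CustomerReport and the method CountNames are hypothetical, and a sequence of strings stands in for the Customer type so that the example compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CustomerReport
{
    // Counts the names in a sequence, deliberately treating a null
    // sequence ("could not be determined") the same as an empty one.
    public static int CountNames(IEnumerable<string> names)
    {
        int count = 0;
        foreach (string name in names ?? Enumerable.Empty<string>())
        {
            count++;
        }
        return count;
    }
}
```

With the ?? in place, CustomerReport.CountNames(null) returns 0. Without it, the foreach would throw a NullReferenceException when handed a null reference, which is exactly the signal that lets a caller tell "no results" apart from "no answer".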

**************

(*) C# does treat null strings as empty strings when concatenating them. See the comments for a discussion of this fact.
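
The exception mentioned in this footnote is easy to check. The sketch below uses a hypothetical helper, ConcatDemo.Join, which does nothing more than apply the + operator to two strings.

```csharp
public static class ConcatDemo
{
    // String concatenation with +: a null operand is treated
    // as if it were the empty string.
    public static string Join(string left, string right)
    {
        return left + right;
    }
}
```

ConcatDemo.Join(null, "abc") yields "abc", and ConcatDemo.Join(null, null) yields an empty string rather than null, so in this one corner of the language null really is treated as empty.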

Comments

  • Anonymous
    May 14, 2009
    This is interesting. Years ago, a product called eMbedded Visual Basic, which used VBScript as its engine, had a constant of vbNullPtr that was used for API calls. Any idea what this value would pass?

  • Anonymous
    May 14, 2009
    What's the advantage of your special empty sequence versus LINQ's Enumerable.Empty<T>()?
    Good point. There is no advantage. It makes more sense to just use the standard one. I've updated the text. -- Eric

  • Anonymous
    May 14, 2009
    @ghenne : it would probably be equivalent to IntPtr.Zero...

  • Anonymous
    May 14, 2009
    Aww, a nostalgic post for me - I'm the same Blake from the comment thread on the original VBScript post. Over five years later, and this has consistently been one of my favorite Microsoft blogs. Thanks for all the great articles, Eric.
    You're welcome, thanks for reading! -- Eric

  • Anonymous
    May 14, 2009
    Great stuff Eric, I don't think people treat nulls as valuable information often enough. A few more thoughts: http://clipperhouse.com/blog/post/Nulls-and-knowledge.aspx

  • Anonymous
    May 14, 2009
    Thanks for another interesting post - I heartily agree that these are important distinctions for programmers to make. I also noticed Blake's comment, and went back through some of the discussion between you two from the original post. Time permitting, I would love to see a post or two about your ideas on interviewing, since I'm currently learning how to give interviews myself. How do you attempt to test problem solving ability? As I'm sure you've found, it seems a lot harder than just testing knowledge.
    I've written two articles about interviewing. See the "interviewing" archive button on the sidebar. -- Eric

  • Anonymous
    May 14, 2009
    I often see this manifested (or not manifested correctly) in data capture scenarios. It's all well and good to have strict validation, but sometimes your hapless user just doesn't know what the chassis serial number is, etc. Overzealous developers who shun nulls in the database end up at some point creating sentinel values, which for obvious reasons don't make anything easier in the long run. Not only do you have a non-standard syntax, but you'd better be sure your sentinel is really never going to happen and doesn't ruin any computations in the process. Null is that special value that is outside the set of all permissible values, and I think sometimes people just think it's only the runtime scolding you for using an uninitialized reference. FYI: to all the people who hate checking for null, you can always use the Null Object Pattern if it makes writing your domain code easier.

  • Anonymous
    May 14, 2009
    I'm pretty new to C#, but this has been nagging me for a bit: why isn't int nullable in C#?
    Nullable ints are nullable in C#. Non-nullable ints are not nullable. This seems like a sensible approach, no? -- Eric
    Why does it default to 0?
    Well, what value would you prefer a non-nullable int to default to? -- Eric
    "int thisIsAnInt = null;" throws an error on build.
    The syntax for nullable value types in C# is to put a question mark after the type. Try "int? x = null;" -- Eric

  • Anonymous
    May 14, 2009
    (replying to myself) Actually it looks like you have already posted some other stuff specifically about interviewing, which was very interesting. Thanks again for the great blog.

  • Anonymous
    May 14, 2009
    Thomas, you can do: int? thisIsAnInt = null; which is equivalent to Nullable<int> thisIsAnInt = null;

  • Anonymous
    May 14, 2009
    @Thomas int is a ValueType and null does not apply to ValueTypes. Eric recently wrote an article on value types and reference types. That article might be worth a little of your time. (Not that it actually answers your question, but judging from your question I believe it holds valuable information for you.)

  • Anonymous
    May 14, 2009
    I love the null coalescing operator. It's great for lazy initialisation:
        private List<Order> orders;
        public List<Order> Orders { get { return orders ?? (orders = LoadOrders()); } }
    I've got a post on my blog about making that thread-safe: http://blog.markrendle.net/post/Lazy-initialization-thread-safety-and-my-favourite-operator.aspx

  • Anonymous
    May 14, 2009
    I do not agree that a string is a reference type. Strings are values just as integers are. It just happens that the .NET architecture implements them as referenced objects. That's why you had to add the "IsNullOrEmpty" kludge. A proper string can never be null, in the same way as an integer can never be null. Unless I explicitly want it, in which case I would declare it as "string? s", in the same way as I declare a nullable integer. Ordinary collections are not values (as they are mutable objects) but immutable collections definitely are values and should be non-nullable by default.
    I agree that immutable types are logically values, and it would have been nice to represent that in the type system. I also agree that it would have been nice to build in nullability/non-nullability from day one, instead of starting with non-nullable value types and nullable reference types, then adding nullable value types, and then never adding the fourth. The next time you design a brand-new type system, keep that in mind. But as a practical matter, I'm afraid strings are reference types, and that there are good reasons for that. The pleasant fact that value types are of known size, and need not be garbage collected makes it difficult to make strings value types. Also, the fact that strings can be cheaply copied by reference instead of copying all their bits, as we do with value types, is a big perf win. Would you rather abandon these benefits in exchange for making strings value types? What's the compelling benefit of making strings into value types that pays for the massive loss of performance that would entail? -- Eric

  • Anonymous
    May 14, 2009
    oops, final statements should have been
    stmt 1:  return this.orders ?? ( this.orders = Interlocked.CompareExchange(ref this.orders, new OrderCollection(), null) );
    stmt 2:  return this.orders ?? ( this.orders = new OrderCollection() );
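
One detail worth keeping in mind when combining ?? with Interlocked.CompareExchange: the method returns the value that was in the location before the call, not the value that ends up stored there, so on a successful first initialization it returns null. The sketch below is one possible shape for a lock-free lazy property under that constraint. OrderBook is a hypothetical class, and List<string> stands in for the OrderCollection type used in the comment.

```csharp
using System.Collections.Generic;
using System.Threading;

public class OrderBook
{
    private List<string> orders; // stays null until first requested

    public List<string> Orders
    {
        get
        {
            if (orders == null)
            {
                // Publish a new list only if the field is still null.
                // The return value (the previous contents) is ignored
                // on purpose; the field is re-read below instead.
                Interlocked.CompareExchange(ref orders, new List<string>(), null);
            }
            return orders;
        }
    }
}
```

If two threads race, both may allocate a list, but only one allocation is published and both threads return that same instance. This is a sketch under those assumptions, not necessarily what the original commenter had in mind.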

  • Anonymous
    May 15, 2009
    This is a bit unrelated to the topic, but I started with this (to see your topic "in action"):
        String a = null;
        var b = a + null;
        Console.WriteLine(b.Length);
    Then I tried using different objects, like:
        Form f1 = new Form();
        var f2 = f1 + null;
        Console.WriteLine(f2.Length);
    I was expecting compilation errors ("adding" null to a Form? "Length" of a Form?), but instead it compiles and runs just fine. The output is: System.Windows.Forms.Form, Text:
    So, it turns out that in "var f2 = f1 + null;" var becomes a string, and calls ToString() on f1 to concat (my guess). Is it so? And if yes, why? Why am I able to add a Form to a null, and get a String? I'm probably missing something in how "var" works...
    Though I applaud your experimental approach, rather than guessing at the semantics you might consider reading the spec, which states:
        The binary + operator performs string concatenation when one or both operands are of type string. If an operand of string concatenation is null, an empty string is substituted. Otherwise, any non-string argument is converted to its string representation by invoking the virtual ToString method inherited from type object. If ToString returns null, an empty string is substituted.
    Now, this bit is not perfectly accurate. Clearly in your "form" case neither operand is of type string. This bit really should say "when one or both operands can be implicitly converted to string and operator overload resolution chooses one of the built-in string concatenation operators". As I noted in Mike's comment above, I had momentarily forgotten about this unfortunate fact about string concatenation. This is not how I would have done things, but this choice was imposed upon the language by the implementation of String.Concat. It would be awfully weird to have a language where + did one thing and String.Concat did another. -- Eric
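
The same behavior can be reproduced without Windows Forms. In the sketch below, Reading is a hypothetical class standing in for Form. Neither operand of the + is declared as a string, yet the null literal converts implicitly to string, overload resolution picks a built-in concatenation operator, and the result is a string.

```csharp
public class Reading
{
    // Supplies the text that string concatenation will use.
    public override string ToString()
    {
        return "21 C";
    }
}

public static class ConcatWithObjectDemo
{
    // "reading + null" compiles: the expression has type string, and the
    // null operand contributes an empty string to the result.
    public static string Describe(Reading reading)
    {
        return reading + null;
    }
}
```

ConcatWithObjectDemo.Describe(new Reading()) returns "21 C", which is why "var f2 = f1 + null;" infers string for f2 and f2.Length compiles.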

  • Anonymous
    May 15, 2009
    Is there a vb.net equivalent for ... foreach(Customer customer in customers ?? Enumerable.Empty<Customer>()) ? S

  • Anonymous
    May 15, 2009
    I have a question about the construct:
        foreach(Customer customer in customers ?? Enumerable.Empty<Customer>())
    I haven't used the ?? operator before but the above syntax looks a little klunky to me. Take this specific example:
        int[] data = new int[] { 1, 2 };
        foreach ( int i in data ?? Enumerable.Empty<int>() ) {}
    In the above case, if I just include System and not System.Linq, I get a compiler error. Why should I have to include LINQ-specific code for this? The foreach is "LINQ independent", so it seems to me that maybe there should be a new keyword or contextual keyword to make this a little cleaner. We have foreach. We have default. Maybe DefaultEmptyCollection<int>, which uses an array :)
    Paul.
    The "Enumerable" class is in the System.Linq namespace. That's where all the rest of the LINQ sequence operators are, so it's a sensible place. -- Eric

  • Anonymous
    May 15, 2009
    "The pleasant fact that value types are of known size, and need not be garbage collected makes it difficult to make strings value types. Also, the fact that strings can be cheaply copied by reference instead of copying all their bits, as we do with value types, is a big perf win." Eric, I don't agree with you. It's pretty easy to make a string a value type (but the value must internally always contain a reference to a char array). The value would then always have the same size as the size of a reference (32 bits on x86) and copying is safe and just as fast as copying an integer or a reference. I believe the real reason not to implement it as value type is because this would lead to large amounts of boxing, especially in CLR 1.0 applications, where there was no generics. Well, sure, I suppose. But I don't understand the point. I mean, we could cut out all the character array rigamarole and just say that struct MyString { public String theRealString } is a "value typed string". What does that buy us? Taking a storage location which can contain a 32 bit managed reference to a string and reinterpreting it as storage of MyString doesn't change anything germane, it just makes it harder to take advantage of the underlying ref type. We can take any reference type and explicitly wrap a value type around the reference; that just makes it slightly harder to compare things by reference. It doesn't change the fundamental fact that the data storage is ultimately implemented using reference semantics. Essentially what this example highlights is that references are themselves values. References are already treated as value types; the interesting thing about them is that they refer to something, not that they're copied around by value. Making string, or any type, a "shallow" value type is trivial -- so trivial that it's not very interesting. Such a beast still has the fundamental property of reference types: that it refers to something else. 
Making it deeply a value type, the way, say, int is, so that it refers to nothing, that's a what I meant by it being a lot more difficult. -- Eric   Just for fun, here is a non-nullable string implementation (named 'vstring') as value type :-) public struct vstring : IComparable<vstring>, IEnumerable<char>, IEnumerable, IEquatable<vstring> {    private readonly string value;    public vstring(string value)    {        this.value = value;    }    public vstring(char[] value)    {        this.value = new string(value);    }    // Never returns null.    public override string ToString()    {        return value ?? string.Empty;    }    public override int GetHashCode()    {        return this.ToString().GetHashCode();    }    public int CompareTo(vstring other)    {        return this.ToString().CompareTo(other.value ?? string.Empty);    }    public IEnumerator<char> GetEnumerator()    {        return this.ToString().GetEnumerator();    }    IEnumerator IEnumerable.GetEnumerator()    {        return this.GetEnumerator();    }    public bool Equals(vstring other)    {        return this.ToString().Equals(other.value ?? string.Empty);    }    public vstring ToLower()    {        return new vstring(this.ToString().ToLower());    }    public vstring ToUpper()    {        return new vstring(this.ToString().ToUpper());    }    public static bool Equals(vstring a, vstring b)    {        return a.ToString() == b.ToString();    }    public static bool operator ==(vstring a, vstring b)    {        return Equals(a, b);    }    public static bool operator !=(vstring a, vstring b)    {        return !Equals(a, b);    }    public static vstring operator +(vstring a, vstring b)    {        return new vstring(a.value + b.value);    } }

  • Anonymous
    May 15, 2009
    >> Would you rather abandon these benefits in exchange for making strings value types?
    Eric, don't get me wrong. I'm not saying that they should be implemented internally as value types (i.e. being allocated on the stack, copied bit by bit every time the wind blows, etc.) but they should appear to the programmer as value types (i.e. never be null, and instead be initialized by default to the empty string, etc.). That's one of the too-rare things Borland Delphi does right :-) Basically that's almost just syntactic sugar: every time you see a string declaration, initialize it to String.Empty instead of null, and catch every assignment of null to a string (including return statements).

  • Anonymous
    May 16, 2009
    >> Well, sure, I suppose. But I don't understand the point.
    There is no point in doing that and it won't buy us anything, except that programmers would see strings as value types, which is the point Stephan Leclercq tried to make. While I understand Stephan’s point, I’m against doing this. While strings then would really represent a logical value, this might give developers the wrong impression that strings would be copied completely by value, which could never be the case, because this, as you said, would be disastrous for performance. So I didn’t disagree on string being a reference type; I only disagreed with the arguments you gave against string being a value type.
    PS: I saw my code example 'exploded' on your blog. It now takes a lot of space, sorry for that.

  • Anonymous
    May 17, 2009
    >> But as a practical matter, I'm afraid strings are reference types, and that there are good reasons for that. The pleasant fact that value types are of known size, and need not be garbage collected makes it difficult to make strings value types. Also, the fact that strings can be cheaply copied by reference instead of copying all their bits, as we do with value types, is a big perf win.
    This does not preclude String from being a value type. It just has to be a value type that encapsulates a single reference to an internal "StringData" reference type, with the latter working exactly as System.String works today. So the user sees a value type, with no null value, and the implementation still gets all the benefits of a reference type.
    >> While strings then would really represent a logical value, this might give developers the wrong impression that strings would be copied completely by value
    It wouldn't matter in the slightest. Since strings are immutable, there are no observable differences between a copying implementation and a sharing implementation (well, except for Object.ReferenceEquals, but why would you care about that one?). So it doesn't really matter what impression developers get - it will be consistent with behavior either way.

  • Anonymous
    May 18, 2009
    "The designers of String.Concat chose to treat null concatenation as empty string concatenation. Which means that (string)null + (string)null gives you an empty string in C#, bizarrely enough" MS SQL has the same behavior if you SET CONCAT_NULL_YIELDS_NULL OFF  which happened to have burned me last week because two different apps set it differently.

  • Anonymous
    May 18, 2009
    "Explain the difference between Null, Empty, and Nothing" has been one my favorite interview questions for years. I've always enjoied the null/empty/nothing stares after I ask that question. Does the answer to question really tell you much about the candidate? If they've claimed to be a VB expert then that will certainly tell you whether they are or not I suppose. But I try to ask interview questions that allow the candidate to demonstrate skills, intelligence or passion rather than testing domain-specific knowledge. I assume that anyone who is smart, skilled and gets stuff done can learn the domain. -- Eric  

  • Anonymous
    May 18, 2009
    "The designers of String.Concat chose to treat null concatenation as empty string concatenation." Hopefully these designers have been re-assigned.  :^) Seriously, not a good design decision in my humble opinion. It only masks problems and adds to the newbie confusion between null and the empty string. I guess it's too late to turn back though.

  • Anonymous
    May 24, 2009
    When I started this blog in 2003, one of the first posts was about the differences between Null, Empty and Nothing

  • Anonymous
    June 02, 2009
    Now that there's int?, I really want things like IEnumerable!, which would never be null, so I can stop getting forced to deal with "no information" in the places where it's really not appropriate. Then have the compiler force people to do something like this before calling me:
        IEnumerable<T>! seq = annoying ?? Enumerable.Empty<Customer>();
    Microsoft Research makes a language called Spec# which has a non-nullable ref type system; you might want to check it out. Also, the next version of .NET will have a "contracts" system whereby you can annotate your code with contracts that describe nullability; unfortunately, we are not integrating it directly into the language at this time, but still, it gets you a lot of the way there. -- Eric

  • Anonymous
    July 07, 2010
    Hi! I'm returning to comment on this great post because Phil Haack blogged about (not-so-great) null checks recently... This "Special Case" pattern was also introduced in Fowler's famous book, P of EAA. In F# you can use the option type.