Udostępnij za pośrednictwem


Should C# warn on null dereference?

As you probably know, the C# compiler does flow analysis on constants for the purposes of finding unreachable code. In this method the statement with the calls is known to be unreachable, and the compiler warns about it.

const object x = null;
void Foo()
{
  if (x != null)
  {
    Console.WriteLine(x.GetHashCode());
  }
}

Now suppose we removed the "if" statement and just had the call.

const object x = null;
void Foo()
{
  Console.WriteLine(x.GetHashCode());
}

The compiler does not warn you that you're dereferencing null! The question is, as usual, why not?

The answer is, as usual, that we do not have to provide a justification for not doing a feature. Features cost money, time and effort, and take away money, time and effort from features that would benefit the user better, so features have to be justified based on a cost-vs-benefits analysis. So let's think about that.

Something I find helpful when thinking about a particular feature is to ask "is this a specific case of a more general feature? " The proposed feature is essentially to detect that a particular exception is always going to be thrown. Do we want to in general be able to detect cases where an exception will always be thrown and warn about them? Well, doing so with certainty is equivalent to solving the Halting Problem, as we've already discussed. But even without that, we do not want to give a warning every time we know that an exception must be thrown; that would then make this program fragment produce a warning:

int M()
{
  throw new NotImplementedException();
}

The whole point of throwing that exception in the first place is to make it clear that this part of the program doesn't work; giving a warning saying that it doesn't work is counterproductive; warnings should tell you things you don't already know.

So let's just think about the feature of detecting when a constant null is dereferenced. How often does that happen, anyway? I occasionally make null constants; perhaps because I want to be able to say things like "if (symbol == InvalidSymbol)" where InvalidSymbol is the constant null; it makes the code read more like English. And maybe someone could accidentally say "InvalidSymbol.Name" and the compiler could warn them that they are dereferencing null.

We've been here before; in fact, I made a list. So let's go down it.

The feature is reasonable; the code seems plausible, it is clearly wrong, and we could detect that without too much expense. However, the scope of this warning is very small; the vast majority of the time I've used null constants, I've used them for comparison, not for accessing members off of them. And the problem will be detected when we test the code, every time.

Could we perhaps then generalize the feature in a different way? Perhaps instead of constants we should detect any time that the compiler can reasonably know that any dereference is probably a null dereference. Solving the problem perfectly is, again, equivalent to solving the Halting Problem, which we know is impossible, but we can use some clever heuristics to do a good job.

In fact the C# compiler already has some of those heuristics; it uses them in its nullable arithmetic optimizer. If we can statically know that a nullable expression is always null or never null then we can do a better job of codegen. (And how we do so would be a great future blog topic.) However, the existing heuristics are extremely "local"; they do not, for example, notice that a local variable was assigned null and then later used before it was reassigned. So again, the scope of this warning would be very small, probably too small.

To do a good job of the proposed more general feature, we'd want to modify the existing flow analyzer that determines if a local variable is definitely assigned before it is used, with one that also can tell you whether the value assigned was non-null before a dereference. That's a much more expensive feature; the benefits are higher, but so are the costs.

What it really comes down to me for this feature is that last item on my list. Yes, it is always better to catch a bug at compile time, but that said, null dereference bugs that the compiler could have told you about with certainty are the easiest ones to catch at runtime because they always happen the moment you test the code. So that's some points against the feature.

So basically the feature request here is to write a somewhat expensive detector that detects at compile time a somewhat unlikely condition that will always be immediately found the first time the code is run anyways. It is therefore not a great candidate for spending budget on to implement it; thus the feature has never been implemented. It's a lovely feature and I'd be happy to have it in the compiler, but it's just not big enough bang for the design, development and testing buck, and we have many higher priorities.

Now, you might note that tools like the Code Contracts feature that shipped with version 4, or Resharper, or other similar tools, all have various abilities to statically detect possible or guaranteed null dereferences. Which proves that it is possible to do a good job of this feature, and that's good to know. But that is also points against doing the warning in the compiler. As I point out on my list, if an existing analysis tool does a good job, then why replicate that work in the compiler? The compiler does not aim to be the be-all and end-all of code analysis; in fact, we are building Roslyn precisely to make it easier for third parties to develop such analysis tools. We can't possibly do every great code analysis feature, but we can make it easier for you to do so!

Comments

  • Anonymous
    July 17, 2012
    Oh how I wish Code Contracts was actually included with .NET 4, but alas, only the declarative method and attribute stubs are included, which by themselves are useless (or worse than useless, they'll throw exceptions, sometimes descriptive and sometimes an ExecutionEngineException). I hope that the compilers included in vNext (.NET 5.0?) will include Code Contracts IL weaving functionality, or perhaps even a general IL weaving extensibility point for both Code Contracts and PostSharp & Friends (a problem Roslyn doesn't, nor purports, to solve). Well, at least until better metaprogramming support is added to the language which would make those kind tools obsolete. And I wholeheartedly agree that the compiler should not strive to be the ultimate static code analyzer. It should emit warnings about things it has already figured during the code analysis it does as part of its main job - compiling! If it's not directly related to the job at hand, it shouldn't go out of its way to perform extra analysis. Sure, it is already intimately familiar with all the language's intricacies, but it's doesn't mean it has the means nor the mandate to  analyze that added complexity, make the 'fuzzy' decisions about whether it's 'correct' or not, and effectively convey it to the user (whilst taking into account his preferred warning verbosity level and coding style). Static analysis is much better off with a dedicated tool such as StyleCop/FxCop, ReSharper, CC, etc. If they can all share a single 'language layer' in the form of Roslyn, all the better - those who write these tools will be more productive and will have more time left  to write better static code analyzers! (Is this a double post? If so, it's the blog system's fault.)

  • Anonymous
    July 17, 2012
    This reminds me of an answer you gave me on Stackoverflow : stackoverflow.com/.../1796 Any change on that?

  • Anonymous
    July 17, 2012
    The comment has been removed

  • Anonymous
    July 17, 2012
    >The answer is, as usual, that we do not have to provide a justification for not doing a feature. Features cost money, time and effort, and take away money I use Visual Studio daily in my job and occasionally in my hobbies, but the number of features in Eclipse a free IDE but not in an IDE that costs $1,200 per license is astounding to me.  Features cost money, but lack of features and hesitation to provide functionality has an opportunity cost as well.

  • Anonymous
    July 17, 2012
    @Colton E. Visual Studio and the C# Compiler are two entirely separate product.  One simply happens to use the other. Visual Stuido (other than the express version, which is free) does cost money, but C#, and it's associated compilers are entirely free.  You're complaining to the wrong people if you don't like some aspect of the IDE that happens to utilize a language developed (in part) by the writer of this blog.

  • Anonymous
    July 17, 2012
    As Allon Said: JetBrains' Resharper tool will identify every place in your code that has the possibility of a null reference error, allowing you to put in a null check.

  • Anonymous
    July 17, 2012
    I agree, that kind of warning is not going to be very useful (in the grander scheme of things). I'd much rather have a type system that can express non-null references. Or alternatively an option type (à la F#, Haskell, Scala) to force the clients of my code to handle the case where the value is missing. In the meantime, I use ReSharper's Null and and NotNull annotations to help catch illegal uses of APIs at compile-time.

  • Anonymous
    July 17, 2012
    @Allon Guralnek: Glad I'm not the only one whose first thought on reading this was Code Contracts. :)  Those tools are invaluable for my work, but they're oh-so-slow and limited.  A full-fledged CC (maybe built on Roslyn?) is something that I would really pay good money for, it's that important to me.

  • Anonymous
    July 17, 2012
    @SealedSun.ch: I believe I saw a MS Research paper about possibility of including non-nullable references in C#. It dates back to the C# 1.0 days (even before the generics!), and I guess the fact that it's still not implemented says something.

  • Anonymous
    July 17, 2012
    Would this make it even harder to write a recursive anonymous method? I've always found it slightly irritating that this:    Func<int, int> fib = i => i < 2 ? 1 : fib(i - 1) + fib(i - 2); seems to fall foul of the definite assignment rules. I'm assuming the argument for why it fails is that fib is not assigned until this statement has finished executing, so you're not allowed to use fib inside the statement, and the code in the delegate is counted as being "inside the statement" even though in this particular case it cannot be run until after the statement completes. (I know there are scenarios where you could create and invoke a lambda in the statements, so I'm guessing that's why this example is not allowed.) My usual workaround looks like this:    Func<int, int> fib = null;    fib = i => i < 2 ? 1 : fib(i - 1) + fib(i - 2); But the definitely-not-null analysis you're describing seems like it would also reject this. (R# already complains about this, but at least it's not a warning.) So that seems like another point against. Or is there a better way to deal with this problem? Am I using the wrong workaround here?

  • Anonymous
    July 17, 2012
    What you should ideally do is change the language so that T var cannot be null, and you need to say T? var for it to be nullable. Unfortunately, that ship has sailed, although maybe you can introduce T! var for a non-nullable reference.

  • Anonymous
    July 17, 2012
    Resharper is a bloatware. I can't use it without a crash. Please integrate these kinds of goodies as much as possible!

  • Anonymous
    July 17, 2012
    It appears my comment got eaten. I use Resharper too and it is great, but there are a few places it gets this wrong and warns about a constant null expression which is in fact not always null. I hate having to supress warnings on things that aren't really wrong (or fiddle with the code for no reason other than making the message go away), so I would not want to see a warning for this.

  • Anonymous
    July 17, 2012
    @Bob: Resharper isn't bloatware at all - it's pretty quick and, in my experience, stable. But let's humour you, and continue with your request. If the compiler should warn about null dereferencing, should it also warn about 'throw ex;' statements too? What about the code namespace not matching the file location? Closure over a modified variable? Unused 'using' statements at the top of a code file? I could go on. If all this is added to the compiler, then compiler performance suffers drastically, and the language stagnates, as there's no time to develop new features any more. The compiler has one job, and one job alone: take source code and produce object code. It has no business enforcing styles or conventions - that's the purpose of tools like Resharper.

  • Anonymous
    July 18, 2012
    @IanG: If you want to write recursive anonymous methods, you should definitely check out blogs.msdn.com/.../anonymous-recursion-in-c.aspx – you might be a bit surprised the situation is not that simple. :-)

  • Anonymous
    July 18, 2012
    The comment has been removed

  • Anonymous
    July 18, 2012
    Joker, it is another subject, but we can always try to fix errors from the past, no? qconlondon.com/.../Null+References:+The+Billion+Dollar+Mistake

  • Anonymous
    July 18, 2012
    @Petr Kadlec - I probably shouldn't have said anything about recursion. It was just a short way of talking about referring to the delegate from within the delegate. The situation in which I have most often hit this is in fact when writing an event handler that unsubscribes itself. The Y combinator (which I was already familiar with, but thanks for the link) is definitely overcomplicating things. So, what I should have said is I'd like to be able to write this: EventHandler h = (s,e) => {  src.SomeEvent -= h;  // Not allowed, boo hiss!  ... do stuff }; src.SomeEvent += h; Well, I say 'like'. Really, I'd prefer to use something more suitable like a Task but when events are all that's offered, this is where you can end up.

  • Anonymous
    July 18, 2012
    I'm afraid that nulls are actually beneficial enough that were they not to exist, they would be invented. After all, we already had non-nullable types in C# just long enough for the designers to realize that they had to make nullable versions of them. Before nullable value types were added to C# there were ways to make values nullable, but they were all ad-hoc: databases had DbNull, XML had *Specified, and others used default(T). With all the ad-hoc methods, there was no way for the compiler to give warnings or have lifted operations and the runtime couldn't raise exceptions when you accessed a roll-your-own null. As for nullable references, I don't see how a language that needs to interoperate with the outside world in a natural manner can do without them. If you designed a language with non-nullable references (like F#), you'd still have to have the concept of null because you could get one from (or have to pass one to) a module written in VB.

  • Anonymous
    July 18, 2012
    @Gabe: The main reason for creating Nullable<T> was better DBMS support (yeah, LINQ). And there are well-renowned relational theory experts who believe that introducing NULL into the relational data model was a grave mistake—besides, the relational theory had been doing pretty well without NULL for 10 years (1969-1979), and relations DBMS built during that period didn't have any NULL concept. The main motivation for NULL's introducing was solving the Missing Information Problem, and it failed to solve this problem in a satisfactory way: new very strong problems have appeared, and the MIP itself wasn't quite resolved. And since there are relational DBMS that has no NULL support, which means I can conjugate my C# program with them without using Nullable<T> and all this "x ?? x : default(X)" stuff. Isn't it great, y/n? Back on topic: many programmers expect that null dereferencing is checked at run-time, not compile-time anyway. And if you want reason about nullability of your variables, there are already plenty of tools for it. Still, I would like to see non-nullable references, but since the CLR is full of null-returning methods... such references won't be as useful as they could be.

  • Anonymous
    July 18, 2012
    I think the real question is should we make the compiler so intuitive and complex that it takes away from the other more simple features that will contribute to effective programming. At this point we may not need it but going forward  C# compiler can check for null type references to reduce the "obvious" steps in the code ( similar to closing connections or finally clause after try/catch )

  • Anonymous
    July 19, 2012
    Also on the topic of nullable references... People that have played with Spec# for instance know them. Non-nullable references, with a "grace period" for initialization, are a great thing to have. Oh, talking of Spec# and Code Contracts, people interested in those may also like the thesis recently published on that topic: e-collection.library.ethz.ch/.../eth-5609-01.pdf

  • Anonymous
    July 19, 2012
    The comment has been removed

  • Anonymous
    July 19, 2012
    The compiler does not need to warn on dereferencing nulls. But a warning on all uppercase menu captions would be nice.

  • Anonymous
    July 19, 2012
    Is it possible to - atleast display -what object is null in the expression- as part of the exception message - when null reference exception thrown?

  • Anonymous
    July 19, 2012
    @Douglas: 'throw ex;' shouldn't throw a warning - it'll never fail at runtime. Besides, there are situations where you do want to do this, for instance in a library where you don't want to expose the internals to a third party. This makes it a question of style, not a question of correctness.

  • Anonymous
    July 20, 2012
    I would argue that if you're hiding your internals then you do not want to throw the originating exception, but a different exception entirely.

  • Anonymous
    July 21, 2012
    Joker: So 1979 must have been the year that they finally decided 9/9/99 wasn't going to be feasible very much longer for encoding "unknown date" and that it would be necessary to implement an out-of-band way to store the fact the data was unknown. While nulls may cause problems, they solve far more problems than they cause. In other words, while non-nullable references might be nice to have, nullable value types are far more valuable. In particular, without nullable objects, the compiler wouldn't even have a chance to warn you about a null dereference.

  • Anonymous
    July 22, 2012
    Hi there, Is there a way I can get a hold of you to use some of the contents on your blog. I work for an IT recruitment company and we have a niche newsletter aimed at C# developers and find that the contents on your blog might appeal to them. My email address is mmotsepe@insource.co.za or you can visit our website which is www.insource.co.za

  • Anonymous
    July 22, 2012
    @@RaceProUK: Two parts to my reply. Part 1: Counter argument - InvalidOperationException, among others. You may want to catch and log/report this, then rethrow it without exposing your internals. Part 2: What's your real fake name? :)

  • Anonymous
    July 23, 2012
    <troll>Did you just explain why compilers should not do typechecking?</troll>

  • Anonymous
    July 23, 2012
    Wouldn't such a feature be terribly wrong? It's a programmer's job to avoid making calls to something that either always is null or might be null. ReSharper can cover for that, but I still think it's a programmer's duty not to let NullReference exception happens, especially not ones that you can see (or in this case, cause) so evidently.

  • Anonymous
    July 23, 2012
    Yes, it's a programmers job to design and implement flawless programs solving the real(tm) problem without fuss and bother. But that's no reason to eschew tools making the programmers lot bearable, like compilers, testsuites and the like. If it was, we would still always use assembler, sorry machinecode. We are only trying to determine how far the compiler can and should help, without detracting from its main job or making other things more difficult.

  • Anonymous
    July 27, 2012
    Hey Eric, just found your blog yuesterday and find it quite interesting. Thanks for posting. Well, there are two answers, one in ideal-land where time and money aren't a factor, and the other in the really-real world. In ideal-land the theme here seems to be "at what point do you stop"? For example: should you warn on unused using statements.  My thoughts are you (Eric or whoever implements this) do not stop and it becomes up to the developer. The developer sets the level (or option) of what they want to know about. What about Error Levels? Couldn't that be factored in to allow the developer to choose? In really-real land it's hard to say (hence the blog post). I usually like to implement near features together, helps for a multitued of reasons, most notably because my mind is already involved in whatever the feature is. Personally, I would probably wait until something similar comes along that allows most of the plumbing to be put in place. Convince the higher-ups to do it then, stating now is the time because the addition to design/development/testing time is minimal then and there, but wont be in the future. Ack, I just gave my secret away! Regards, Robert

  • Anonymous
    August 03, 2012
    Much more static analysis is needed in the compiler.