

C++ and the Pit Of Despair

Raymond has an interesting post today about two subtle aspects of C#: how order of evaluation in an expression is specified as strictly left-to-right, and how the rules regarding local shadowing ensure that an identifier has exactly one meaning in a local scope. He makes an educated guess that the reason for these sorts of rules is to "reduce the frequency of a category of subtle bugs".

I'd like to take this opportunity to both confirm that guess and to expand upon it a bit.

Eliminating Subtle Bugs 

You remember in The Princess Bride when Westley wakes up and finds himself locked in The Pit Of Despair with a hoarse albino and the sinister six-fingered man, Count Rugen? The principal idea of a pit of despair is twofold. First, that it is a pit, and therefore easy to fall into but difficult work to climb out of. And second, that it induces despair. Hence the name.

I often think of C++ as my own personal Pit of Despair Programming Language. Unmanaged C++ makes it so easy to fall into traps. Think buffer overruns, memory leaks, double frees, mismatch between allocator and deallocator, using freed memory, umpteen dozen ways to trash the stack or heap -- and those are just some of the memory issues. There are lots more "gotchas" in C++. C++ often throws you into the Pit of Despair and you have to climb your way up the Hill of Quality. (Not to be confused with scaling the Cliffs of Insanity. That's different.)

Now, as I've said before, the design of C# is not a subtractive process. It is not "C++ with the stupid parts taken out". But that said, it would be rather foolish of us to not look at what problems people have typically had with other languages and work to ensure that those exact same problems do not crop up for C# users. I would like C# to be a "Pit of Quality" language, a language where its rules encourage you to write correct code in the first place. You have to work quite hard to write a buffer overrun bug into a C# program, and that's on purpose.

I have never written a buffer overrun in C#. I have never written a bug where I accidentally shadowed a variable in another scope in C#. I have never used stack memory after the function returned in C#. I've done all those things in C++ multiple times, and it's not because I'm an idiot, it's because C++ makes it easy to do all those things accidentally and C# makes it very hard. Making it easy to do good stuff is obviously goodness; thinking about how to make it hard to do bad stuff is actually more important.
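To make the shadowing point concrete, here is a minimal C++ sketch. The function names are invented for illustration. The first version compiles cleanly under default settings and silently returns the wrong answer, because the inner declaration hides the outer variable. The local shadowing rules described above are meant to make the C# compiler reject that kind of redeclaration outright.

```cpp
#include <vector>

// Buggy (hypothetical example): the inner `total` is a brand-new variable
// that hides the outer one, so the outer `total` is never updated.
int sum_positive_shadowed(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        if (v > 0) {
            int total = v;   // legal C++: shadows the outer `total`
            (void)total;     // only here to silence the unused-variable warning
        }
    }
    return total;            // still 0, whatever the input was
}

// Fixed: there is exactly one `total`, and the loop updates it.
int sum_positive(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        if (v > 0) {
            total += v;
        }
    }
    return total;
}
```

Called with the values 1, -2 and 3, the shadowed version returns 0 while the fixed version returns 4.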

Now, given that the design of C# is not subtractive, we have to consider the pros and cons of each decision. Is there any compelling user benefit in deliberately failing to specify the order in which functions in an expression are evaluated? The only benefit I can think of is "not breaking some existing users of two existing incompatible implementations by declaring one of the implementations to be wrong", which is the situation that the C standardization committee frequently found itself in, I'm sure. When C# was a new language that wasn't an issue, so we were free to pin that down early. Doing so has compelling benefits; it prevents subtle bugs, and as I'll discuss in a moment, there are other benefits as well.
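A small C++ sketch of what the unspecified order looks like in practice. The helpers first and second are hypothetical; each one appends its name to a log so that the call order becomes visible. In the single-expression version a conforming compiler may call either helper first. The second version shows the usual workaround, which is to sequence the calls by hand.

```cpp
#include <string>

// Hypothetical helpers: each records that it ran, then returns a value.
int first(std::string& log)  { log += "first ";  return 1; }
int second(std::string& log) { log += "second "; return 2; }

// Order unspecified in C++: the log may read "first second " or
// "second first ", depending on the compiler. The sum is 3 either way.
int sum_unspecified(std::string& log) {
    return first(log) + second(log);
}

// Portable: separate statements force first() to run before second().
int sum_left_to_right(std::string& log) {
    const int a = first(log);
    const int b = second(log);
    return a + b;
}
```

With a left-to-right rule written into the language, the one-line form would behave like the second function and the workaround would be unnecessary.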

So, long story short, yes, designing the language so as to prevent certain categories of subtle bugs is a huge priority for us. However, there are other reasons too. For instance:

Uncertainty sometimes has high value but only in special contexts

Like Vroomfondel said, "we demand rigidly defined areas of doubt and uncertainty!" Ideally, those areas should be small and should afford some way for users to eliminate the uncertainty. Nondeterministic finalization is a good example. We deliberately do not specify when and in what order finalizers run because:

  1. the vast majority of the time it makes no difference,
  2. relying on a particular timing or ordering of finalization some small percentage of the time is probably a subtle bug waiting to happen,
  3. specifying it would require us to simplify the implementation to the point where it actually could be specified, thereby destroying much of its value; there is value in that complexity,
  4. specifying it ties our hands to make algorithm improvements in the future.

But we do provide a mechanism (the "using" statement) whereby if you do need to ensure that a finalizer runs at a particular point, there is an easy syntax for it.
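For readers coming from the C++ side, the closest analogue is scope-bound destruction. The sketch below is only an illustration: TracedResource and run_scoped are invented names, and the comparison is loose, since a "using" block calls Dispose at the end of the block rather than running a destructor. What the two share is that cleanup happens at a point you can see in the source.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource that records its lifetime events in a shared log.
class TracedResource {
public:
    TracedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~TracedResource() { log_.push_back("release " + name_); }

    // Non-copyable, so each resource is released exactly once.
    TracedResource(const TracedResource&) = delete;
    TracedResource& operator=(const TracedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Returns the sequence of events produced by one scoped block.
std::vector<std::string> run_scoped() {
    std::vector<std::string> log;
    {
        TracedResource a("a", log);
        TracedResource b("b", log);
        log.push_back("work");
    }   // scope ends here: b is released first, then a
    log.push_back("after scope");
    return log;
}
```

The returned log reads: acquire a, acquire b, work, release b, release a, after scope. Both the timing and the order of cleanup are fixed by the block structure, which is the property the finalizer deliberately does not promise.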

With the possible exception of point 2, the order of evaluation of subexpressions does not have these problems. It often does make a difference, the implementation is already simple enough that the specification is trivial, and it's unlikely that we are going to come up with some incredible improvement in the algorithm which determines what order subexpressions are evaluated in. And if by some miracle we do, the specification does take care to call out that if we can prove that out-of-order evaluation cannot introduce subtle bugs, then we reserve the right to do it.

Uncertainty is hard on the language implementation team

When I am implementing part of the language specification, of course I want great freedom to decide how to implement a particular chunk of semantics. The language specification is not in the business of telling me whether we should be using a hand-written recursive descent parser or a parser-generator that consumes the grammar, etc. But I hate, hate, hate when I'm reading the specification and I'm left with a choice of what semantics to implement. I will choose wrong. I know this, because when I ask you guys what the "intuitively obvious" thing to do is, significant fractions of the population disagree!

Uncertainty is also hard on our testers -- they would like to know whether the language is correct, and being told "the specification is unclear, therefore whatever we do is correct" makes their jobs harder because that's a lie. Crashing horribly is probably not correct. Deadlocking is probably not correct. Somewhere on that spectrum between clearly correct behaviour and clearly incorrect behaviour is a line, and one of testing's jobs is to ensure that the developers produced an implementation that falls on the "correct" side of that line. Not knowing precisely where the line is creates unnecessary work for them, and they are overworked already.

And uncertainty is hard on our user education people. They want to be able to write documentation which clearly describes what semantics the implementation actually implements, and they want to be able to write sample programs to illustrate it.

Uncertainty erodes confidence 

And finally, uncertainty erodes our users' confidence that we have a quality product that was designed with deep thought, implemented with care, and behaves predictably and sensibly. People entrust the operation of multi-million dollar businesses to their C# code; they're not going to do that if we can't even reliably tell them what "(++i) + i + (i++)" does!
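For the curious, here is what a strict left-to-right reading of that expression works out to, spelled out one step at a time. It is written as a C++ sketch with an invented function name, and it assumes the left-to-right rule described in this post. Each step gets its own statement because the one-line form has undefined behaviour in C++, which is rather the point.

```cpp
// Evaluates (++i) + i + (i++) as a strict left-to-right rule would,
// one operand per statement. Hypothetical helper, for illustration only.
int evaluate_left_to_right(int& i) {
    const int a = ++i;   // pre-increment: i grows by one, a is the new value
    const int b = i;     // plain read: the same value as a
    const int c = i++;   // post-increment: c is the current value, then i grows again
    return a + b + c;
}
```

Starting from i equal to 0, the three operands are 1, 1 and 1, so the result is 3 and i ends up as 2.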

*****

Next time on FAIC I think I'll talk a bit about a related principle of language design: how we think about issues involving "breaking changes". I'll be expanding upon an excellent article written a couple of years ago by my colleague Neal, so you might want to start there.

Comments

  • Anonymous
    August 14, 2007
    Eric, I wonder if you'd think about covering one situation which Neal's post didn't tackle: the situation where there are no bugs in the existing compiler, but the specification changes so that code which compiles in version N behaves differently when compiled in version N+1. The only example I can immediately think of in C# is where delegate parameter contravariance allows for more methods to be included in a method group as valid conversions, leading to a breaking change in some very specific situations. The C# 2+ compiler warns of this change, thankfully - I wonder if there are any cases which don't have warnings? Anyway, I for one would find discussion of that topic fascinating :) Jon

  • Anonymous
    August 14, 2007
    Commander Riker, you have anticipated my denouement. Those are exactly the sorts of subtle breaking changes I had in mind.  The semantics of lambda conversion will likely introduce more in C# 4.0, particularly if we introduce other kinds of variance.

  • Anonymous
    August 14, 2007
    "1. the vast majority of the time it makes no difference"
    So people like to think :( I would argue that this really isn't the case. See below...
    "2. relying on a particular timing or ordering of finalization some small percentage of the time is probably a subtle bug waiting to happen"
    On the other hand, by not having deterministic finalization/destruction, it is an order of magnitude more difficult to write exception-safe code without putting in additional scaffolding everywhere in your code. IMHO, using/IDisposable are staples of any robust C# program. Without them a program may have innumerable hidden bugs, just waiting to jump out of the woodwork when deployed on a customer's 'foreign' system.
    "4. specifying it ties our hands to make algorithm improvements in the future"
    Huh?
    Don't get me wrong, I'm not anti-GC/nondeterministic finalisation, but there's a lot of subtlety that exists as a side-effect of the feature that needs to be grasped in order to write robust code. Some might argue that it takes as much effort to educate C# users about writing exception-safe code as it does to educate C++ users how to use the standard library and modern techniques so as to completely avoid the C++ errors you mentioned (I can honestly say I haven't had a memory leak in any of the C++ code I've written in the last 3 years, just by making sure I write code in a modern way). I just thought the trade-off needed to be mentioned. In either language, I would say that education is the real issue.

  • Anonymous
    August 14, 2007
    The comment has been removed

  • Anonymous
    August 15, 2007
    > Don't all the minuses for the unspecified order of evaluation also apply to the unspecified order of finalization Yes. But the benefits of nondeterministic finalization outweigh the costs. The benefits of unspecified order of evaluation do not. > Embrace more modern C++ If I wrote only new C++ code, absolutely. I happen to work in code that is seven to fifteen years old and was written by large teams of people, many of whom had their own ideas of what "modern" code looks like. Some of the hardest bugs I've had to fix have been where people mixed "modern" idioms into old code and did not understand the subtle assumptions about memory model made by each. I hope you never have to experience the pain of having to clone a multithreaded COM object containing a vector of smart pointers to BSTRs! That is another one of the huge benefits of C# and the CLR -- one memory model, one finalization model, one string class, etc.

  • Anonymous
    August 15, 2007
    The comment has been removed

  • Anonymous
    August 15, 2007
    The comment has been removed

  • Anonymous
    August 15, 2007
    But this isn't about destruction, it is about creation.

  • Anonymous
    August 15, 2007
    Eric did you know your blog counts as a security related blog too? :) Today I received a news letter (Microsoft Security Newsletter - Volume 4, Issue 8) from Microsoft which lists your blog as one of 'Security Blogs'! You have about 29 entries tagged as 'Security' out of the hundreds.

  • Anonymous
    August 16, 2007
    Eric, I use Dispose and using statements everywhere; it is the best choice available, but... you made the statement that "But we do provide a mechanism (the "using" statement) whereby if you do need to ensure that a finalizer runs at a particular point, there is an easy syntax for it." I don't mean to sound pedantic, but this statement is false. All that the using statement does is invoke a user-defined method "Dispose" that takes no arguments. It also requires that the object implement the IDisposable interface, and there are many objects, many of them sealed, that do not. Within the Dispose method it is entirely up to the user to ensure that proper cleanup occurs. It is also up to the user to invoke GC.SuppressFinalize(this), which actually prevents the finalizer from being invoked. In fact, I don't think you'd reach complete consensus on whether the call to GC.SuppressFinalize(this) should occur before or after normal cleanup in Dispose has finished.
    There is a great deal of scaffolding that has been built on top of this, with lots of best practices and assumptions baked into the conversations, and I've seen many examples which work... about 98% of the time and far less than that under abnormal shutdown. Depending on the circumstances, anything can fail, even a call to Trace(). I've examined a lot of code in Dispose methods that simply did not work correctly under all use-cases. It is incredibly difficult to write a correct Dispose method that accounts for all the different environments and circumstances it can be invoked in, all the things that can go wrong, and which is robust and resilient in the face of unexpected failures. I think an interesting question is whether a Dispose method can run simultaneously with a finalizer (I think it can).
    I think the situation is far better than in C++ and COM, and I'd rather write C# than anything else (right now), but it is a work in progress. I think that the current semantics of shutdown/cleanup/finalization work well for a certain type of application but are not well suited for others.
    About order of evaluation... I hate programs that rely on it. I would much prefer to look at code like this: x + (y * z) than this: x + y * z. I don't want to have to spend time reasoning about order of evaluation - it's just one more thing to get wrong. I'd even rather have the compiler issue a warning about code like that, because I'd bet that most of the time the developer did not even realize that it was a possible problem. My motto: "parentheses everywhere"

  • Anonymous
    August 16, 2007
    Tanveer:   > Eric did you know your blog counts as a security related blog too? Yes, I knew that. I put my blog on that list. > You have about 29 entries tagged as 'Security' out of the hundreds. That is correct.

  • Anonymous
    August 16, 2007
    The comment has been removed

  • Anonymous
    August 31, 2007
    I generally agree with the main points of the post but I found this to be a little bit surprising: «People entrust the operation of multi-million dollar businesses to their C# code; they're not going to do that if we can't even reliably tell them what "(++i) + i + (i++)" does» Well, C never told anyone what "(++i) + i + (i++)" does but plenty of people went off to build multi-million dollar businesses on top of it.  Microsoft included. I see lots of people confusing finalization with destruction.  There's an old chestnut.

  • Anonymous
    August 31, 2007
    The comment has been removed
