Whole program vs. local analysis

A quick note for today. Maybe more tonight.

Internally at Microsoft, as you can imagine, we have a number of tools to assist building our software.  Many are internal-only; contrary to the black-helicopter gang out there, there is a lot of cost to shipping software and it can divert your attention and resources from your primary goal of making your teams' code faster/better/whatever.

I won't divulge the details or identity of any of these tools; I'm sure you can find references to all of them if you follow the MSFT blogosphere.  Let's call one of them "mega-lint".

Some people have never seen the need for a tool like lint; that's what the compiler is for, right?  The problem is that, as I have alluded to several times now, compilers are all about syntax and translation semantics; their goal up to now has been to let you express your ideas more fluidly, not necessarily with better intent.

In the previous example, where the void reverse_array() function was calling member functions/operators on a type, on one hand it was "obvious" that they would succeed, but on the other hand, if you look at the contract long and hard, it's actually not obvious that they will succeed.  Consider, for example, a language/runtime environment where the use of operator[] was remoted (auto-remoted interfaces are problematic and will be a topic of several entries later on).  The "obviousness" that the only failure modes were covered because we were working within the documented limits of the array is now questionable.  Sorry to lapse into C++ for a bit, but consider if the signature of reverse_array() had been:

template <typename T> void reverse_array(T &array) { ... }

Sure, std::vector may give you some sort of guarantee, but given the stated requirements (existence of a size() member function and implementation of operator []), you had better code your template to deal with exceptions being thrown (a/k/a errors being returned) by any of those members.  I had to invoke C++ here because the C equivalent, a macro, would be somewhat uncompelling, and I'm not familiar enough with Java and CLR generics to say whether the error contract for the members can be explicitly called out.
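
To make the point concrete, here is a minimal sketch of such a template, together with a made-up container that satisfies the stated requirements yet fails partway through.  FlakyArray, its access budget, and try_reverse are hypothetical names invented for this illustration; they do not come from any real library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Generic reversal that assumes only size() and operator[] exist on T.
// Nothing in this signature says whether either of them can fail.
template <typename T>
void reverse_array(T &array) {
    const std::size_t n = array.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        using std::swap;
        swap(array[i], array[n - 1 - i]);  // each operator[] call may throw
    }
}

// Hypothetical container: operator[] starts failing once a fixed number of
// accesses has been used up, standing in for a remoted or otherwise
// fallible element lookup.
class FlakyArray {
public:
    FlakyArray(std::vector<int> data, int budget)
        : data_(std::move(data)), budget_(budget) {}

    std::size_t size() const { return data_.size(); }

    int &operator[](std::size_t i) {
        if (budget_-- <= 0) throw std::runtime_error("element access failed");
        return data_[i];
    }

    // Direct view of the storage, used only to inspect the result.
    const std::vector<int> &raw() const { return data_; }

private:
    std::vector<int> data_;
    int budget_;
};

// Returns true if the reversal completed, false if an access failed midway.
bool try_reverse(FlakyArray &a) {
    try {
        reverse_array(a);
        return true;
    } catch (const std::runtime_error &) {
        return false;  // the array may now be partially reversed
    }
}
```

With a four-element array {1, 2, 3, 4} and a budget of two accesses, the first swap completes and the second throws, so the caller is left holding {4, 2, 3, 1}: neither the original nor the reversed array.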

Back to the point.  There's no way to declare that we don't expect operator[] to fail as long as we pass in legal bounds.  (Consider a sparse array implementation which may have to dynamically allocate the element on first access...)
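
A rough sketch of that sparse array case, under loud assumptions: SparseArray, max_nodes, and can_touch are names invented here, and the max_nodes cap is only a stand-in for genuinely running out of memory, which is hard to provoke on demand.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <unordered_map>

// Hypothetical sparse array: an element is materialized on first access.
class SparseArray {
public:
    SparseArray(std::size_t size, std::size_t max_nodes)
        : size_(size), max_nodes_(max_nodes) {}

    std::size_t size() const { return size_; }
    std::size_t allocated() const { return nodes_.size(); }

    int &operator[](std::size_t i) {
        if (i >= size_) throw std::out_of_range("index past end");
        auto it = nodes_.find(i);
        if (it != nodes_.end()) return it->second;  // already materialized
        // Assumed stand-in for memory exhaustion; see the lead-in.
        if (nodes_.size() >= max_nodes_) throw std::bad_alloc();
        return nodes_[i];  // allocates a node; a real bad_alloc could arise here too
    }

private:
    std::size_t size_;
    std::size_t max_nodes_;
    std::unordered_map<std::size_t, int> nodes_;
};

// Writes to element i and reports whether the access itself succeeded.
bool can_touch(SparseArray &a, std::size_t i) {
    try {
        a[i] = 1;
        return true;
    } catch (const std::bad_alloc &) {
        return false;  // index was legal, access failed anyway
    }
}
```

Every index used with can_touch can be well within size(), and the third distinct access still fails once the cap is reached.  The bounds were legal; the contract just never said allocation was part of the deal.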

How do we even figure out that there is a problem there?  Should all code just always write the try/catch blocks?  Probably not!  The main claim to fame for exception handling is that you should write less code, not more.  But then someone with a try/catch up higher on the stack is probably not expecting to catch the silly operator [] failure.

This kind of problem is impossible to diagnose without either massive tools which can do whole-program analysis and simulation or, and this is my preference, reducing the problem to a more local one of contract description and local analysis.

Local analysis is much easier since it's what every programmer does when reading code.  Now, we've been trained badly; either there are APIs whose failure status you ignore by convention (fclose, fprintf, etc.) or there are new patterns building up where you wrap a function call in a try/catch block, catch all errors, and rethrow a different exception.  Both of these inhibit easy local analysis and both patterns must be stopped if we're going to make progress in making software more reliable.
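
Here is a small sketch of why the catch-everything-and-rethrow pattern hurts local analysis.  All of the names (parse_count, load_wrapped, load_transparent, observed_failure) and the simulate_oom flag are made up for illustration; the flag simply forces the out-of-memory path so both failure modes can be shown.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical low-level step with two distinct failure modes.
int parse_count(const std::string &text, bool simulate_oom) {
    if (simulate_oom) throw std::bad_alloc();
    if (text.empty()) throw std::invalid_argument("empty input");
    return std::stoi(text);
}

// The pattern criticized above: every failure becomes the same new exception.
int load_wrapped(const std::string &text, bool simulate_oom) {
    try {
        return parse_count(text, simulate_oom);
    } catch (...) {
        throw std::runtime_error("load failed");
    }
}

// No handler at all: the callee's failures reach the caller unchanged.
int load_transparent(const std::string &text, bool simulate_oom) {
    return parse_count(text, simulate_oom);
}

// Reports which kind of failure a caller is able to distinguish.
template <typename F>
std::string observed_failure(F &&f) {
    try {
        f();
        return "none";
    } catch (const std::bad_alloc &) {
        return "bad_alloc";
    } catch (const std::invalid_argument &) {
        return "invalid_argument";
    } catch (const std::exception &) {
        return "unknown";
    }
}
```

Through load_transparent a caller can still tell bad input from memory exhaustion.  Through load_wrapped both collapse into the same "unknown" bucket, so whoever reads the calling code can no longer reason about which failures are possible without going to read the callee.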

Usual caveat applies: People writing actual applications instead of reusable libraries can dial this setting pretty much wherever they want to.  Shared code authors should be very aware of this problem and the "v1" people out there (you know them, the ones who have a cool idea, get the awards and good bonuses and then move on to the next v1 project leaving their path of destruction a mile wide behind them...) need to get some discipline.

Comments

  • Anonymous
    May 10, 2005
    Well you mentioned catch(...) twice so I had to take the bait.

    Thing is, lots of people do write class templates. The STL containers have very well defined exception specifications.

    Here is a great paper on the subject http://www.boost.org/more/generic_exception_safety.html

    It is not necessary for every function to be fully transactional. They should give the best level of exception safety that they can, and if the caller needs more, it is the caller's responsibility.

    If I wanted complete transaction safety when calling reverse_array() with my arrayType, I would make a copy, pass that to the function, and if the function succeeded I would replace my original object.

    But I wouldn't want reverse_array to do that for me. It simply may not be appropriate a lot of the time.
  • Anonymous
    May 10, 2005
    A claim to be transactional but not exception safe is void. The whole point of the transactional contract is to be able to predict behavior of the system in failure modes.
  • Anonymous
    May 11, 2005
    I'm sorry I don't buy the discussion of the weak guarantee. Basically the weak guarantee means that you might be able to get your work done (as a client of the weak guarantee code) but your guarantee to the next level up can't be more than weak. I'll reread the description to make sure I'm on solid ground here but as I read the paper, even the "strong" guarantee is not strong enough for you to necessarily provide a robust interface for people above you to give strong guarantees. (Which is basically the point of this series.)