Careful with that axe, part two: What about exceptions?

(This is part two of a two-part series on the dangers of aborting a thread. Part one is here.)

Suppose you’re shutting down the worker thread we were talking about last time, and it throws an exception. What happens?

Badness, that’s what. What to do about it?

As in our previous discussion, it is better to not be in this situation in the first place: write the worker code so that it does not throw. If you cannot do that, then you have two choices: handle the exception, or don't handle the exception.

Suppose you don't handle the exception. As of, I think, CLR v2, an unhandled exception in a worker thread shuts down the whole application. The reason is that under the old behavior you could start up a bunch of worker threads, have them all throw exceptions, and end up with a running application that had no worker threads left, did no work, and told the user nothing about it. It is better to force the author of the code to handle the situation where a worker thread goes down due to an exception; doing it the old way effectively hides bugs and makes it easy to write fragile applications.
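
If you go the "don't handle it" route, you can at least leave a record of what took the process down. Here is a minimal sketch, assuming all you want is a log line. AppDomain.CurrentDomain.UnhandledException is a real CLR event; the LastChanceLogging class and its Describe and Install methods are names made up for this illustration. The handler only observes the failure; it does not keep the application alive.

```csharp
using System;

// Hypothetical helper: the class and method names are invented for this sketch.
static class LastChanceLogging
{
    // Builds the log line. Kept separate from the event hookup so it can be
    // exercised without actually crashing a process.
    public static string Describe(object exceptionObject, bool isTerminating)
    {
        // ExceptionObject is typed as object because not everything thrown
        // on the CLR is required to derive from System.Exception.
        var ex = exceptionObject as Exception;
        string what = ex != null
            ? ex.GetType().Name + ": " + ex.Message
            : "non-Exception object thrown";
        return (isTerminating ? "FATAL " : "ERROR ") + what;
    }

    // Call once at startup. This does not handle the exception; it only
    // reports it on the way down.
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Console.Error.WriteLine(Describe(e.ExceptionObject, e.IsTerminating));
    }
}
```

Calling LastChanceLogging.Install() early in startup is enough. Keep the handler itself trivial: by the time it runs, you cannot assume much about the state of the rest of the program.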

Suppose you do handle the exception. Now what? Something on another thread threw an exception, which is by definition an unexpected, exceptionally bad error condition. You now have no assurance whatsoever that any of your data is consistent or that any of your program invariants still hold in any of your subsystems. So what are you going to do? There's hardly anything safe you can do at this point.

The question is "what is best for the user in this unfortunate situation?" It depends on what the application is doing. It is entirely possible that the best thing to do at this point is to simply aggressively shut down and tell the user that something unexpected failed. That might be better than trying to muddle on and possibly making the situation worse, by, say, accidentally destroying user data while trying to clean up.

Or, it is entirely possible that the best thing to do is to make a good faith effort to preserve the user's data, tidy up as much state as possible, and terminate as normally as possible.
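
The two options above can be sketched as a small policy switch. This is an illustration under stated assumptions, not a recommended implementation: GuardedWorker, FailurePolicy, ShutdownPolicy and the saveUserData callback are all hypothetical names, and the worker is assumed to be a plain Thread that the owning thread joins during shutdown. Environment.FailFast and Environment.Exit are the real framework calls.

```csharp
using System;
using System.Threading;

// Hypothetical policy names for the two options discussed above.
enum FailurePolicy { FailFast, SaveThenExit }

// Hypothetical wrapper: runs the work on its own thread and hands any
// exception back to the owner instead of letting it go unhandled.
sealed class GuardedWorker
{
    private readonly Thread thread;
    private Exception failure; // written by the worker, read only after Join

    public GuardedWorker(Action work)
    {
        thread = new Thread(() =>
        {
            try { work(); }
            catch (Exception ex) { failure = ex; }
        });
        thread.IsBackground = true;
    }

    public void Start() { thread.Start(); }

    // Blocks until the worker finishes; returns null if it completed normally.
    public Exception Wait()
    {
        thread.Join();
        return failure;
    }
}

static class ShutdownPolicy
{
    // Call only with a non-null failure. This method never returns:
    // both branches end the process.
    public static void Apply(FailurePolicy policy, Exception failure, Action saveUserData)
    {
        if (policy == FailurePolicy.SaveThenExit)
        {
            // Good-faith effort only; program state may already be inconsistent.
            try { saveUserData(); }
            catch (Exception) { /* the save failed too; nothing safe left to try */ }
            Environment.Exit(1);
        }

        // Aggressive shutdown: terminate immediately without further cleanup.
        Environment.FailFast("Worker failed: " + failure.Message);
    }
}
```

The owner would do something like: start the worker, call Wait() during shutdown, and if the result is non-null, call ShutdownPolicy.Apply with whichever policy suits the application. Which policy that is remains the hard part, and the code cannot decide it for you.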

Both today’s question and the one from last time are specific versions of the more general question "what do I do when my subsystems running on worker threads do not behave themselves?" If your subsystems are unreliable, either make them reliable, or have a policy for how you deal with an unreliable subsystem, and implement that policy. That's a vague answer, I know, but that's because dealing with an unreliable subsystem is an inherently awful situation to be in. How you deal with it depends on the nature of its unreliability, and the consequences of that unreliability to the user's valuable data. There are no easy one-size-fits-all answers here, unfortunately.

Comments

  • Anonymous
    February 25, 2010
    Hey Eric, I've discovered your blog recently and I love it. I do have a random question for you: How do you pronounce your last name? "LIP-pert", "LYE-pert", "li-PAIR" (like the chef Eric Rippert)? I ask because I'd like to be able to tell my friends and coworkers "You really should be reading Eric Lippert's blog" without sounding like an idiot.
    Glad you like the blog, and thanks for asking. The first on your list is correct. The name is German in origin, which is not surprising considering that the part of Ontario I grew up in was largely settled by German immigrants. In my mother's childhood it was still common to hear German spoken as a first language in people's homes, though not so much these days. The other Eric Lipperts I've run into online over the years tend to be of German, Dutch or Scandinavian origin. -- Eric

  • Anonymous
    February 25, 2010
    Eric Ripert apparently spells his name with just one p, so I already sound a little like an idiot.

  • Anonymous
    February 25, 2010
    Unfortunately? Hey, if there were a pre-made answer for everything, coding wouldn't be as fun.

  • Anonymous
    February 25, 2010
    @Tim, you are inherently unreliable and must be shut down ;-) @Eric You didn't mention a third case, which is where a thread might be doing something "off to the side" and it's ok if it fails, or needs to be restarted - it doesn't interfere with the rest of the system. So an unexpected exception can be handled without terminating. I admit that this situation occurs less frequently than many developers seem to think, but it remains a legitimate case, nonetheless.

  • Anonymous
    February 25, 2010
    @danielearwicker I thought finally blocks ran regardless? The only way I know that a finally block won't run is if you don't step into its corresponding try to begin with. Of course, I stand to be corrected.

  • Anonymous
    February 26, 2010
    @Adam - Environment.FailFast terminates the process without running any active try/finally blocks or finalizers; see the MSDN docs for more info: http://msdn.microsoft.com/en-us/library/ms131100.aspx

  • Anonymous
    March 01, 2010
    As a side note, I've found it much easier to deal with background thread abort when writing in F# - simply because the thread is doing a completely encapsulated computation that does not affect the state of the system in any way until fully computed (and then the result is published via a single atomic reference field assignment). Of course, the same thing is perfectly doable in C#, it's just that in F# you're always in that "no side effects" mental mode by default, the language making it easiest. That it also makes it easy to discard a thread without caring about invariants and such was an unintended side-effect that was discovered long after the code was written.