Structured Exception Handling Considered Harmful

I could have sworn that I wrote this up before, but apparently I’ve never posted it, even though it’s been one of my favorite rants for years.

In my “What’s wrong with this code, Part 6” post, several of the commenters indicated that I should be using structured exception handling to prevent the function from crashing.  I couldn’t disagree more.  In my opinion, SEH, if used for this purpose, takes simple, reproducible and easy-to-diagnose failures and turns them into subtle, hard-to-debug corruptions.

By the way, I’m far from being alone on this.  Joel Spolsky has a rather famous piece “Joel on Exceptions” where he describes his take on exceptions (C++ exceptions).  Raymond has also written about exception handling (on CLR exceptions).

Structured exception handling is in many ways far worse than C++ exceptions.  There are multiple ways that structured exception handling can truly mess up an application.  I’ve already mentioned the guard page exception issue.  But the problem goes further than that.  Consider what happens if you’re using SEH to ensure that your application doesn’t crash.  What happens when you have a double free?  If you don’t wrap the function in SEH, then it’s highly likely that your application will crash in the heap manager.  If, on the other hand, you’ve wrapped your functions with try/except, then the crash will be handled.  But the problem is that the exception caused the heap code to blow past the release of the heap critical section – the thread that raised the exception still holds the heap critical section. The next attempt to allocate memory on another thread will deadlock your application, and you have no way of knowing what caused it.

The example above is NOT hypothetical.  I once spent several days trying to track down a hang in Exchange that was caused by exactly this problem – because a component in the store didn’t want to crash the store, its authors installed a high level exception handler.  That handler caught the exception in the heap code, and swallowed it.  And the next time we came in to do an allocation, we hung.  In this case, the offending thread had exited, so the heap critical section was marked as being owned by a thread that no longer existed.
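The hang described above can be reproduced in miniature.  What follows is a portable C++ sketch, not the real heap code: `ToyHeapLock`, `toy_heap_free` and `swallow_fault_then_try_heap` are hypothetical names, a C++ `throw` stands in for the structured exception, and `try_acquire` returns false at the point where a real critical section would block forever.

```cpp
#include <atomic>
#include <stdexcept>

// Hypothetical stand-in for the heap's critical section (not a real heap API).
class ToyHeapLock {
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_acquire() {
        bool expected = false;
        return held_.compare_exchange_strong(expected, true);
    }
    void release() { held_.store(false); }

private:
    std::atomic<bool> held_{false};
};

// Simulates a heap routine that faults halfway through, as a double free might.
// The throw stands in for the structured exception raised inside the heap code.
void toy_heap_free(ToyHeapLock& lock, bool heap_is_corrupt) {
    if (!lock.try_acquire()) {
        throw std::logic_error("heap lock already held");
    }
    if (heap_is_corrupt) {
        // Control leaves the function here, so the release below never runs.
        throw std::runtime_error("simulated fault inside heap code");
    }
    lock.release();
}

// The "robust" wrapper: swallow everything and carry on.
// Returns whether the NEXT heap operation could take the lock.
bool swallow_fault_then_try_heap(bool heap_is_corrupt) {
    ToyHeapLock lock;
    try {
        toy_heap_free(lock, heap_is_corrupt);
    } catch (...) {
        // Swallowed: the process keeps running, but nobody released the lock.
    }
    return lock.try_acquire();  // a real critical section would wait here forever
}
```

With a healthy heap the second acquisition succeeds; with the simulated fault it fails, and nothing at that point tells you which earlier call left the lock behind.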

Structured exception handling also has performance implications.  Structured exceptions are considered “asynchronous” by the compiler – any instruction might cause an exception.  As a result of this, the compiler can’t perform flow analysis in code protected by SEH.  So the compiler disables many of its optimizations in routines protected by try/except (or try/finally).  This does not happen with C++ exceptions, by the way, since C++ exceptions are “synchronous” – the compiler knows if a method can throw (or rather, the compiler can know if a method will not throw).
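To illustrate the “synchronous” half of that claim: in modern C++ (C++11 and later, which postdates this post) a function can be declared `noexcept`, and the `noexcept` operator lets you ask the compiler what it knows.  This is only a sketch of the idea; `checked_divide` and `midpoint` are hypothetical functions, and nothing here measures the optimizations themselves.

```cpp
#include <stdexcept>

// May throw: the only places an exception can originate are explicit throw
// statements and calls to functions that might throw.
int checked_divide(int a, int b) {
    if (b == 0) {
        throw std::invalid_argument("division by zero");
    }
    return a / b;
}

// Declared non-throwing: callers need no unwind bookkeeping around this call.
int midpoint(int a, int b) noexcept {
    return a + (b - a) / 2;
}

// The compiler can answer "can this expression throw?" at compile time.
static_assert(noexcept(midpoint(0, 0)), "midpoint is declared non-throwing");
static_assert(!noexcept(checked_divide(1, 1)), "checked_divide may throw");
```

No such static answer exists for a structured exception, since an access violation can come from any load or store.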

One other issue with SEH was discussed by Dave LeBlanc in Writing Secure Code, and reposted in this article on the web.  SEH can be used as a vector for security bugs – don’t assume that because you wrapped your function in SEH your code will not suffer from security holes.  Googling for “structured exception handling security hole” leads to some interesting hits.

The bottom line is that once you’ve caught an exception, you can make NO assumptions about the state of your process.  Your exception handler really should just pop up a fatal error and terminate the process, because you have no idea what’s been corrupted during the execution of the code.

At this point, people start screaming: “But wait!  My application runs 3rd party code whose quality I don’t control.  How can I ensure 5 9’s reliability if the 3rd party code can crash?”  Well, the simple answer is to run that untrusted code out-of-proc.  That way, if the 3rd party code does crash, it doesn’t kill YOUR process.  If the 3rd party code that is processing a request crashes, then the individual request fails, but at least your service didn’t go down in the process.  Remember – if you catch the exception, you can’t guarantee ANYTHING about the state of your application – it might take days for your application to crash, thus giving you a false sense of robustness, but…

 

PS: To make things clear: I’m not completely opposed to structured exception handling.  Structured exception handling has its uses, and it CAN be used effectively.  For example, all NT system calls (as opposed to Win32 APIs) capture their arguments in a try/except handler.  This is to guarantee that the version of the arguments to the system call that is referenced in the kernel is always valid – there’s no way for an application to free the memory on another thread, for example.

RPC also uses exceptions to differentiate between RPC initiated errors and function return calls – the exception is essentially used as a back-channel to provide additional error information that could not be provided by the remoted function.

Historically (I don’t know if they do this currently) the NT file-systems have also used structured exception handling extensively.  Every function in the file-systems is protected by a try/finally wrapper, and errors are propagated by throwing exceptions.  This way, if any code DOES throw an exception, every routine in the call stack has an opportunity to clean up its critical sections and release allocated resources.  And IMHO, this is the ONLY way to use SEH effectively – if you want to catch exceptions, you need to ensure that every function in your call stack also uses try/finally to guarantee that cleanup occurs.
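The discipline described above, where every frame cleans up as the exception passes through it, can be sketched in portable C++.  This is an analogy rather than SEH itself: destructors play the role that a try/finally wrapper plays in the file-system code, and `ScopedResource`, `inner`, `outer` and `run_and_catch` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard: logs "acquire" on entry and "release" on every exit path,
// including exits caused by an exception unwinding through the frame.
class ScopedResource {
public:
    ScopedResource(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name)) {
        log_.push_back("acquire " + name_);
    }
    ~ScopedResource() { log_.push_back("release " + name_); }

    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::vector<std::string>& log_;
    std::string name_;
};

void inner(std::vector<std::string>& log) {
    ScopedResource guard(log, "inner");
    throw std::runtime_error("simulated failure");
}

void outer(std::vector<std::string>& log) {
    ScopedResource guard(log, "outer");
    inner(log);
}

// Only the top of the stack catches; every frame below it has already cleaned up
// by the time the handler body runs.
std::vector<std::string> run_and_catch() {
    std::vector<std::string> log;
    try {
        outer(log);
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

The log shows both releases happening before "caught".  Remove the guard from any one frame and that frame's resource stays held after the handler runs, which is the orphaned critical section problem again.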

Also, to make it COMPLETELY clear: this post is a criticism of using C/C++ structured exception handling as a way of adding robustness to applications.  It is NOT intended as a criticism of exception handling in general.  In particular, the exception handling primitives in the CLR are quite nice, and mitigate most (if not all) of the architectural criticisms that I’ve mentioned above – exceptions in the CLR are synchronous (so code wrapped in try/catch/finally can be optimized), the CLR synchronization primitives build exception unwinding into the semantics of the exception handler (so critical sections can’t dangle, and memory can’t be leaked), etc.  I do have the same issues with using exceptions as a mechanism for error propagation as Raymond and Joel do, but that’s unrelated to the affirmative harm that SEH can cause if misused.

Comments

  • Anonymous
    September 10, 2004
    I'm pretty much in the same camp as Joel wrt exceptions. I just wish that __try/__finally were actually usable in C++. Not being able to use it in a scope that contains C++ objects makes it pretty much useless.

  • Anonymous
    September 10, 2004
    Use _set_se_translator if you have to use try/catch -- it lets you take SEH exceptions and throw them as C++ exceptions.

  • Anonymous
    September 10, 2004
    The only time I ever had to use SEH was to recover from predictable crashes in IE when I was hosting the web browser control from shdocvw.

  • Anonymous
    September 10, 2004
    SEH is not evil; __try/__except(1) and catch(...) are.

  • Anonymous
    September 10, 2004
    > Use _set_se_translator if you have to use
    > try/catch -- it lets you take SEH exceptions
    > and throw then as C++ exceptions.

    Don't forget to enable /EHa (and as a result, say good-bye to a lot of compiler optimizations) if you do this.

  • Anonymous
    September 10, 2004
    What optimizations are affected by /EHa?

  • Anonymous
    September 10, 2004
    Nevermind; uncovered it in the VC documentation eventually.

    For the curious:

    "In previous versions of Visual C++, the C++ exception handling mechanism supported asynchronous (hardware) exceptions by default. Under the asynchronous model, the compiler assumes any instruction may generate an exception.

    With the new synchronous exception model, now the default, exceptions can be thrown only with a throw statement. Therefore, the compiler can assume that exceptions happen only at a throw statement or at a function call. This model allows the compiler to eliminate the mechanics of tracking the lifetime of certain unwindable objects, and to significantly reduce the code size, if the objects' lifetimes do not overlap a function call or a throw statement. The two exception handling models, synchronous and asynchronous, are fully compatible and can be mixed in the same application.

    Catching hardware exceptions is still possible with the synchronous model. However, some of the unwindable objects in the function where the exception occurs may not get unwound, if the compiler judges their lifetime tracking mechanics to be unnecessary for the synchronous model."

  • Anonymous
    September 10, 2004
    Pavel,
    The problem is that if you're not going to do __try/__except(1) or (catch(...)), then what do you do?

    The hard part of getting SEH correct is that people don't know what to do for the __except(1) part - that is very, very hard to get right, and is app specific (so Microsoft can't provide a "right" answer).

    People look at SEH and their first assumption is that they can use it to add robustness to their applications. All I'm trying to say is that SEH cannot be used as a robustifier, it usually has the exact opposite effect.

  • Anonymous
    September 10, 2004
    If /EHa is used, the compiler assumes that every instruction could raise a C++ exception, so it needs to do a lot of bookkeeping to ensure, for example, that local variables with destructors are cleaned up properly.

    I don't know how much of an impact this has on performance but presumably it was important enough to switch to /EHs by default in VC6 (or was it VC5?) and even add things like __declspec(nothrow).

  • Anonymous
    September 10, 2004
    > The problem is that if you're not going to
    > do __try/__except(1) or (catch(...)), then
    > what do you do?

    In theory, you could use SEH to do relatively safe things like lazily committing memory by catching access violations when the buffer grows beyond its initial size (I think FormatMessage does this when you tell it to allocate the buffer for you). You just need to be careful to not catch more than you need - instead of using __except(1), write a filter that makes sure the exception code is right, the referenced address is where you expect it to be, etc. Return EXCEPTION_CONTINUE_SEARCH for everything that you don't recognize.
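The narrow filter described in that comment might look like the sketch below.  It is written as a pure decision function so that it builds without `<windows.h>`: the three constants mirror the Windows values of `EXCEPTION_EXECUTE_HANDLER`, `EXCEPTION_CONTINUE_SEARCH` and `EXCEPTION_ACCESS_VIOLATION`, while `ReservedRange` and `classify_fault` are hypothetical names.  In real code the inputs would come from `GetExceptionInformation()` inside the `__except` filter expression: the `ExceptionCode` field, and for an access violation the faulting address in `ExceptionInformation[1]`.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins mirroring the Windows values, so the sketch builds anywhere.
constexpr int kExecuteHandler = 1;                        // EXCEPTION_EXECUTE_HANDLER
constexpr int kContinueSearch = 0;                        // EXCEPTION_CONTINUE_SEARCH
constexpr std::uint32_t kAccessViolation = 0xC0000005u;   // EXCEPTION_ACCESS_VIOLATION

// Hypothetical description of the one buffer we are prepared to grow on demand.
struct ReservedRange {
    std::uintptr_t base;
    std::size_t size;
};

// Claims a fault only when BOTH the exception code and the faulting address
// are exactly what we expect; everything else is passed along untouched.
int classify_fault(std::uint32_t code, std::uintptr_t fault_address,
                   const ReservedRange& range) {
    if (code != kAccessViolation) {
        return kContinueSearch;  // not the kind of fault we know how to fix
    }
    if (fault_address < range.base) {
        return kContinueSearch;  // below our buffer: someone else's bug
    }
    if (fault_address - range.base >= range.size) {
        return kContinueSearch;  // past our buffer: someone else's bug
    }
    return kExecuteHandler;      // inside our reserved range: ours to handle
}
```

The contrast with `__except(1)` is that the default answer here is "not mine", so a genuine crash elsewhere still reaches the debugger or the crash reporter.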

    In practice however I think that you're right - most apps should probably stay away from SEH. It doesn't play well with C++ exception handling, and complicates debugging, especially if you use it to handle critical exceptions like AVs (windbg stops on 1st chance AVs).

  • Anonymous
    September 10, 2004
    I had to chuckle a little when I saw this posted right after I removed a __try/__except(1) that was wrapping an entire program and was met with a chorus of other developers objecting with "but that stops it from crashing!". Now I can just send a link instead of a long-winded explanation whenever I hear that! Thanks Larry!

  • Anonymous
    September 10, 2004
    The comment has been removed

  • Anonymous
    September 10, 2004
    As it took me forever to compile that post in this small window, I now notice that plenty of posts have come in since I started.

    Ian:
    What did you gain by removing the catch(1)?

    Pavel:
    In what way does exception handling complicate debugging?


  • Anonymous
    September 10, 2004
    Great post Larry.

    I'd also like to add one further argument against SEH:
    I believe it is patented and so is not supported on other compilers and platforms.

  • Anonymous
    September 10, 2004
    > In what way does exception handling complicate debugging?

    Extensive use of exceptions, especially low-level ones like access violations, complicates debugging because you can no longer tell truly exceptional cases from normal program operation.

    Here's a scenario that I've seen many times. You suspect that you have a crash somewhere in your program. You don't know for sure because somebody is catching it with __except(1) or catch(...), so instead of a nice memory dump with a callstack that tells you exactly where the problem happened, you get a deadlock with some orphaned locks, or your process simply disappears, or dies with some undebuggable error.

    You try running the program under debugger so that you can catch 1st chance access violations and other "bad" exceptions, only to find that it actually raises dozens of such exceptions during its normal operation. You waste even more time trying to filter out the noise and locate the real problems.

    All this because two fundamental rules of exception handling have been violated:

    1. Don't catch exceptions that you don't know how to recover from.

    2. Only use exceptions for exceptional cases.
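The first rule above has a direct C++ counterpart: catch the specific exception you have a recovery plan for and let everything else propagate.  A minimal sketch, where `parse_port` and `port_or_default` are hypothetical functions:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical parser. std::stoi throws std::invalid_argument for non-numeric
// input and std::out_of_range for values that do not fit in an int.
int parse_port(const std::string& text) {
    int value = std::stoi(text);
    if (value < 1 || value > 65535) {
        throw std::out_of_range("port outside 1..65535");
    }
    return value;
}

// Recovers from the one failure it understands (text that is not a number)
// and deliberately does NOT catch anything else.
int port_or_default(const std::string& text, int fallback) {
    try {
        return parse_port(text);
    } catch (const std::invalid_argument&) {
        return fallback;  // known cause, known recovery
    }
    // std::out_of_range and any other exception propagate to the caller.
}
```

A `catch (...)` in the same spot would also return the fallback for an out-of-range port, hiding a configuration error that the caller needs to hear about.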

  • Anonymous
    September 10, 2004
    Niclas,

    Remember that we're talking about SEH here, not exceptions in general, so we're looking at pretty bad events like access violations and guard page exceptions. I can probably count on one hand the number of cases where a program could properly recover from these events. Usually they're indicative of a bug in your code.

    For example, as I mentioned in my last post, I removed a __try/__except(1) block that was wrapping an entire program. The program in question was a server, and if it caught an exception, it would log it and then happily go on serving clients. But the program couldn't tell what had caused that exception to be thrown, and something like an access violation often points to very bad things like memory corruption. So, rather than trying to keep going, it was better to crash and let the service control manager restart the server in a clean state.

  • Anonymous
    September 10, 2004
    At this point I'm pretty well convinced that it's nigh unto impossible to write reliable software that uses exceptions for error propagation. To get a flavor, see http://blogs.msdn.com/mgrier/archive/2004/02/18/75324.aspx.

    Exceptions only really work reliably when nobody catches them.

    And in that case, I don't understand why we don't just call something like BugcheckApplication() instead of throwing an actual catchable exception.

    It was clever on VMS to have continuable exceptions which led to the SEH design on NT. I'm not sure that giving code the ability to do fun things like user-mode fixups of things like uncommitted virtual address space or adjust FP results etc. is worth the complexity that this design entails.

    Catching exceptions in an exception rich environment (like the CLR or Java for example) is nearly impossible to do correctly. If we were to start writing in C again, it's do-able but the fact that all the new languages include capabilities like implicit conversions and operator overloading means that it's impossible to understand whether the scope of the try/catch is correct. (And even if it was correct, changes to other parts of the code can invalidate your careful analysis and coding.)

    So, Larry's point is entirely valid but once you accept it, it's not hard to see that the use of exceptions in modern languages fundamentally makes it impossible to write reliable software.

    Which is, of course, funny since most people think that exceptions are about writing reliable software finally. Well, I guess throwing the exceptions is OK. It's just those super geniuses who think that they can catch them that mess it all up. :-)

  • Anonymous
    September 10, 2004
    Anything is better than a crash, simply because the user still has a chance to save his work. Yeah, corruption may happen, and the app should warn the user that an exception has been caught and that it might be a good idea to restart the application. But it is BETTER than a crash. In a debug build, a crash is better.

  • Anonymous
    September 10, 2004
    > Anything is better than crash simply because
    > user still has a chance to save his work.

    For a text editor, maybe. For a non-interactive service that processes financial transactions, definitely no.

    And even in a text editor a crash is better than an undebuggable deadlock after some COM object corrupts the heap then swallows the resulting AV leaving the default process heap critical section orphaned.

    If you want to allow user to save his work in case of an unhandled exception, that's fine. Nobody is saying you shouldn't do that. But catching unknown exceptions and not reporting them properly (using ReportFault() or something similar to that) is often worse than no exception handling at all.

  • Anonymous
    September 10, 2004
    Pavel:

    Well then I would agree with you if the design uses throw extensively. I do not like a design that throws extensively as it complicates debugging =), that is why I tried to explain that exceptions should only happen in rare conditions. So it is not the exception theory itself that complicates it for you, it is the implementation of it.

    If the caught exception leaves the application with unreleased resources, then it was not caught in all levels it needed to be caught to clean up properly. It is not the exception's fault.

    You want a nice memory dump, even if a nice memory dump is heaven, a logged stack trace and full detail of the exception(and maybe even a hexdump of the surrounding memory) will provide you with almost equally interesting information, and you can let the memory dumps stay in house as much as possible.

    But I do agree with you, if you have a service that is not allowed to glitch, then don't start guessing on the state of your application, you don't want a $10 transaction turn into a $100000 one, unless of course it is your paycheck =)


    -------------

    Extensive use of exceptions is a bad thing according to me, I do not like the philosophy of Java/C#. To me it is just a lazy way of getting out of trouble and getting fewer nested if statements. But your code path has more than one exit point, and I don't like that because it tends to trick the developer into resource leaks. I have seen so many cases where a developer grabs some resource when the function begins, and then adds some check afterwards in the code that merely does a return in the middle of it, but forgets to return the resources. I believe in simple design.

    grab resource
    work with resource, record error/success
    release resource
    return error/success

    And in an extensive exception philosophy this would be

    grab resource
    try
    work with resource throw on error
    catch or finally(cleanup, which is so much better)
    cleanup

    throw again if the work part threw, else return success.

    Which is a design that I do not like.

    I have had many of these discussions before, and the only way to convince anyone of the opposite is to show it in practice, implemented in a way which I think is proper and safe. I have never had complicated odd crashes due to it, but instead I have slept better knowing that even if we have a bug we might survive, and if we didn't then too bad. Our starting state is already a crash so it can't get worse.

    One of the worst applications of exceptions I can think of is COM objects used with the non raw interface wrappers, where any E code is merely turned into a throw...


    Not far from this discussion is the question if you should leave asserts active in release build. I surely don't think so, and assert in general is a lazy way to do things, it tends to lead the developer to not care of the failure scenario, and thus not to the proper clean up. Why should he/she? It will crash on the assert anyway.

    Most of the examples brought up are very rare conditions of asynchronous exceptions, most always they are much less harmful, and even more often they are merely a NULL pointer exception, which of course is a bug, but usually not fatal in any way to the program state. The non NULL pointer exceptions however are more scary.

    But it is just a question of determining which part of your program state that is most likely corrupt, rinse that and go again.

    If the same exception keeps thrashing (because your estimate of which parts of your program state must be corrupt was wrong) then it is about time to abort. But then again that logic applies to any kind of unconditional loop: if you don't supervise them in some way, they can hang.

    Catching exceptions is not THE way to make your application more robust, but it is one tool among many for making it more robust.

  • Anonymous
    September 11, 2004
    The comment has been removed

  • Anonymous
    September 11, 2004
    The comment has been removed

  • Anonymous
    September 11, 2004
    > If an application has an "unexpected
    > exception", all bets are off as far as any
    > level of functionality.

    That's taking it to the other extreme.

    Certainly there are applications where saving user's work in case of a crash makes sense. Like email processors for example.

    This is a separate issue from where to handle unknown exceptions and how to report them.

  • Anonymous
    September 12, 2004
    Trying to run code in an address space which is likely to be corrupt is just plain bad for the user. If you really want to preserve the value of the keystrokes/operations that occurred before the crash, then journal them!

    All the editors on VAX/VMS journalled; I was shocked to come to the PC world and find that we never do such things.

    This is a much smarter approach than to try to continue to run code in the corrupt address space.
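Journaling as suggested in that comment can be sketched in a few lines.  This is a toy under loud assumptions: `JournaledBuffer` and `replay` are hypothetical names, the journal format ("i <char>" for insert, "d" for delete-last) is invented for the example, and it only handles single printable characters that are not newlines.  A real editor would write to a file and sync it to disk.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical toy editor buffer that logs every operation BEFORE applying it,
// so the journal is never behind the in-memory state.
class JournaledBuffer {
public:
    explicit JournaledBuffer(std::ostream& journal) : journal_(journal) {}

    void insert(char c) {
        journal_ << "i " << c << '\n';
        journal_.flush();
        text_.push_back(c);
    }

    void erase_last() {
        journal_ << "d\n";
        journal_.flush();
        if (!text_.empty()) {
            text_.pop_back();
        }
    }

    const std::string& text() const { return text_; }

private:
    std::ostream& journal_;
    std::string text_;
};

// Rebuilds the document in a FRESH process from the journal alone, without
// touching whatever was left in the crashed process's address space.
std::string replay(std::istream& journal) {
    std::string text;
    std::string line;
    while (std::getline(journal, line)) {
        if (line.size() == 3 && line[0] == 'i' && line[1] == ' ') {
            text.push_back(line[2]);
        } else if (line == "d" && !text.empty()) {
            text.pop_back();
        }
        // Unrecognized lines are skipped rather than trusted.
    }
    return text;
}
```

Because recovery reads only the journal, the process can crash at the first sign of trouble and the user still loses at most the operation in flight.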

  • Anonymous
    September 12, 2004
    The comment has been removed

  • Anonymous
    September 12, 2004
    9/12/2004 2:39 PM Niclas Lindgren

    > journalling will recreate what you just did,
    > which means if you hit the bug doing it,
    > journalling to back track to it will most
    > likely hit it again and voila

    Bingo. I once edited a journal, deleting the mention of the keystroke that was mishandled by the editor, so that replay of the modified journal would not hit the bug. Then I saved my work at that point, i.e. saving with a loss of one known keystroke instead of saving with a loss of a forgotten number of minutes and changes. Then I found some other way to proceed with the next necessary change.

    Theoretically the journal would also be of immense value to any coder who wanted to fix the editor.

    > we had to bring in the right people and go
    > through en emergency build procedure

    Well, I'll repeat here an idea which got me a black mark on my record at one previous employer, and which could have got me fired if my boss had been present. Maybe someone can say what was wrong with it. When the current build wasn't working and you had an emergency situation where you needed a working build, boot the previous working build. While the production system is operating on its previous build, configure one test system to match, boot the failing build there, and debug it on the test system. Orange flag emergency instead of red flag. During the time that the previous build is running, you don't get to charge customers for features that aren't being provided, but you still get to charge for basic telephone service and customers still have it.

    (Well actually yes I do know what was wrong with my suggestion. I'm an engineer and corporate politicians are corporate politicians, and there's no room for engineers in companies that are run by politicians.)

  • Anonymous
    September 13, 2004
    For user apps, auto-saving work, automatically restarting the app, and giving the user the option to recover their old documents is usually fine.

    For online services doing financial transactions, you better not be messing with my money in corrupt address space.

  • Anonymous
    September 14, 2004
    > As a result of this, the compiler can’t
    > perform flow analysis in code protected by
    > SEH. So the compiler disables many of its
    > optimizations in routines protected by
    > try/catch (or try/finally). This does not
    > happen with C++ exceptions

    Shouldn't that read "try/except (or try/finally)"?

  • Anonymous
    September 15, 2004
    probably try/except/finally, you're right rsd.

  • Anonymous
    May 01, 2008
    I just ran into this post by Eric Brechner who is the director of Microsoft's Engineering Excellence
