Anatomy of a Heisenbug

I just spent half an hour debugging a heisenbug and thought I’d pass on what was happening.

 

I was running the unit tests for one of my features and they were reliably failing.  Unfortunately, the instant I ran the test case under the debugger, the problem went away.  Problems that disappear under the debugger are a classic symptom of a heisenbug, and this was no exception.

If I attached the debugger AFTER the test started but before the failure hit, I was able to see the failure occur.  The failure only disappeared when the app was launched under the debugger.  At that point I realized what was happening.

As MSDN says:

Processes that the debugger creates (also known as spawned processes) behave slightly differently than processes that the debugger does not create.

Instead of using the standard heap API, processes that the debugger creates use a special debug heap. On Microsoft Windows XP and later versions of Windows, you can force a spawned process to use the standard heap instead of the debug heap by using the _NO_DEBUG_HEAP environment variable or the -hd command-line option.
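Since the heap a spawned process gets depends on an environment variable, it can help to have the test harness log whether that variable was set for a given run. The following is only a rough sketch of that idea: env_var_is_set, debug_heap_note and current_debug_heap_note are hypothetical helper names, and the wording of the notes simply restates the MSDN text above rather than anything the OS reports.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns true when the named environment variable exists in this process.
bool env_var_is_set(const char* name) {
    return std::getenv(name) != nullptr;
}

// Hypothetical helper: turns the presence of _NO_DEBUG_HEAP into a log line.
// The text paraphrases the MSDN description; it does not query the heap.
std::string debug_heap_note(bool no_debug_heap_set) {
    return no_debug_heap_set
        ? "_NO_DEBUG_HEAP is set: a debugger-spawned process should use the standard heap"
        : "_NO_DEBUG_HEAP is not set: a debugger-spawned process may get the debug heap";
}

// Example use: call once at test start-up and write the result to the log.
std::string current_debug_heap_note() {
    return debug_heap_note(env_var_is_set("_NO_DEBUG_HEAP"));
}

// Small sanity check of the helpers themselves.
void check_debug_heap_helpers() {
    assert(debug_heap_note(true) != debug_heap_note(false));
    assert(!current_debug_heap_note().empty());
}
```

Logging this at start-up makes it easier to tell afterwards whether two runs that behaved differently were also configured differently.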

It turns out that I had added a member variable to a class and failed to initialize it in the constructor of the class.  When a process is launched under the debugger, the debug heap initializes all allocated memory to a known value.  That means that when launched under the debugger, the member variable had that known value; when launched without the debugger, it was uninitialized and happened to have the value 0.  For a number of reasons, the value 0 caused the test to fail.
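To make the mechanism concrete, here is a minimal sketch of why the same uninitialized member can hold different values depending on how the heap fills new blocks. Everything in it is illustrative: simulated_alloc, peek_member, Widget and retry_count are made-up names, and the fill bytes 0x00 and 0xCD are stand-ins, not the exact patterns my process saw. The "allocation" is simulated with a byte vector so the sketch never actually reads an uninitialized object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a heap allocation: returns `size` bytes that are
// all set to `fill`, mimicking a heap that pre-fills freshly allocated blocks.
std::vector<unsigned char> simulated_alloc(std::size_t size, unsigned char fill) {
    return std::vector<unsigned char>(size, fill);
}

// Reads the bytes where an uninitialized 32-bit member would live.
// memcpy from a fully written buffer keeps the sketch free of undefined behavior.
std::uint32_t peek_member(const std::vector<unsigned char>& block) {
    std::uint32_t value = 0;
    std::memcpy(&value, block.data(), sizeof value);
    return value;
}

// Hypothetical class with the fix applied: the member is given a value on
// every construction path, so the heap's fill pattern no longer matters.
struct Widget {
    std::uint32_t retry_count = 0;  // default member initializer
};

// Walks through both heap behaviors and the fixed class.
void check_fill_sketch() {
    // Memory that happens to be zero: the "member" reads as 0.
    assert(peek_member(simulated_alloc(sizeof(std::uint32_t), 0x00)) == 0u);
    // Memory pre-filled by a debugging heap: the "member" reads as the pattern.
    assert(peek_member(simulated_alloc(sizeof(std::uint32_t), 0xCD)) == 0xCDCDCDCDu);
    // With an initializer, the value is the same regardless of the heap.
    assert(Widget{}.retry_count == 0u);
}
```

The point of the Widget struct is the fix itself: once the member is initialized, the code no longer depends on which heap the process was handed.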

 

I hate heisenbugs, but I like it when they’re really easy to find like this one.

Comments

  • Anonymous
    September 03, 2008
    Many years ago, I took over maintenance of an ancient vertical market accounting package written in C.  I think it used MSC 4.0.  It had a few dozen users, not huge, but they were all using it to run their business. In the process of moving it to a more modern compiler (MSC 5.1 I think) I noticed that one of the main menu items called some code that used some uninitialized variables.  I said "this can't possibly work", built it and tested, and sure enough, if I ran the menu item it crashed on me.  This particular menu item ran a report that the users would have used once a month or so, and the binaries that they'd been using were over a year old. Within 24 hours of spotting the bug and verifying it, we got calls from customers reporting the exact same problem, which luckily I'd already fixed.  I'm absolutely convinced that this was a different form of a heisenbug: once I spotted the problem in the source code the waveform collapsed, and customers everywhere started having problems.

  • Anonymous
    September 03, 2008
    Is that specific to WinDbg or would it work with Visual Studio too? I guess -hd is for WinDbg only, but what about _NO_DEBUG_HEAP? Also, does this affect the debug CRT heap in any way? I’ve had my share of such bugs and I had to run the process without a debugger and then attach to make it use the default heap. Also the debug heap is slower, much more so in Vista. It would be great if I could disable it completely for Visual Studio. Ivo

  • Anonymous
    September 03, 2008
    Perhaps you could use this opportunity to ask the nice folks working on VC to make it issue a warning when a member variable of a built-in type is left uninitialized in a constructor. There are extremely few scenarios where not initializing such members in the constructor makes any sense, and these could be dealt with by the usual warning-disabling methods. On the other hand, this issue routinely bites every C++ programmer out there. It is really astonishing that in 2008 many compilers still don't warn about it.

  • Anonymous
    September 03, 2008
    Some documentation strongly implies that Windows itself activates a debug heap if CreateProcess determines that the process being created has a debugger of any kind attached. (I guess the DEBUG_PROCESS or DEBUG_ONLY_THIS_PROCESS flags to CreateProcess drive this.) _NO_DEBUG_HEAP seems to be processed (by CreateProcess) at that level. Release or debug build processes launched by devstudio are then going to be using the win32 debug heap, and the release or debug CRT heap depending on the build configuration. msvcrt.dll and msvcrtd.dll seem to defer most of their allocations down to the win32 heap API, so you get a weird matrix of memory fill possibilities: If there is an attached debugger when the heap is created, the debug heap will, for example, fill allocated memory with 0xbaadf00d. If the debug CRT is being used, it will then overwrite that fill with its debugging fill: 0xcd. When freeing memory with free() or delete, the reverse holds. Debug versions of these functions fill deleted memory with 0xdd; however, they then immediately call HeapFree(), which (again, if the win32 debug heap is in use) overwrites those values with 0xfeeefeee, so you typically never see 0xdd unless you use _CrtDebug* functions to make the debug CRT delay its call to HeapFree(). Sadly, MSDN seems to have no documentation, outside of the Debugging Tools for Windows, that in any way explains the behaviour of processes created with DEBUG(_ONLY_THIS)_PROCESS flags.

  • Anonymous
    September 03, 2008
    @Skip: that would be a Schroedinbug. @Ivo: I would expect that the OS processes _NO_DEBUG_HEAP, not the debugger itself. I can't see how else the debugger would be able to enable the feature, unless it's one of the undocumented features of (e.g.) NtSetInformationProcess.

  • Anonymous
    September 04, 2008
    That's why debug malloc libraries ought to fill memory with a known pattern. I like 0xDEADBEEF myself. That way, anything that depends on the contents of uninitialized memory will fail immediately and spectacularly. See valgrind's --malloc-fill option.

  • Anonymous
    September 04, 2008
    quotemstr: But that's exactly the cause of the heisenbug - because the memory was filled with a known pattern, the uninitialized variable wasn't found.  

  • Anonymous
    September 04, 2008
    Sounds to me like you could have put Application Verifier to good use here.

  • Anonymous
    September 05, 2008
    Soren: Actually appverifier would have masked this bug for the same reason that windbg masked the bug.  The bug only showed up when the uninitialized member variable was set to 0, but all the analysis tools set uninitialized memory to nonzero values.

  • Anonymous
    September 07, 2008
    Larry, Appverifier does have a setting for catching uninitialized variables though (Miscellaneous->DirtyStacks), which causes the following warning to be displayed when running under a debugger: Run-Time Check Failure #3 - The variable 'b' is being used without being initialized.

  • Anonymous
    September 08, 2008
    Mike Dimmick -> It's actually a flag looked for by the debug version of the CRT.

  • Anonymous
    September 09, 2008
    H:  Is the bug fixed?
    S:  We don't know yet.  Let's look at it.
    S:  Hey, where's the bug?
    H:  We don't know, but I can tell you how fast it was going.

  • Anonymous
    September 10, 2008
    I ran into a similar heisenbug this last summer - I wasn't initializing the ref count of an object.  This caused crashes that went away under a debugger. http://blogs.msdn.com/matthew_van_eerde/archive/2008/05/27/spot-the-bug-imfoutputschema.aspx I was able to track that one down by adding logging at selected points in the code path.
