Postmortem Debugging - Better Late Than Never
If there is a consistent repro, I would definitely prefer Early Debugging. In real life, however, postmortem debugging is often unavoidable.
There are three concepts I wish to clarify before digging into the details:
AeDebug is a set of registry keys which specify what happens when an unhandled exception occurs in a user mode application.
- \\HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug
- \\HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug
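On Windows XP and earlier, AeDebug is configured by default to use drwtsn32.exe (Dr. Watson), which captures a dump and terminates the faulting application. Later Windows versions no longer ship Dr. Watson and rely on Windows Error Reporting instead.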
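To check what a particular machine is configured to do, the two keys above can be read programmatically. The following is a minimal sketch in Python, assuming the standard-library winreg module on Windows and the value names Debugger and Auto under each key; the helper names (AEDEBUG_SUBKEYS, read_aedebug) are made up for this illustration.

```python
import sys

# Subkeys under HKEY_LOCAL_MACHINE, as listed above (native view and 32-bit view).
AEDEBUG_SUBKEYS = {
    "native": r"Software\Microsoft\Windows NT\CurrentVersion\AeDebug",
    "wow64": r"Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug",
}

# Assumed value names: "Debugger" (command line to launch) and "Auto" (prompt or not).
AEDEBUG_VALUES = ("Debugger", "Auto")


def read_aedebug():
    """Return {view: {value_name: data or None}}; an empty dict when not on Windows."""
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard library module

    result = {}
    for view, subkey in AEDEBUG_SUBKEYS.items():
        values = dict.fromkeys(AEDEBUG_VALUES)
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
                for name in AEDEBUG_VALUES:
                    try:
                        values[name] = winreg.QueryValueEx(key, name)[0]
                    except FileNotFoundError:
                        pass  # this value is not set
        except FileNotFoundError:
            pass  # key missing, e.g. no Wow6432Node on 32-bit Windows
        result[view] = values
    return result


if __name__ == "__main__":
    for view, values in read_aedebug().items():
        print(view, values)
```

A value of None for Debugger simply means nothing is registered in that view, so 32-bit and 64-bit applications can end up with different postmortem behavior.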
Just-In-Time Debugging (a.k.a. JIT Debugging) is a feature provided by most debuggers (e.g. CDB, NTSD, WinDBG and Visual Studio Debugger), which allows the debugger to be launched and attached to the faulting application.
The JIT debugger shipped with Visual Studio is called vsjitdebugger.exe, which pops up a window and lets you decide the next step. Visual Studio goes a step further by allowing JIT debugging for scripts.
Needless to say, JIT Debugging is normally built on top of AeDebug.
Postmortem Debugging is an overloaded term which could mean either debugging a dump or JIT debugging.
Since I will cover JIT debugging in another article, I will use Postmortem Debugging here to mean dump file debugging.
Okay, now let's go back to the topic: what would you do after receiving a dump file?
Understand the source of the dump file, that is, under what conditions the dump file was generated. Once you've confirmed the dump comes from a trusted source, try to find out when and where the dump file was taken.
0:001> .time
Debug session time: Mon Dec 3 17:36:58.997 2012 (UTC - 8:00)
System Uptime: 2 days 23:31:41.638
Process Uptime: 0 days 0:00:14.156
Kernel time: 0 days 0:00:00.015
User time: 0 days 0:00:00.000
0:001> vertarget
Windows 7 Version 7601 (Service Pack 1) MP (8 procs) Free x64 Product: LanManNt, suite: Enterprise TerminalServer SingleUserTS
kernel32.dll version: 6.1.7601.17514 (win7sp1_rtm.101119-1850)
Machine Name:
Debug session time: Mon Dec 3 18:37:21.103 2012 (UTC - 8:00)
System Uptime: 3 days 0:32:03.743
Process Uptime: 0 days 1:00:36.261
Kernel time: 0 days 0:00:00.015
User time: 0 days 0:00:00.000
0:000> .lastevent
Last event: 14d0.1874: Break instruction exception - code 80000003 (first chance)
Check the dump file type - mini dump or full dump, kernel dump or user mode dump, whether the dump contains an exception record. Normally WinDBG would display the dump type when you open a dump file, here we'll use the command learned in Undocumented WinDBG.
0:001> .dumpdebug
----- User Mini Dump Analysis
MINIDUMP_HEADER:
Version A793 (6804)
NumberOfStreams 14
Flags 9164
0004 MiniDumpWithHandleData
0020 MiniDumpWithUnloadedModules
0040 MiniDumpWithIndirectlyReferencedMemory
0100 MiniDumpWithProcessThreadData
1000 MiniDumpWithThreadInfo
8000 MiniDumpWithFullAuxiliaryState
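The same header can be inspected without a debugger, which is handy when triaging many dumps. Below is a minimal sketch in Python, assuming the MINIDUMP_HEADER layout and MINIDUMP_TYPE bit values from the DbgHelp headers, and assuming that kernel dumps start with the bytes PAGEDUMP or PAGEDU64. The function names are hypothetical, and only a subset of the flag bits is listed.

```python
import struct
from datetime import datetime, timezone

# Subset of MINIDUMP_TYPE bits (values as declared in the DbgHelp headers).
MINIDUMP_TYPE = {
    0x0001: "MiniDumpWithDataSegs",
    0x0002: "MiniDumpWithFullMemory",
    0x0004: "MiniDumpWithHandleData",
    0x0020: "MiniDumpWithUnloadedModules",
    0x0040: "MiniDumpWithIndirectlyReferencedMemory",
    0x0100: "MiniDumpWithProcessThreadData",
    0x0200: "MiniDumpWithPrivateReadWriteMemory",
    0x0800: "MiniDumpWithFullMemoryInfo",
    0x1000: "MiniDumpWithThreadInfo",
    0x8000: "MiniDumpWithFullAuxiliaryState",
}


def classify(data):
    """Guess the dump family from the first bytes of the file."""
    if data[:4] == b"MDMP":
        return "user mini dump"
    if data[:8] in (b"PAGEDUMP", b"PAGEDU64"):
        return "kernel dump"
    return "unknown"


def decode_flags(flags):
    """Return the names of the known MINIDUMP_TYPE bits set in flags, lowest bit first."""
    return [name for bit, name in sorted(MINIDUMP_TYPE.items()) if flags & bit]


def parse_minidump_header(data):
    """Unpack the 32-byte MINIDUMP_HEADER found at offset 0 of a user-mode dump."""
    sig, version, streams, _dir_rva, _checksum, stamp, flags = struct.unpack_from(
        "<4sIIIIIQ", data
    )
    if sig != b"MDMP":
        raise ValueError("not a user-mode minidump")
    return {
        "version": version & 0xFFFF,         # 0xA793 in the output above
        "implementation": version >> 16,     # 0x6804 in the output above
        "streams": streams,
        "taken_at": datetime.fromtimestamp(stamp, tz=timezone.utc),
        "flags": flags,
        "flag_names": decode_flags(flags),
        "full_memory": bool(flags & 0x0002),  # set for "full" user-mode dumps
    }


if __name__ == "__main__":
    for name in decode_flags(0x9164):
        print(name)
```

Running the script decodes the Flags value 9164 shown above into the same six names that .dumpdebug prints. To use it on a real file, read the first 32 bytes in binary mode and pass them to classify and parse_minidump_header.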
If it's a user mode dump, additional information needs to be retrieved from the dump.
- What the command line is, and whether the process is a generic host such as dllhost.exe, svchost.exe, taskhost.exe or w3wp.exe.
- Understand the bitness - whether it is a 64bit process or a 32bit process. It can be tricky to debug a 64bit dump of a WOW64 (32bit) process.
- Whether the CLR is involved, and what the CLR version is (note there could be more than one CLR hosted).
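The three checks above can be sketched as a small triage helper. This is a rough illustration in Python, not a tool I actually ship: it assumes you have already extracted the image path and the loaded module names from the dump (for example by copying them from the debugger's module list), and all function names are hypothetical. It relies on wow64.dll being loaded in 32bit processes on 64bit Windows, and on the runtime DLL names mscorwks (CLR 2.0), clr (CLR 4) and coreclr (CoreCLR).

```python
import ntpath

# Generic host processes named in the checklist above.
GENERIC_HOSTS = {"dllhost.exe", "svchost.exe", "taskhost.exe", "w3wp.exe"}

# Runtime module name (without extension) -> description.
CLR_MODULES = {
    "mscorwks": "CLR 2.0",
    "clr": "CLR 4",
    "coreclr": "CoreCLR",
}


def _normalize(module):
    """Lower-case a module name or path and drop a trailing .dll extension."""
    name = ntpath.basename(module).lower()
    return name[:-4] if name.endswith(".dll") else name


def is_generic_host(image_path):
    """True when the process image is one of the well-known host executables."""
    return ntpath.basename(image_path).lower() in GENERIC_HOSTS


def is_wow64(module_names):
    """True when wow64.dll is loaded, i.e. a 32bit process on 64bit Windows."""
    return any(_normalize(m) == "wow64" for m in module_names)


def loaded_clrs(module_names):
    """Return the sorted runtime module names found; more than one means side-by-side CLRs."""
    return sorted({_normalize(m) for m in module_names} & set(CLR_MODULES))


if __name__ == "__main__":
    modules = ["ntdll", "wow64.dll", "clr", "MSCORWKS.DLL"]  # made-up sample
    print(is_generic_host(r"C:\Windows\System32\svchost.exe"))
    print(is_wow64(modules))
    print([CLR_MODULES[m] for m in loaded_clrs(modules)])
```

When is_generic_host returns True, the command line becomes essential, because the executable name alone does not tell you which service, COM server or application pool was running inside the host.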
(to be continued...)