Finding the Failed Process
Before finding the failed process, make sure that you are in the context of the accepting processor. To determine the accepting processor, use the !pcr extension on each processor and looking for the processor for which an exception handler has been loaded. The exception handler of the accepting processor has an address other than 0xFFFFFFFF.
For example, because the address of NtTib.ExceptionList on this processor, is 0xFFFFFFFF, this is not the processor with the failed process:
0: kd> !pcr
PCR Processor 0 @ffdff000
NtTib.ExceptionList: ffffffff
NtTib.StackBase: 80470650
NtTib.StackLimit: 8046d860
NtTib.SubSystemTib: 00000000
NtTib.Version: 00000000
NtTib.UserPointer: 00000000
NtTib.SelfTib: 00000000
SelfPcr: ffdff000
Prcb: ffdff120
Irql: 00000000
IRR: 00000000
IDR: ffffffff
InterruptMode: 00000000
IDT: 80036400
GDT: 80036000
TSS: 80257000
CurrentThread: 8046c610
NextThread: 00000000
IdleThread: 8046c610
DpcQueue:
However, the results for Processor 1 are quite different. In this case, the value of NtTib.ExceptionList is f0823cc0, not 0xFFFFFFFF, indicating that this is the processor on which the exception occurred.
0: kd> ~1
1: kd> !pcr
PCR Processor 1 @81497000
NtTib.ExceptionList: f0823cc0
NtTib.StackBase: f0823df0
NtTib.StackLimit: f0821000
NtTib.SubSystemTib: 00000000
NtTib.Version: 00000000
NtTib.UserPointer: 00000000
NtTib.SelfTib: 00000000
SelfPcr: 81497000
Prcb: 81497120
Irql: 00000000
IRR: 00000000
IDR: ffffffff
InterruptMode: 00000000
IDT: 8149b0e8
GDT: 8149b908
TSS: 81498000
CurrentThread: 81496d28
NextThread: 00000000
IdleThread: 81496d28
DpcQueue:
When you are in the correct processor context, the !process extension displays the currently running process.
The most interesting parts of the process dump are:
The times (a high value indicates that process might be the culprit).
The handle count (this is the number in parentheses after ObjectTable in the first entry).
The thread status (many processes have multiple threads). If the current process is Idle, it is likely that either the machine is truly idle or it hung due to some unusual problem.
Although using the !process 0 7 extension is the best way to find the problem on a hung system, it is sometimes too much information to filter. Instead, use a !process 0 0 and then a !process on the process handle for CSRSS and any other suspicious processes.
When using a !process 0 7, many of the threads might be marked "kernel stack not resident" because those stacks are paged out. If those pages are still in the cache that is in transition, you can get more information by using a .cache decodeptes before !process 0 7:
kd> .cache decodeptes
kd> !process 0 7
If you can identify the failing process, use !process <process> 7 to show the kernel stacks for each thread in the process. This output can identify the problem in kernel mode and reveal what the suspect process is calling.
In addition to !process, the following extensions can help to determine the cause of an unresponsive computer:
Extension | Effect |
---|---|
Identifies the threads that are ready to run, in order of priority. |
|
Identifies any held resource locks, in case there is a deadlock with retail time outs. |
|
Checks the virtual memory usage. |
|
Determines whether one type of pool allocation is disproportionately large (pool tagging required). |
|
Checks the physical memory status. |
|
Checks the validity of the heap. |
|
Searches nonpaged pool for active IRPs. |
If the information provided does not indicate an unusual condition, try setting a breakpoint at ntoskrnl!KiSwapThread to determine whether the processor is stuck in one process or if it is still scheduling other processes. If it is not stuck, set breakpoints in common functions, such as NtReadFile, to determine whether the computer is stuck in a specific code path.