Another Who Done It
Hi my name is Bob Golding, I am an EE in GES. I want to share an interesting problem I recently worked on. The initial symptom was the system bugchecked with a Stop 0xA which means there was an invalid memory reference. The cause of the crash was a driver making I/O requests while Asynchronous Procedure Calls (APCs) were disabled. The bugcheck caused by an invalid memory reference was the result of the problem and not the cause.
An APC is queued to a thread during I/O completion. This is to guarantee the last phase of the I/O completion occurs in the same context as the process that issued the request.
The stack of the trap is presented below. The call stack shows that APCs are being enabled allowing queued APCs to run.
Child-SP RetAddr Call Site
fffff880`07bf3598 fffff800`030b85a9 nt!KeBugCheckEx
fffff880`07bf35a0 fffff800`030b7220 nt!KiBugCheckDispatch+0x69
fffff880`07bf36e0 fffff800`030d8b56 nt!KiPageFault+0x260
fffff880`07bf3870 fffff800`030959ff nt!IopCompleteRequest+0xc73
fffff880`07bf3940 fffff800`0306c0d9 nt!KiDeliverApc+0x1d7
fffff880`07bf39c0 fffff800`033f8a1a nt!KiCheckForKernelApcDelivery+0x25
fffff880`07bf39f0 fffff800`033cce2f nt!MiMapViewOfSection+0x2bafa
fffff880`07bf3ae0 fffff800`030b8293 nt!NtMapViewOfSection+0x2be
fffff880`07bf3bb0 00000000`772df93a nt!KiSystemServiceCopyEnd+0x13
00000000`0015dea8 00000000`00000000 0x772df93a
The reason the trap occurred is because when issuing requests to lower drivers it is common practice in drivers to implement code similar to:
…
KEVENT event;
status = IoCallDriver( DeviceObject, irp );
//
// Wait for the event to be signaled if STATUS_PENDING is returned.
//
if (status == STATUS_PENDING) {
(VOID)KeWaitForSingleObject( &event, // event is a local which is declared on the stack
Executive,
KernelMode,
FALSE,
NULL );
}
…
As you can see in the above code, if the return from IoCallDriver does not return pending the code continues and exits. Part of the last phase of I/O processing that takes place in the APC is signaling the event. If the call to IoCallDriver returns success, because the event is on the stack it is critical that the APC execute immediately before the stack unwinds. Since APCs where disabled, the execution of the APC was delayed and during this time the event became invalid. The APCs were delayed because the memory manager was in a critical area and APCs could not run.
I needed to determine which driver did this so I enabled IRP logging in Driver Verifier to trace I/O requests. With this enabled the next dump should contain a transaction log that will help identify what driver is performing I/O while APCs are disabled. The command line to enable this is:
verifier /flags 0x410 /all
The new dump with verifier enabled also crashed after delivering an APC to the thread and completing the IRP. From the debug output below I can find the IRPs that were issued and the thread that issued them, this is what I need to look for them in the log.
1: kd> !thread
THREAD fffffa80064c9b50 Cid 0200.0204 Teb: 000007fffffde000 Win32Thread: 0000000000000000 RUNNING on processor 1
IRP List:
fffff9800a33ec60: (0006,03a0) Flags: 40060070 Mdl: 00000000
fffff9800a250c60: (0006,03a0) Flags: 40060070 Mdl: 00000000
fffff9800a3f4ee0: (0006,0118) Flags: 40060070 Mdl: 00000000
Not impersonating
DeviceMap fffff8a000007890
Owning Process fffffa80064bbb30 Image: csrss.exe
Attached Process N/A Image: N/A
Wait Start TickCount 1656 Ticks: 0
Context Switch Count 25 IdealProcessor: 0
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x000000004a061540
Stack Init fffff88003b21c70 Current fffff88003b20890
Base fffff88003b22000 Limit fffff88003b1c000 Call 0
Priority 14 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
fffff880`03b21428 fffff800`0307a54c nt!KeBugCheckEx
fffff880`03b21430 fffff800`030d02ee nt!MmAccessFault+0xffffffff`fff9c15c
fffff880`03b21590 fffff800`030c8db9 nt!KiPageFault+0x16e
fffff880`03b21728 fffff800`030e6ab3 nt!memcpy+0x229
fffff880`03b21730 fffff800`030c4bd7 nt!IopCompleteRequest+0x5a3
fffff880`03b21800 fffff800`0307ba85 nt!KiDeliverApc+0x1c7
fffff880`03b21880 fffff800`0331d96a nt!KiCheckForKernelApcDelivery+0x25
fffff880`03b218b0 fffff800`033e742e nt!MiMapViewOfSection+0xffffffff`fff36baa
fffff880`03b219a0 fffff800`030d1453 nt!NtMapViewOfSection+0x2bd
fffff880`03b21a70 00000000`7761159a nt!KiSystemServiceCopyEnd+0x13
00000000`0025f078 00000000`00000000 0x7761159a
The command “!verifier 100” will dump the transaction log. Below is the relevant portion of the log containing the IRPs for our thread.
IRP fffff9800a3f4ee0, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -4, SpecialApcDisable = -1
fffff80003573a68 nt!IovAllocateIrp+0x28
fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32
fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e
fffff880010f8bcc fltmgr!FltGetVolumeGuidName+0x18c
fffff88004e4fbe1 baddriver+0x12be1
fffff88004e73523 baddriver +0x36523
fffff88004e7300c baddriver +0x3600c
fffff88004e72cce baddriver +0x35cce
fffff88004e5f715 baddriver +0x22715
fffff88004e4c6c7 baddriver +0xf6c7
fffff88004e48342 baddriver +0xb342
fffff88004e5e44e baddriver +0x2144e
fffff88004e5e638 baddriver +0x21638
IRP fffff9800a250c60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1
fffff80003573a68 nt!IovAllocateIrp+0x28
fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32
fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e
fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73
fffff88001010a6b mountmgr!QueryDeviceInformation+0x207
fffff8800101986b mountmgr!QueryPointsFromMemory+0x57
fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a
fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9
fffff80003574c16 nt!IovCallDriver+0x566
fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac
fffff88004e4fbe1 baddriver +0x12be1
fffff88004e73523 baddriver +0x36523
fffff88004e7300c baddriver +0x3600c
IRP fffff9800a33ec60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1
fffff80003573a68 nt!IovAllocateIrp+0x28
fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32
fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e
fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73
fffff88001010afd mountmgr!QueryDeviceInformation+0x299
fffff8800101986b mountmgr!QueryPointsFromMemory+0x57
fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a
fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9
fffff80003574c16 nt!IovCallDriver+0x566
fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac
fffff88004e4fbe1 baddriver +0x12be1
fffff88004e73523 baddriver +0x36523
fffff88004e7300c baddriver +0x3600c
From the IRP log in verifier I can see that baddriver.sys is calling FltGetVolumeGuidName while APCs are disabled. Further investigation found that baddriver.sys had registered a function for image load notification, and the memory manager has APCs disabled when it calls the image notification routine. The image notification routine in baddriver.sys called FltGetVolumeGuidName which issued the I/O. From the log output I see KernelApcDisable and SpecialApcDisable, the issue is SpecialApcDisable being –1. The I/O completion APCs are considered special APCs, so kernel APC disable would not affect them.
The solution was for the driver to check for APCs disabled before issuing the FltGetVolumeGuidName and not make this call if APCs are disabled.