NDIS Case Study 1 - NDIS Packet Double Completion
Hi, this is Anurag again. Here is a case study of an NDIS driver causing a problem due to double completion of a send packet.
A protocol driver allocates a NDIS packet and gives it to the miniport driver to be sent on the wire. A miniport driver is supposed to send or complete the packet, but miniport driver is supposed to complete the packet only once. The moment it tries to complete a packet twice (generally due to a bug in the miniport driver), the machine will bugcheck with a stack like the one shown below. The crash will happen either in a protocol driver or in the NDIS driver. The bugcheck would be the result of a trap on accessing an invalid memory in the packet.
Each time a packet gets completed the miniport stamps it as completed by putting a string “COM” just before the packet (somewhere in the packet stack). If a miniport driver tries to complete this packet again, the system would crash. The miniport driver which tried to complete the completed packet again is at fault and should be fixed.
When reviewing the bugcheck, !analyze –v will give you the stack.
1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000008, memory referenced
Arg2: d0000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: f78521e0, address which referenced memory
Debugging Details:
------------------
READ_ADDRESS: 00000008
CURRENT_IRQL: 2
FAULTING_IP:
NDIS!ndisMSendCompleteX+71
f78521e0 8b7808 mov edi,dword ptr [eax+8]
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0xD1
PROCESS_NAME: System
TRAP_FRAME: f78b2e18 -- (.trap 0xfffffffff78b2e18)
ErrCode = 00000000
eax=00000000 ebx=902bbab0 ecx=00000002 edx=fffffffe esi=8f3aabf8 edi=901bd440
eip=f78521e0 esp=f78b2e8c ebp=f78b2ea0 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
NDIS!ndisMSendCompleteX+0x71:
f78521e0 8b7808 mov edi,dword ptr [eax+8] ds:0023:00000008=????????
Resetting default scope
LAST_CONTROL_TRANSFER: from f78521e0 to 8088c963
STACK_TEXT:
f78b2e18 f78521e0 badb0d00 fffffffe 00000000 nt!KiTrap0E+0x2a7
f78b2ea0 b9f47b5a 902bbab0 8f3aabf8 00000000 NDIS!ndisMSendCompleteX+0x71
f78b2ec0 f78521f9 8f411000 013aabf8 00000000 X_Miniport_Network_Teamming_Driver +0x7b5a
f78b2ee4 ba04a0c8 9029b5e8 8f3aabf8 00000000 NDIS!ndisMSendCompleteX+0x8a
f78b2f00 ba052744 904d8eb8 8f3aabf8 00000000 Y_Miniport_NIC_Driver +0x20c8
f78b2f3c ba050339 8f8c5000 00000000 f78b2f68 Y_Miniport_NIC_Driver +0xa744
f78b2f70 ba05065f 008c5000 8f8c5844 f78b2f8c Y_Miniport_NIC_Driver +0x8339
f78b2f80 ba0493f1 8f8c5000 f78b2f9c f783a0e8 Y_Miniport_NIC_Driver +0x865f
f78b2f8c f783a0e8 904d8eb8 8f8c58bc f78b2ff4 Y_Miniport_NIC_Driver +0x13f1
f78b2f9c 808320f0 8f8c58bc 8f8c5844 00000000 NDIS!ndisMDpcEx+0x1f
f78b2ff4 8088db27 b88fe454 00000000 00000000 nt!KiRetireDpcList+0xca
f78b2ff8 b88fe454 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x37
8088db27 00000000 0000000a 0083850f bb830000 0xb88fe454
1: kd> .trap 0xfffffffff78b2e18
ErrCode = 00000000
eax=00000000 ebx=902bbab0 ecx=00000002 edx=fffffffe esi=8f3aabf8 edi=901bd440
eip=f78521e0 esp=f78b2e8c ebp=f78b2ea0 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
NDIS!ndisMSendCompleteX+0x71:
f78521e0 8b7808 mov edi,dword ptr [eax+8] ds:0023:00000008=????????
Looking at the stack, it appears to be a NULL memory address being accessed that is causing problem. This could lead us to believe that this is a memory corruption problem. This is where we go off-track on the investigation.
The current NDIS packet status should be checked if you bugcheck on the NDIS stack and in NDIS!ndisMSendCompleteX.
Some basic debugging will display the NDIS_PACKET on the stack and in the ESI register (8f3aabf8).
Looking at the memory area before the packet shows the string “COM” which is the indication that it was marked as completed.
Here is how it looks.
For reference this is how a non-completed packet looks like
So for this issue the X_Miniport_Network_Teamming_Driver was at fault which tried to complete a completed packet. Fortunately the vendor for the driver was aware of the problem and was able to give the customer a fix.
For a 64 bit dump we do a “dc <Packet Address>-0x68” to look at the memory area before the packet. Below is an example.
BugCheck D1, {0, 2, 0, fffffade57895375}
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000000, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffffade57895375, address which referenced memory
0: kd> r
Last set context:
rax=0000000000000000 rbx=fffffade6f203000 rcx=0000000000000000
rdx=fffffade6f564740 rsi=fffffade6faa3548 rdi=0000000000000000
rip=fffffade57895375 rsp=fffff8000012b990 rbp=fffffade6f5647d0
r8=0000000000000000 r9=5bcdd13016000000 r10=5bcdd13016110009
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0010 ss=0018 ds=0002 es=0000 fs=0000 gs=0000 efl=00010202
tcpip!TCPSendComplete+0x124:
fffffade`57895375 488b1b mov rbx,qword ptr [rbx] ds:0002:fffffade`6f203000=fffffade6f52e010
0: kd> kvn
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr : Args to Child : Call Site
00 fffff800`0012b990 fffffade`578975a7 : 00000000`00000000 00000000`00000000 fffffade`6e8980b0 fffffade`6f564740 : tcpip!TCPSendComplete+0x124
01 fffff800`0012b9e0 fffffade`5789a89e : fffffade`6f1f1700 fffffade`00000000 00000000`00026200 fffffade`6e8980b0 : tcpip!IPSendComplete+0x477
02 fffff800`0012bab0 fffffade`5ae37237 : fffff800`0083f480 fffffade`6f887000 fffffade`6faab228 fffffade`6e8980b0 : tcpip!ARPSendComplete+0x14a
03 fffff800`0012baf0 fffffade`58443c15 : fffffade`6f887000 fffffade`6e8980b0 fffffade`6f887038 fffffade`6c31e3b0 : NDIS!ndisMSendCompleteX+0xda
04 fffff800`0012bb40 fffffade`5ae37237 : fffffade`6faa3010 00000000`00000002 fffffade`6fc16228 fffffade`6c31e3b0 : NDIS_IM_TEAM_DRIVER!po_SendDone+0x75
05 fffff800`0012bb80 fffffade`596f9a4c : fffffade`6faa3010 fffff800`0012bc50 00000000`00000000 00000000`00000000 : NDIS!ndisMSendCompleteX+0xda
06 fffff800`0012bbd0 fffffade`5af5e9cd : fffff800`0012bcb0 fffffade`6f52e010 fffffade`6f203000 00000000`00000006 : NDIS_MP_DRIVER!l2nd_indicate_tx+0xfc
07 fffff800`0012bc30 fffffade`5af64889 : fffffade`6f52e010 fffffade`6f5315c8 00000000`00000000 00000000`00000000 : NDIS_MP_DRIVER!um_bdrv_indicate_tx_done+0x121
08 fffff800`0012bc90 fffffade`5af64bab : fffffade`6f52e010 00000000`00000001 00000000`00000001 fffff800`011b2080 : NDIS_MP_DRIVER!service_tx_intr+0xa9
09 fffff800`0012bce0 fffffade`5af64d63 : fffffade`6f52e010 00000000`0005ffd4 00000000`00000000 0000023e`f26bf2bf : NDIS_MP_DRIVER!service_intr_mp+0xb3
0a fffff800`0012bd10 fffff800`010285a1 : 00000000`00000000 fffffade`5af64cb4 fffff800`0012bd68 00000000`00000018 : NDIS_MP_DRIVER!um_bdrv_dpc+0xaf
0b fffff800`0012bd40 fffff800`01067c10 : fffff800`011b0180 fffff800`011b0180 00000000`0005ffd4 fffff800`011b4500 : nt!KiRetireDpcList+0x150
0c fffff800`0012bdd0 fffff800`014141d1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x50
0d fffff800`0012be00 00000000`fffff800 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemStartup+0x1bf
Note that in this bugcheck we crash in the transport driver in the above stack. But the problem is that the packet (fffffade`6e8980b0) came from the NDIS driver NDIS_IM_TEAM_DRIVER.
0: kd> dc fffffade`6e8980b0 -0x68
fffffade`6e898048 00000000 00000000 4d4f4336 00000000 ........6COM....
fffffade`6e898058 00000000 00000000 00000000 00000000 ................
fffffade`6e898068 00000000 00000000 00000000 00000000 ................
fffffade`6e898078 00000000 00000000 00000000 00000000 ................
fffffade`6e898088 00000000 00000000 00000000 00000000 ................
fffffade`6e898098 00000000 00000000 00000000 00000000 ................
fffffade`6e8980a8 ffffffff ffffffff 6e898430 fffffade ........0..n....
fffffade`6e8980b8 6f564740 fffffade 6d9ee460 fffffade @GVo....`..m....
After reading this, I suspect you would have at least these two questions for me:
Q1: How did I find out that “COM” is use as an identifier for a completed packet?
A1: Well, mainly code review. This is just intended as a debug trick to verify that the packet is already completed.
Q2: Why did I use 0x38 (for 32 bit) or 0x68 (for 64 bit) as an offset to check the memory contents before the packet address?
A2: Again code review shows the field to be updated with string ‘COM” can seen going 0x38 bytes back from the packet start address.
NDIS double packet completion is a common issue that would crash the system. The above trick can certainly be used to isolate the culprit NDIS driver.
Comments
Anonymous
September 30, 2008
PingBack from http://www.easycoded.com/ndis-case-study-1-ndis-packet-double-completion/Anonymous
November 10, 2009
Awesome explanation. Well Done. I really appreciate the straight forward and clear (un-flowered analysis as well as the two Q&As (same questions I had). Please keep providing this kind of insight and feedback!Anonymous
March 02, 2011
Good explanation...Anonymous
March 02, 2011
Good article...