Win2012R2: STOP 0x133 and STOP 0x9E caused by too many NBLs in a DPC
Status: in progress...
Hi All, recently we have been seeing an increase in STOP 0x133 (on non-clustered systems) and STOP 0x9E (on clustered systems) cases where the root cause is that too many NBLs (NET_BUFFER_LISTs) are indicated within a single DPC. This causes the DPC watchdog timer to expire, resulting in the STOP errors mentioned above. Part of an example stack:
1f ffffd001`1d4f2800 fffff800`d9cfa347 NDIS!NdisSendNetBufferLists+0x551
20 ffffd001`1d4f29f0 fffff800`d9cf9e14 vmswitch!VmsExtPtRouteNetBufferLists+0x377
21 ffffd001`1d4f2ac0 fffff800`d845b903 vmswitch!VmsPtNicReceiveNetBufferLists+0x3c4
22 ffffd001`1d4f2c20 fffff800`d846b29b NDIS!ndisMIndicateNetBufferListsToOpen+0x123
23 ffffd001`1d4f2ce0 fffff800`d845c8f6 NDIS!ndisMTopReceiveNetBufferLists+0x2db
24 ffffd001`1d4f2d70 fffff800`d81df7f0 NDIS!NdisMIndicateReceiveNetBufferLists+0xb96
25 ffffd001`1d4f2f50 fffff800`d81df21a NdisImPlatform!implatTryToIndicateReceiveNBLs+0x1e8
26 ffffd001`1d4f2fc0 fffff800`d845b903 NdisImPlatform!implatReceiveNetBufferLists+0x1a2
27 ffffd001`1d4f3040 fffff800`d845c6dd NDIS!ndisMIndicateNetBufferListsToOpen+0x123
28 ffffd001`1d4f3100 fffff800`d89e9eb7 NDIS!NdisMIndicateReceiveNetBufferLists+0x97d
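To make the cost visible: every NBL linked into the chain that a miniport hands up is processed within the same DPC, so a very long chain keeps that one DPC running until the watchdog fires. The following is an illustrative sketch only, not production code; it assumes a hypothetical NDIS 6.x lightweight filter whose receive handler counts the chain before passing it up unchanged, and the FILTER_CONTEXT type and the DbgPrint threshold are placeholders.

#include <ndis.h>

/* Hypothetical per-filter-module state; a real filter keeps this in its attach context. */
typedef struct _FILTER_CONTEXT {
    NDIS_HANDLE FilterHandle;   /* handle received in FilterAttach */
} FILTER_CONTEXT, *PFILTER_CONTEXT;

VOID
FilterReceiveNetBufferLists(
    _In_ NDIS_HANDLE      FilterModuleContext,
    _In_ PNET_BUFFER_LIST NetBufferLists,
    _In_ NDIS_PORT_NUMBER PortNumber,
    _In_ ULONG            NumberOfNetBufferLists,
    _In_ ULONG            ReceiveFlags
    )
{
    PFILTER_CONTEXT  context = (PFILTER_CONTEXT)FilterModuleContext;
    PNET_BUFFER_LIST nbl;
    ULONG            count = 0;

    /* Every NBL linked into this chain is handled in the current DPC.
       (NDIS already supplies NumberOfNetBufferLists; the walk just makes the
       per-NBL work inside one DPC explicit.) */
    for (nbl = NetBufferLists; nbl != NULL; nbl = NET_BUFFER_LIST_NEXT_NBL(nbl)) {
        count += 1;
    }

    /* Chains far beyond ~256 NBLs per CPU are the pattern described in
       mitigation 1 below (RSS/VMQ not spreading the load). */
    if (count > 256) {
        DbgPrint("Large receive indication: %u NBLs in a single DPC\n", count);
    }

    /* Pass everything up unchanged. */
    NdisFIndicateReceiveNetBufferLists(context->FilterHandle,
                                       NetBufferLists,
                                       PortNumber,
                                       NumberOfNetBufferLists,
                                       ReceiveFlags);
}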
At present there is no definitive solution, but several mitigating actions are available:
1. Optimize the RSS/VMQ configuration. A NIC miniport that indicates chains of more than 256 NBLs per CPU is a sign that RSS/VMQ is not being used correctly. See https://blogs.technet.com/b/networking/archive/2013/09/10/vmq-deep-dive-1-of-3.aspx.
2. Use a Receive Side Throttling (RST) capable NIC. A driver-side sketch of how RST limits the number of NBLs indicated per DPC follows this list.
3. Reduce the number of Receive Buffers for the miniport that indicates the large number of NBLs. Note that this setting is not available for Hyper-V vNICs. Also, lowering this value might limit performance, so monitor your network performance after changing it.
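For driver developers, this is roughly what Receive Side Throttling looks like in a miniport's interrupt DPC: NDIS supplies a per-DPC NBL budget, the driver indicates at most that many NBLs, and it sets MoreNblsPending so that NDIS schedules another DPC for the remainder instead of letting one DPC run until the watchdog expires. The sketch below is an assumption-laden outline, not a complete driver: ReadReceivedNbls and ADAPTER_CONTEXT are hypothetical, and it assumes the adapter context was registered as the interrupt context.

#include <ndis.h>

/* Hypothetical adapter state; assumes this same context was registered as the
   interrupt context with NdisMRegisterInterruptEx. */
typedef struct _ADAPTER_CONTEXT {
    NDIS_HANDLE MiniportAdapterHandle;   /* from MiniportInitializeEx */
} ADAPTER_CONTEXT, *PADAPTER_CONTEXT;

/* Hypothetical hardware-specific routine: drains up to MaxNbls completed
   receives from the RX ring into an NBL chain and reports whether any
   descriptors are still outstanding. */
PNET_BUFFER_LIST
ReadReceivedNbls(
    _In_  PADAPTER_CONTEXT Adapter,
    _In_  ULONG            MaxNbls,
    _Out_ PULONG           NblCount,
    _Out_ PBOOLEAN         MorePending
    );

VOID
MiniportInterruptDpc(
    _In_ NDIS_HANDLE MiniportInterruptContext,
    _In_ PVOID       MiniportDpcContext,
    _In_ PVOID       ReceiveThrottleParameters,
    _In_ PVOID       NdisReserved2
    )
{
    PADAPTER_CONTEXT adapter = (PADAPTER_CONTEXT)MiniportInterruptContext;
    PNDIS_RECEIVE_THROTTLE_PARAMETERS throttle =
        (PNDIS_RECEIVE_THROTTLE_PARAMETERS)ReceiveThrottleParameters;
    PNET_BUFFER_LIST nblChain;
    ULONG            nblCount = 0;
    BOOLEAN          morePending = FALSE;

    UNREFERENCED_PARAMETER(MiniportDpcContext);
    UNREFERENCED_PARAMETER(NdisReserved2);

    /* NDIS hands the miniport a per-DPC budget in MaxNblsToIndicate;
       NDIS_INDICATE_ALL_NBLS means there is no limit. */
    nblChain = ReadReceivedNbls(adapter,
                                throttle->MaxNblsToIndicate,
                                &nblCount,
                                &morePending);

    if (nblChain != NULL) {
        NdisMIndicateReceiveNetBufferLists(adapter->MiniportAdapterHandle,
                                           nblChain,
                                           NDIS_DEFAULT_PORT_NUMBER,
                                           nblCount,
                                           0 /* ReceiveFlags */);
    }

    /* Setting MoreNblsPending makes NDIS schedule another DPC for the rest of
       the work, so no single DPC runs long enough to trip the watchdog. */
    throttle->MoreNblsPending = morePending;
}

A driver that ignores this budget and drains its entire receive ring in one pass is exactly the pattern that produces stacks like the one shown above.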
Also, ensure that the NIC driver and firmware are up to date. Additional information: https://msdn.microsoft.com/en-us/library/windows/hardware/ff567241(v=vs.85).aspx.
If you think you are hitting this scenario, please reach out to us.