Quick tips: Kernel Dumps, Blue Screens and !Analyze -v
Hello,
This time I'm going to address something that is usually fairly straightforward to analyze, yet many people I deal with don't know how to proceed when a blue screen appears. In this blog post I assume the server is already configured to generate kernel dumps or minidumps. (This is something I always advise: configure your servers to generate memory dumps if something goes wrong.)
Usually I get emails like “my server just got a BSOD (blue screen of death). I've got a minidump and I need to understand what happened”. My approach is always the same (and most of the time it is enough to find the root cause).
First Step
Open WinDbg and make sure the symbol server is properly configured. More info at https://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx
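For reference, a symbol path that points at a symbol server normally has the form srv*<local cache>*<server URL>. The small Python sketch below only assembles that string. The cache folder C:\Symbols is a hypothetical choice, and the default URL is the public Microsoft symbol server. The result can be pasted into the WinDbg symbol path dialog or stored in the _NT_SYMBOL_PATH environment variable.

```python
def symbol_path(cache_dir, server="https://msdl.microsoft.com/download/symbols"):
    """Build a 'srv*<cache>*<server>' symbol path string.

    cache_dir is a local folder where downloaded symbols are cached
    (any writable folder works; the one used below is just an example).
    """
    return f"srv*{cache_dir}*{server}"


if __name__ == "__main__":
    # Hypothetical cache folder; adjust to your machine.
    print(symbol_path(r"C:\Symbols"))
```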
Second Step
Open the memory dump in WinDbg. Below is the output when opening a kernel memory dump:
(…)
Loading Kernel Symbols
...............................................................
................................................................
......................
Loading User Symbols
Loading unloaded module list
....................................
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck 50, {d44cdde4, 0, 818ed8b3, 0}
(…)
As you can see above, the debugger states “Use !analyze -v to get detailed debugging information”. Let's follow the expert :) (the debugger) and issue !analyze -v
1: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: d44cdde4, memory referenced.
Arg2: 00000000, value 0 = read operation, 1 = write operation.
Arg3: 818ed8b3, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 00000000, (reserved)
Debugging Details:
------------------
(…)
TRAP_FRAME: d28db9e0 -- (.trap 0xffffffffd28db9e0)
ErrCode = 00000000
eax=d44cdde0 ebx=d28dbad0 ecx=d28dba94 edx=00000000 esi=b2a6c298 edi=09f56e20
eip=818ed8b3 esp=d28dba54 ebp=d28dba60 iopl=0 nv up ei ng nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
nt!RtlTimeToSecondsSince1980+0x16:
818ed8b3 ff7004 push dword ptr [eax+4] ds:0023:d44cdde4=????????
Resetting default scope
LAST_CONTROL_TRANSFER: from 8185edb4 to 818a936d
STACK_TEXT:
d28db9c8 8185edb4 00000000 d44cdde4 00000000 nt!MmAccessFault+0x10a
d28db9c8 818ed8b3 00000000 d44cdde4 00000000 nt!KiTrap0E+0xdc
d28dba60 a658794a d44cdde0 d28dba94 b2a6c298 nt!RtlTimeToSecondsSince1980+0x16
d28dba9c a6586d85 bfca9448 b2a6c298 00000020 srvnet!FillSessionInfoBuffer+0x8c
d28dbae4 a6587be7 bfca9448 00000007 00000006 srvnet!SvcEnumApiHandler64+0x7f
d28dbb10 a657bb5a bfca9448 09f56d60 00002000 srvnet!SvcSessionEnum+0x2f
d28dbb6c a658c102 87629550 00000001 10b017e8 srvnet!SrvAdminProcessFsctl+0x2de
d28dbbd0 a657c3aa 87629550 00000001 10b017e8 srvnet!SrvNetProcessFsctl+0x54
d28dbc18 a658c043 872aaf08 00146027 87629550 srvnet!SrvNetDeviceControl+0xc6
d28dbc2c 81855976 872aaf08 c5e69880 c5e69880 srvnet!SrvNetDefaultDispatch+0x3e
d28dbc44 81a576a1 87629550 c5e69880 c5e698f0 nt!IofCallDriver+0x63
d28dbc64 81a57e46 872aaf08 87629550 10b01701 nt!IopSynchronousServiceTail+0x1d9
d28dbd00 81a56b2c 872aaf08 c5e69880 00000000 nt!IopXxxControlFile+0x6b7
d28dbd34 8185bc7a 000007a8 000033a4 00000000 nt!NtFsControlFile+0x2a
d28dbd34 77275e74 000007a8 000033a4 00000000 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
043bf878 00000000 00000000 00000000 00000000 0x77275e74
(…)
MODULE_NAME: srvnet
IMAGE_NAME: srvnet.sys
(…)
I usually look at three things in the output above:
· Error – PAGE_FAULT_IN_NONPAGED_AREA
· Stack – STACK_TEXT
· IMAGE_NAME
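If you save the !analyze -v output to a text file, the three items above can also be pulled out with a few lines of script. What follows is only a rough Python sketch, not part of any debugger tooling: the function name summarize_analysis is made up, and it assumes the output looks like the sample shown in this post (a "NAME (code)" line, a STACK_TEXT: block, then MODULE_NAME: and IMAGE_NAME: lines).

```python
import re


def summarize_analysis(text):
    """Extract bugcheck name/code, module, image and stack symbols
    from the text of an '!analyze -v' run (format assumed from the sample)."""
    summary = {"bugcheck": None, "code": None,
               "module": None, "image": None, "stack": []}
    in_stack = False
    for raw in text.splitlines():
        line = raw.strip()
        # Bugcheck header, e.g. "PAGE_FAULT_IN_NONPAGED_AREA (50)"
        m = re.fullmatch(r"([A-Z][A-Z0-9_]+) \(([0-9a-fA-F]+)\)", line)
        if m and summary["bugcheck"] is None:
            summary["bugcheck"] = m.group(1)
            summary["code"] = int(m.group(2), 16)  # the code is printed in hex
            continue
        if line.startswith("STACK_TEXT:"):
            in_stack = True
        elif line.startswith("MODULE_NAME:"):
            summary["module"] = line.split(":", 1)[1].strip()
            in_stack = False
        elif line.startswith("IMAGE_NAME:"):
            summary["image"] = line.split(":", 1)[1].strip()
            in_stack = False
        elif in_stack:
            # Keep only frames that resolved to module!function+0xoffset
            m = re.search(r"(\w+!\w+)\+0x[0-9a-fA-F]+$", line)
            if m:
                summary["stack"].append(m.group(1))
    return summary


# A shortened copy of the output shown in this post.
SAMPLE = """PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.
STACK_TEXT:
d28db9c8 8185edb4 00000000 d44cdde4 00000000 nt!MmAccessFault+0x10a
d28dba60 a658794a d44cdde0 d28dba94 b2a6c298 nt!RtlTimeToSecondsSince1980+0x16
d28dba9c a6586d85 bfca9448 b2a6c298 00000020 srvnet!FillSessionInfoBuffer+0x8c
WARNING: Frame IP not in any known module. Following frames may be wrong.
043bf878 00000000 00000000 00000000 00000000 0x77275e74
MODULE_NAME: srvnet
IMAGE_NAME: srvnet.sys
"""

if __name__ == "__main__":
    print(summarize_analysis(SAMPLE))
```

The first stack frame that does not belong to nt is usually the interesting one; in the sample that is srvnet!FillSessionInfoBuffer.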
The next step is to check the version of the module (srvnet in my sample):
1: kd> lmvm srvnet
(…)
FileVersion: 6.0.6002.18005
(…)
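The version matters because a hotfix article normally lists the file version it ships, so you can tell whether the server is already past it. A small Python sketch of that comparison is below; parse_file_version is a made-up helper, it assumes a "FileVersion: a.b.c.d" line as in the output above, and the "fixed" version in the usage example is a placeholder, not the real value from any KB article.

```python
import re


def parse_file_version(lmvm_text):
    """Return the FileVersion from 'lmvm' output as a tuple of four ints,
    or None if no such line is found."""
    m = re.search(r"^\s*FileVersion:\s*(\d+(?:\.\d+){3})", lmvm_text, re.MULTILINE)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


if __name__ == "__main__":
    installed = parse_file_version("FileVersion: 6.0.6002.18005")
    # Placeholder value for illustration only; take the real one from the KB article.
    fixed_in = (6, 0, 6002, 99999)
    # Tuples compare element by element, which is what we want for versions.
    print("hotfix needed" if installed < fixed_in else "already up to date")
```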
And finally, Bing it (https://www.bing.com/search?q=%22PAGE_FAULT_IN_NONPAGED_AREA%22+srvnet+msdn&go=&form=QBRE&filt=all) with some keywords. Depending on the results I try one or more combinations, such as including the function name from the stack, …
"PAGE_FAULT_IN_NONPAGED_AREA" srvnet msdn
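If you do this often, the query can be assembled from the bugcheck name and the module name. Here is a minimal Python sketch using only the standard library; build_search_url is a hypothetical helper name, and only the q parameter of the URL above is reproduced (the other parameters are left out).

```python
from urllib.parse import urlencode


def build_search_url(bugcheck_name, module, extra_terms=("msdn",)):
    """Build a Bing search URL with the bugcheck name quoted,
    followed by the module name and any extra keywords."""
    terms = [f'"{bugcheck_name}"', module, *extra_terms]
    return "https://www.bing.com/search?" + urlencode({"q": " ".join(terms)})


if __name__ == "__main__":
    print(build_search_url("PAGE_FAULT_IN_NONPAGED_AREA", "srvnet"))
    # Another combination: add the function name taken from the stack.
    print(build_search_url("PAGE_FAULT_IN_NONPAGED_AREA", "srvnet",
                           extra_terms=("FillSessionInfoBuffer",)))
```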
In my case the first link is to https://support.microsoft.com/kb/951418. After installing the hotfix the issue no longer occurs.
Of course it's not always as simple as this, but most of the time it's enough.
See you next time
Bruno