Part 2: Got Stack? No. We ran out and kv won’t tell me why!
Hello. It’s Ryan again with the second installment of my stack depletion walkthrough. Part 1 of this blog covered the initial analysis of a kernel memory dump captured due to a Stop 0x7f EXCEPTION_DOUBLE_FAULT. Our initial analysis revealed that kv was not able to provide us with a useful stack backtrace. Background information relating to Task States and Double Faults were also covered. If you haven’t yet reviewed this blog, you can find it at Part 1.
Previously, in part one of this blog, we reviewed the memory dump of a stop 0x7f EXCEPTION_DOUBLE_FAULT and found that we were not provided with a valid stack backtrace. Without a valid stack backtrace, we were unable to identify what depleted the stack. Whenever I am in the situation where kv is not able to properly walk the stack, my next course of action is to manually dump out the memory within the stack range using the dps command.
In the previous blog, we ran the !thread command to obtain the stack base and limit to view the thread in our fictiously named process, StackHog.exe.
!thread
Owning Process 874c6800 Image: StackHog.exe
...
Base b8ae9000 Limit b8ae6000
Let’s pass this address range to the dps command. The dps command will display the contents of memory and treat the data dumped out as pointers. It will then try to match up these pointer addresses with symbol information in the same manner that kv does.
dps b8ae6000 b8ae9000
.
.
.
omitting extremely long output
I'll spare the lengthy dps output here and instead describe what was observed. When dumping the stack in this manner, I noticed that one product’s drivers were appearing on the stack over and over again. This product also provided the executable that was running as the current process (I obtained this information from the !thread command that I ran to obtain the limit and base values). Running lm kv m and !lmi against these drivers verified that they were in fact all from the same product (StackHog.exe).
Using dps in this manner will often provide a good idea of what may have been using the stack. However, there are a few problems that may affect the reliability of this method. Dps simply dumps out whatever data is present anywhere on the stack. One problem is there can often be trash left over on the stack from previous activity that is unrelated to our present stack. Also, there may be trap handlers and other data that shows up to further complicate the call flow picture. In addition, the information will not clearly show the order of calls like a kv stack backtrace output will. The output can also be very lengthy since the entire range is displayed. On a stack where each call frame averages 8 frames, you may be only interested in the symbols from the return address in each frame. I have observed some stack frames where dps may resolve two or three symbols for that frame. All of this means that while dps is a good tool and often useful, on occasion some of the output from dps may be confusing instead of revealing. Sometimes when multiple unrelated code from various vendors (including Microsoft) are on the stack, you may need a more complete understanding of the true call flow. For these reasons, whenever possible, I want to be able to see the kv stack backtrace to display the call flow that led up to the crash.
Let's see if we can help kv to reconstruct the stack. Since the stack backtrace did not display properly on its own, we will need to help out a little to get the backtrace started. If we can provide good starting point values, the debugger can often dump out the rest of the stack. Also, when dealing with an overflowed stack, we will want to use the kf option instead of kv. The kf command will give us a better idea of how much space a driver and the calls it made are occupying on the stack. It does this by listing the amount of stack space between the stack frames. If you take these stack usage values as fact, you are assuming that the stack was backtraced correctly. Sometimes this isn’t the case when you don’t have symbols for all the code on the stack. So to investigate the stack usage, we need to get a good backtrace. The k command accepts parameters that will help it display the stack using address values that you provide. We need to provide a few addresses to any of the various versions of the k command by using the equal sign as I will demonstrate shortly. We need to provide the BasePtr, the StackPtr, and the InstructionPtr. For more information on the k command, refer to the msdn documentation .
Since the real problem here is that various modules have used up all the stack, we simply need to dump out most of that stack to see where most of the usage is. It isn't important that we identify what was going on when we died or what was at the very top of the stack. We just want to see as much of the stack as we can dumped out in an easy to understand format by using kf.
So where can we find these values? I'll start by dumping out the stack using the address of the stack limit. Let's dump out this area of memory. I'll start by clearing the screen so that I can examine the output
3: kd>.cls
3: kd> dps b8ae6000 b8ae9000
<omitting output>
I wasn’t able to locate any valid patterns in the very top of the stack. This was probably due to lack of symbols and valid stack related register addresses combined with FPO or other optimizations. Things started making sense further down the stack in the stack range that I have listed below. Observe the patterns I have highlighted. Note that the symbol output is listed next to the return addresses. I will try to display my stack starting at this point. B8ae6100 is very close to the stack limit so we won’t be missing much of the output. We will get most of the stack output that we need to see if we start here.
b8ae60ec b8ae6100 <----------------------------------points to the next stack frame pointer (saved ebp) below
b8ae60f0 8081df65 nt!IofCallDriver+0x45 <---possible Return address. Start here.
b8ae60f4 8763f718
b8ae60f8 87758bd8
b8ae60fc 8b4abb00
b8ae6100 b8ae6128 This points to the next frame
b8ae6104 f7a2ec45 fltmgr!FltpDispatch+0x6f <----possible return address
b8ae6108 8b4abb00
b8ae610c 87758bd8
b8ae6110 00000000
b8ae6114 89751350
b8ae6118 00000080
b8ae611c 00000000
b8ae6120 b8ae6130
b8ae6124 8084cff9 nt!MmIsAddressValid+0xf
b8ae6128 b8ae613c
b8ae612c 8081df65 nt!IofCallDriver+0x45
b8ae6130 8765b6a8
b8ae6134 87758bd8
b8ae6138 89751350
b8ae613c b8ae6144
b8ae6140 b958e196 BossHog+0x1196
b8ae6144 b8ae61c4
b8ae6148 b958f4bc BossHog!StackEater+0x80c
I'll start by identifying all of the stack address in the hopes of finding any candidates for saved stack frame pointers (saved ebp values). Based on the limit and base, these addresses will all start with b8ae. The next digit will be 6, 7, 8, or 9. I have identified all of the possible values. Next, I looked for patterns of addresses that are pointing to other pointers below them to create a chain. Next, I will look for possible return addresses. They should appear on the line right under the saved EBP value.
I'll try dumping the stack using the values from the first possible frame. The format is
kf=BasePtr, StackPtr, InstructionPtr
So now, it appears that we have a pattern, let me pass in these numbers to the kf command. If this works, then the stack walker will dump the stack out from this point down up to the maximum that you have set by using the .kframes command. I’ll start by raising the number of stack frames displayed using .kframes.
3: kd> .kframes 200
Default stack trace depth is 0n512 frames
Now, lets dump the stack out using kf. Kf will display the number of bytes of stack space used in hex on each line before it displays the frame.
3: kd> kf=b8ae6100 b8ae60ec 8081df65
Memory ChildEBP RetAddr
b8ae6100 f7a2ec45 nt!IofCallDriver+0x45
28 b8ae6128 8081df65 fltmgr!FltpDispatch+0x6f
14 b8ae613c b958e196 nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
8 b8ae6144 b958f4bc BabyHog+0x1196
80 b8ae61c4 8081df65 BabyHog!HogFarm+0x88c
14 b8ae61d8 b80c18a6 nt!IofCallDriver+0x45
84 b8ae625c b80cf367 BossHog+0x78a6
10 b8ae626c b80cf3b7 BossHog+0x15367
28 b8ae6294 8081df65 BossHog!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae62a8 b76fbcf7 nt!IofCallDriver+0x45
84 b8ae632c b7709ae6 BossHog01+0x7cf7
10 b8ae633c b7709b36 BossHog01+0x15ae6
28 b8ae6364 8081df65 BossHog01!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae6378 8081e4ed nt!IofCallDriver+0x45
18 b8ae6390 8085114a nt!IoPageRead+0x109
9c b8ae642c 8085ea66 nt!MiDispatchFault+0xece
84 b8ae64b0 8088c798 nt!MmAccessFault+0x89e
0 b8ae64b0 808b64a6 nt!_KiTrap0E+0xdc
c8 b8ae6578 bae5af2d nt!CcMapData+0x8c
20 b8ae6598 bae5849b Ntfs!NtfsMapStream+0x4b
74 b8ae660c bae5adf0 Ntfs!NtfsReadMftRecord+0x86
38 b8ae6644 bae5afac Ntfs!NtfsReadFileRecord+0x7a
38 b8ae667c bae19903 Ntfs!NtfsLookupInFileRecord+0x37
110 b8ae678c bae1a6c4 Ntfs!NtfsLookupAllocation+0xdd
1d0 b8ae695c bae1a87c Ntfs!NtfsPrepareBuffers+0x25d
1dc b8ae6b38 bae1b1a6 Ntfs!NtfsNonCachedIo+0x1ee
ec b8ae6c24 bae1b0c9 Ntfs!NtfsCommonRead+0xaf5
1ac b8ae6dd0 8081df65 Ntfs!NtfsFsdRead+0x113
14 b8ae6de4 f7a2ec45 nt!IofCallDriver+0x45
28 b8ae6e0c 8081df65 fltmgr!FltpDispatch+0x6f
14 b8ae6e20 b958e196 nt!IofCallDriver+0x45
8 b8ae6e28 b958f4bc BabyHog+0x1196
80 b8ae6ea8 8081df65 BabyHog!HogFarm+0x88c
14 b8ae6ebc b80c18a6 nt!IofCallDriver+0x45
84 b8ae6f40 b80cf367 BossHog+0x78a6
10 b8ae6f50 b80cf3b7 BossHog+0x15367
28 b8ae6f78 8081df65 BossHog!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae6f8c b76fbcf7 nt!IofCallDriver+0x45
84 b8ae7010 b7709ae6 BossHog01+0x7cf7
10 b8ae7020 b7709b36 BossHog01+0x15ae6
28 b8ae7048 8081df65 BossHog01!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae705c 8081e4ed nt!IofCallDriver+0x45
18 b8ae7074 8085114a nt!IoPageRead+0x109
9c b8ae7110 8085ea66 nt!MiDispatchFault+0xece
84 b8ae7194 8088c798 nt!MmAccessFault+0x89e
0 b8ae7194 808b64a6 nt!_KiTrap0E+0xdc
c8 b8ae725c bae5af2d nt!CcMapData+0x8c
20 b8ae727c bae5d9d5 Ntfs!NtfsMapStream+0x4b
30 b8ae72ac bae5f5e4 Ntfs!ReadIndexBuffer+0x8f
174 b8ae7420 bae5f786 Ntfs!NtfsUpdateFileNameInIndex+0x62
fc b8ae751c bae5f8c6 Ntfs!NtfsUpdateDuplicateInfo+0x2b0
208 b8ae7724 bae5c8d9 Ntfs!NtfsCommonCleanup+0x1e82
170 b8ae7894 8081df65 Ntfs!NtfsFsdCleanup+0xcf
14 b8ae78a8 f7a2ec45 nt!IofCallDriver+0x45
28 b8ae78d0 8081df65 fltmgr!FltpDispatch+0x6f
14 b8ae78e4 b958e196 nt!IofCallDriver+0x45
8 b8ae78ec b958f472 BabyHog+0x1196
80 b8ae796c 8081df65 BabyHog!HogFarm+0x842
14 b8ae7980 b80c18a6 nt!IofCallDriver+0x45
84 b8ae7a04 b80cf367 BossHog+0x78a6
10 b8ae7a14 b80cf3b7 BossHog+0x15367
28 b8ae7a3c 8081df65 BossHog!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae7a50 b770a8ac nt!IofCallDriver+0x45
28 b8ae7a78 b76fb994 BossHog01!DEVICEDISPATCH::LowerDevicePassThrough+0x48
7c b8ae7af4 b76fbbaf BossHog01+0x7994
94 b8ae7b88 b7709ae6 BossHog01+0x7baf
10 b8ae7b98 b7709b36 BossHog01+0x15ae6
28 b8ae7bc0 8081df65 BossHog01!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae7bd4 808f9732 nt!IofCallDriver+0x45
30 b8ae7c04 80934bac nt!IopCloseFile+0x2ae
30 b8ae7c34 809344ad nt!ObpDecrementHandleCount+0xcc
28 b8ae7c5c 80934546 nt!ObpCloseHandleTableEntry+0x131
44 b8ae7ca0 80934663 nt!ObpCloseHandle+0x82
10 b8ae7cb0 8088978c nt!NtClose+0x1b
0 b8ae7cb0 8082e811 nt!KiFastCallEntry+0xfc
7c b8ae7d2c b8d8ec2d nt!ZwClose+0x11
50 b8ae7d7c b8d8ede5 MamaHog+0x5c2d
54 b8ae7dd0 b8d8fa85 MamaHog+0x5de5
164 b8ae7f34 b8d917fe MamaHog+0x6a85
40 b8ae7f74 b8d8d22a MamaHog+0x87fe
2c4 b8ae8238 b958ecdf MamaHog+0x422a
24 b8ae825c b958eee0 BabyHog!HogFarm+0xaf
34 b8ae8290 8081e103 BabyHog!HogFarm+0x2b0
30 b8ae82c0 bae1a22c nt!IopfCompleteRequest+0xcd
10 b8ae82d0 bae5c00a Ntfs!NtfsCompleteRequest+0xc8
104 b8ae83d4 8081df65 Ntfs!NtfsFsdCreate+0x48c
14 b8ae83e8 f7a3c458 nt!IofCallDriver+0x45
2c b8ae8414 8081df65 fltmgr!FltpCreate+0xe4
14 b8ae8428 b958e196 nt!IofCallDriver+0x45
8 b8ae8430 b958f71c BabyHog+0x1196
60 b8ae8490 8081df65 BabyHog!HogFarm+0xaec
14 b8ae84a4 b80d012b nt!IofCallDriver+0x45
28 b8ae84cc b80c1862 BossHog!DEVICEDISPATCH::LowerDevicePassThrough+0x48
8c b8ae8558 b80cf367 BossHog+0x7862
10 b8ae8568 b80cf3b7 BossHog+0x15367
28 b8ae8590 8081df65 BossHog!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae85a4 b76f9639 nt!IofCallDriver+0x45
4c b8ae85f0 b76fbb42 BossHog01+0x5639
94 b8ae8684 b7709ae6 BossHog01+0x7b42
10 b8ae8694 b7709b36 BossHog01+0x15ae6
28 b8ae86bc 8081df65 BossHog01!DEVICEDISPATCH::DispatchPassThrough+0x48
14 b8ae86d0 808f8f71 nt!IofCallDriver+0x45
e8 b8ae87b8 80937942 nt!IopParseDevice+0xa35
80 b8ae8838 80933a76 nt!ObpLookupObjectName+0x5b0
54 b8ae888c 808eae25 nt!ObOpenObjectByName+0xea
7c b8ae8908 808ec136 nt!IopCreateFile+0x447
48 b8ae8950 b76ff4ae nt!IoCreateFileSpecifyDeviceObjectHint+0x52
9c b8ae89ec b76ff0e6 BossHog01+0xb4ae
58 b8ae8a44 b7685f8b BossHog01+0xb0e6
50 b8ae8a94 b76867a3 daddyHog+0x3f8b
28 b8ae8abc b76f66e7 daddyHog+0x47a3
38 b8ae8af4 b76f6e76 BossHog01+0x26e7
124 b8ae8c18 b770a0d5 BossHog01+0x2e76
44 b8ae8c5c 808f5e2f BossHog01!DEVICEDISPATCH::DispatchPassThrough+0x5e7
a4 b8ae8d00 808eed08 nt!IopXxxControlFile+0x255
34 b8ae8d34 8088978c nt!NtDeviceIoControlFile+0x2a
0 b8ae8d34 7c8285ec nt!KiFastCallEntry+0xfc
0335e534 00000000 0x7c8285ec
Problem solved, we have our stack. Please note that the technique demonstrated above is only applicable to an x86 platform. The x64 architecture is completely different. For more information, please refer to Trey Nash’s previous blog
The same drivers I observed previously using the dps output are listed; however, it’s much easier to follow the call flow this time. Let’s see how much stack these hogs are using:
3: kd> ? 0x8+0x80+0x84+0x10+0x28+0x84+0x10+0x28+0x8+0x80+0x84+0x10+0x28+0x84+0x10+0x8+0x80+0x84+0x10+0x28+0x28+0x7c+0x94+0x10+0x28+0x50+0x54+0x164+0x40+0x2c4+0x24+0x34+0x8+0x60+0x28+0x8c+0x10+0x28+0x9c+0x58+0x50+0x28+0x38+0x124+0x44
Evaluate expression: 4176 = 00001050
How many kb is this?
3: kd> ? 4176/1024
Evaluate expression: 4 = 00000004
I would also like to dump out one of the functions that is using a large amount of stack space to show you how you can dig deeper into problems like this. Sometimes this level of granularity is needed when a vendor is attempting to optimize their code by showing them what caused their large allocations to occur.
Let’s review the frame listed below that used 0x2c4 bytes of stack space
40 b8ae7f74 b8d8d22a MamaHog+0x87fe
2c4 b8ae8238 b958ecdf MamaHog+0x422a
24 b8ae825c b958eee0 BabyHog!HogFarm+0xaf
I have underlined the return address of this stack frame. This is the address right after the instruction where the call to MamaHog took place. After MamaHog would have completed, execution would of course have continued with the line of code right after the call. To view the call to MamaHog, let’s unassemble backwards one instruction to see the call being made.
3: kd> ub b958ecdf L1
BabyHog!HogFarm+0xa8:
call dword ptr BabyHog!HogFarm +0x7460 (b9596090 ) <-stores the location we called
Let’s dump out this location to see what we actually called. The code dereferences this location to make the actual call.
3: kd> dps b9596090 L1
b9596090 b8d8c0d0 BabyHog +0x30d0<-----highlighted address is the function called
3: kd> uf b8d8c0d0
b8d8c0d0 55 push ebp
b8d8c0d1 8bec mov ebp,esp
b8d8c0d3 6aff push 0FFFFFFFFh
b8d8c0d5 6840e2d9b8 push offset BabyHog +0x15240 (b8d9e240)
b8d8c0da 6808b2d9b8 push offset BabyHog +0x12208 (b8d9b208)
b8d8c0df 64a100000000 mov eax,dword ptr fs:[00000000h]
b8d8c0e5 50 push eax
b8d8c0e6 64892500000000 mov dword ptr fs:[0],esp
b8d8c0ed 81c470fdffff add esp,0FFFFFD70h
To subtract 656, we are adding -656 to esp to bump the stack up. This is where the compiler is allocating stack space for storage of local variables. I am guessing there were multiple structures being allocated directly on the stack. If the programmer had instead called ExAllocatePoolWithTag to obtain memory, we could have instead only stored the pointers to this memory on the stack. Simply using the stack for storage is of course faster than calling out to get memory. The developer should balance the performance needs of the code with the need to conserve the stack space which is a limited resource.
3: kd> .formats 0x0FFFFFD70
Evaluate expression:
Hex: fffffd70
Decimal: -656
So this one instruction raised the stack more than .5 kb all by itself. The function made a bunch of other pushes (some of which are listed above) which when combined with the return address pushed by the call resulted in 708 (0x2c4 was listed by kf above) bytes of stack space to be allocated by this one function.
3: kd> .formats 2c4
Evaluate expression:
Hex: 000002c4
Decimal: 708
Two allocations like this will use up almost 1.5kb of the 12k stack space. By itself, this isn’t necessarily an issue; however, you can see from the output above how liberal stack usage by multiple calls from stacked up drivers can quickly add up. So what have we discovered here? Without even considering the fact that this products drivers may also be responsible for some of the stack space used by calls that they may have initiated, they have used at least 4k of the 12k stack. Also, the full 12k of stack space is not available for driver use. This is because the operating system also requires stack space for such overhead as the I/O operation, the file system components, thread startup, etc. The customer removed the product which prevented further bugchecks while the vendor was being engaged to assist or provide leaner hogs.
Summary
This two part blog has covered CPU task states, stack overflows, x86 stack reconstruction techniques, and examining functions to observe the stack allocations taking place. Hopefully, this blog will help you to understand what went wrong the next time that you encounter a Stop 0x7f (EXCEPTION_DOUBLE_FAULT).
You don’t have to wait until you encounter an UNEXPECTED_KERNEL_MODE_TRAP (7f) Arg1: 00000008, EXCEPTION_DOUBLE_FAULT to get familiar with this. Open up any dump file and see if you can find the limit and base. Then observe the values of your ebp and esp registers. If you are feeling truly geeky, dump the stack range out, identify patterns, and try passing various values to kf.
Please check out our previous blog post on this topic:
https://blogs.msdn.com/ntdebugging/archive/2008/02/01/kernel-stack-overflows.aspx
Keep in mind that this two part blog has only discussed stop 0x7f bugchecks where Arg1 lists value 0x8 EXCEPTION_DOUBLE_FAULT. There are other causes for a 0x7f bugcheck. For more information, please refer to:
314102 General Causes of STOP 0x0000007F Errors
Bug Check 0x7F: UNEXPECTED_KERNEL_MODE_TRAP
References:
“Windows Internals, Fifth Edition” By Mark E. Russinovich, David A. Solomon, with Alex Ionescu
Chapter 9, page 786, “Kernel Stacks”
Share this post : |
Comments
Anonymous
March 13, 2012
Trey Nash’s previous blog - link has "." @ end [Thanks Sam. We can't edit these older posts without breaking the formatting. Hopefully your comment will help the next reader.]Anonymous
June 23, 2014
This seems like a very useful tutorial in debugging kernel stack overflows, thanks. I am currently trying to debug a kernel stack overflow which happens during an interrupt and that appears to complicate things. In my case the top of the stack is correctly interpreted by windbg, but the kernel frames shown at the top of the stack only take up a fraction of the available 12k. The rest of the stack, below that, is nonsense. If I dump the raw stack I can see some user mode functions mentioned, so i wonder whether the 'nonsense' part of the stack I get with kcf/kb is the user stack (because when an interrupt occurs while running user mode the stack is switched from user stack to kernel stack). But even if that is the case, why would the kernel stack be almost full when the interrupt occurs. Any tips gratefully received! [User mode code won't use kernel stack so it will not contribute to this problem. The limitations on the size of kernel stack only impact kernel code. Sometimes optimized code without symbols can break the debugger's stack walking, requiring manual reconstruction of the stack as demonstrated in this article.]