Lab: Win32 Application causing 100% CPU condition (Advanced)
This is lab for simulating a real 100% CPU condition being caused by a generic Win32 application when there are no symbols available for such application.
Let’s start supposing the situation is currently occurring. So, the first step is to confirm the symptoms, which we can easily do by using, for instance the task manager. As we see in the picture below we have a 50% utilization on a dual core machine which represents 100% of one CPU being utilized.
If we navigate to the process tab in the task manager we will be able to identify which process is causing the problem as shown in the picture below:
Ok, so it’s the MyApp.exe that is keeping one of our CPUs totally busy. It's time to bring the debugger and look what’s happening under the hood…
By using the Windbg from the Debugging Tools For Windows package, we can choose, by pressing F6, which process we want to attach the debugger to:
After attaching to the MyApp.exe process, the debugger will, by default, list all the modules loaded at the address space as well as it will break the execution through the break instruction exception (80000003). Now, since the execution is broken, our CPU is no longer busy and we can start investigating what is going on here.
Microsoft (R) Windows Debugger Version 6.6.0005.0 Copyright (c) Microsoft Corporation. All rights reserved. Executable search path is: ModLoad: 00400000 00446000 C:\Temp\MyApp.exe ModLoad: 7c900000 7c9b0000 C:\WINDOWS\system32\ntdll.dll ModLoad: 7c800000 7c8f4000 C:\WINDOWS\system32\kernel32.dll ModLoad: 77d40000 77dd0000 C:\WINDOWS\system32\USER32.dll ModLoad: 77f10000 77f57000 C:\WINDOWS\system32\GDI32.dll ModLoad: 76b40000 76b6d000 C:\WINDOWS\system32\WINMM.dll ModLoad: 77dd0000 77e6b000 C:\WINDOWS\system32\ADVAPI32.dll ModLoad: 77e70000 77f01000 C:\WINDOWS\system32\RPCRT4.dll ModLoad: 77c10000 77c68000 C:\WINDOWS\system32\msvcrt.dll ModLoad: 76c30000 76c5e000 C:\WINDOWS\system32\WINTRUST.dll ModLoad: 77a80000 77b14000 C:\WINDOWS\system32\CRYPT32.dll ModLoad: 77b20000 77b32000 C:\WINDOWS\system32\MSASN1.dll ModLoad: 76c90000 76cb8000 C:\WINDOWS\system32\IMAGEHLP.dll ModLoad: 72d20000 72d29000 C:\WINDOWS\system32\wdmaud.drv ModLoad: 72d10000 72d18000 C:\WINDOWS\system32\msacm32.drv ModLoad: 77be0000 77bf5000 C:\WINDOWS\system32\MSACM32.dll ModLoad: 77bd0000 77bd7000 C:\WINDOWS\system32\midimap.dll ModLoad: 5ad70000 5ada8000 C:\WINDOWS\system32\uxtheme.dll ModLoad: 74720000 7476b000 C:\WINDOWS\system32\MSCTF.dll ModLoad: 77120000 771ac000 C:\WINDOWS\system32\OLEAUT32.DLL ModLoad: 774e0000 7761d000 C:\WINDOWS\system32\ole32.dll ModLoad: 10000000 1001e000 C:\Program Files\Babylon\CAPTLIB.DLL ModLoad: 77f60000 77fd6000 C:\WINDOWS\system32\SHLWAPI.dll ModLoad: 74c80000 74cac000 C:\WINDOWS\system32\OLEACC.dll ModLoad: 76080000 760e5000 C:\WINDOWS\system32\MSVCP60.dll (8a8.c5c): Break instruction exception - code 80000003 (first chance) eax=7ffd6000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005 eip=7c901230 esp=00fcffcc ebp=00fcfff4 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246 ntdll!DbgBreakPoint: 7c901230 cc int 3 |
Let’s make sure you have the right symbol path so we can load at least the MS public symbols.
0:003> .sympath Symbol search path is: srv*c:\symbols*https://msdl.microsoft.com/download/symbols |
We have 4 threads as below:
0:003> ~ 0 Id: 8a8.da8 Suspend: 1 Teb: 7ffdf000 Unfrozen 1 Id: 8a8.f1c Suspend: 1 Teb: 7ffde000 Unfrozen 2 Id: 8a8.75c Suspend: 1 Teb: 7ffdd000 Unfrozen . 3 Id: 8a8.c5c Suspend: 1 Teb: 7ffdc000 Unfrozen |
Our current context is set to thread 03.
The more reliable way to find out which thread is consuming the processor time would be use the System Monitor tool with the right objects and counters (process and thread objects) as described in the article Troubleshooting IIS 100% CPU issues - Step by Step. However we will try something different here. We will use the extension command !uext.runaway or simply !runaway (since the extension uext.dll is loaded by default when debugging user mode apps from Windbg.exe):
0:003> !uext.runaway User Mode Time Thread Time 0:da8 0 days 0:03:36.703 3:c5c 0 days 0:00:00.000 2:75c 0 days 0:00:00.000 1:f1c 0 days 0:00:00.000 |
The !runaway extension command returns a list containing all the current threads and the processor time spent by them. As we see in the result above our thread 0 has to be the one we are interested in since it's not only the one with more time, but is also the only one with any processor time.
So let’s change the context to thread 0 in order to verify what has been executed by it:
0:003> ~0s *** WARNING: Unable to verify checksum for C:\Temp\MyApp.exe *** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Temp\MyApp.exe - eax=00000000 ebx=00000000 ecx=0012fd7c edx=7c90eb94 esi=00402150 edi=0012fe40 eip=00401a73 esp=0012fd74 ebp=0012fd74 iopl=0 nv up ei pl zr na pe nc cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246 MyApp!AboutDlgProc+0xa73: 00401a73 5d pop ebp |
There is a message warning us about the lack of symbols. Let's see what we can find about what this thread is doing even with no symbols for the application. Looking at the call stack we have the following:
0:000> kL ChildEBP RetAddr WARNING: Stack unwind information not available. Following frames may be wrong. 0012fd74 00401a69 MyApp!AboutDlgProc+0xa73 0012fd80 0040248f MyApp!AboutDlgProc+0xa69 0012fdd8 77d48734 MyApp!WndProc+0x33f 0012fe04 77d48816 USER32!InternalCallWinProc+0x28 0012fe6c 77d489cd USER32!UserCallWinProcCheckWow+0x150 0012fecc 77d496c7 USER32!DispatchMessageWorker+0x306 0012fedc 00401244 USER32!DispatchMessageA+0xf 0012ff30 00402cd3 MyApp!AboutDlgProc+0x244 0012ffc0 7c816fd7 MyApp!WndProc+0xb83 0012fff0 00000000 kernel32!BaseProcessStart+0x23 |
This just tells us, based on the first frame, that this is the first thread and also it has some MyApp.exe code being executed (we have a chain of at least 5 MyApp.exe functions in the stack). But with no symbols this won't help us any longer since the function names for MyApp.exe are not reliable. Time to look at the asm being executed...
The current instruction pointer (represented by the CPU register EIP) is pointing to this asm:
0:000> u eip MyApp!AboutDlgProc+0xa73: 00401a73 5d pop ebp 00401a74 c3 ret 00401a75 cc int 3 |
As we can see, it's just taking the top value on the stack and saving it to the CPU register EBP (pop EBP) and the next instruction is setting the EIP to the return address (this will load the EIP with the instruction located at the address at EBP + 4 in the memory). We can identify this asm block as part of epilog which means we are at the end of it's execution. So let's look a little bit before in the asm to verify what was being executed before the epilog block:
0:000> u eip -13 eip+1 MyApp!AboutDlgProc+0xa60: 00401a60 55 push ebp 00401a61 8bec mov ebp,esp 00401a63 51 push ecx 00401a64 e805000000 call MyApp!AboutDlgProc+0xa6e (00401a6e) 00401a69 8945fc mov dword ptr [ebp-4],eax 00401a6c ebf6 jmp MyApp!AboutDlgProc+0xa64 (00401a64) 00401a6e 55 push ebp 00401a6f 8bec mov ebp,esp 00401a71 33c0 xor eax,eax 00401a73 5d pop ebp |
Now we got the complete block since the execution of the prologue (the push EBP; mov EBP, ESP; push ECX instructions). By analyzing the block above we see that after the prolog execution, we make a function call to the instrution at 00401a6e (call MyApp!AboutDlgProc+0xa6e), so let's verify what should be executed when doing that:
0:000> u 00401a6e MyApp!AboutDlgProc+0xa6e: 00401a6e 55 push ebp 00401a6f 8bec mov ebp,esp 00401a71 33c0 xor eax,eax 00401a73 5d pop ebp 00401a74 c3 ret |
All right, so what we see here, after the prolog block, is an exclusive OR (XOR) operation being executed using EAX as both operands, and then the epilog block is executed - This epilog block is the same one (the instruction at the address 00401a73) our EIP is currently set to (notice the first asm block at the beginning of this analysis).
The XOR operation is just being used to set the EAX register to 0 (the result of the operation is loaded to the destination operand, in this case the EAX itself). If we have a chance to look at the application source code, this is likely something like "MyLocalVar = 0;". So it comes the question: Why not make a "mov EAX,0" instead of a "xor EAX,EAX"? The answer is simple: it's a matter of cost. The instruction XOR costs less than the instruction MOV to the CPU, so it's faster to execute a XOR since in this case it will produce the same desired result.
So what we know so far is that we have a function being called and the only thing this function does is set a variable to 0 and returns. So let's keep going from the return of this function in the following block:
MyApp!AboutDlgProc+0xa60: 00401a60 55 push ebp 00401a61 8bec mov ebp,esp 00401a63 51 push ecx 00401a64 e805000000 call MyApp!AboutDlgProc+0xa6e (00401a6e) 00401a69 8945fc mov dword ptr [ebp-4],eax 00401a6c ebf6 jmp MyApp!AboutDlgProc+0xa64 (00401a64) 00401a6e 55 push ebp 00401a6f 8bec mov ebp,esp 00401a71 33c0 xor eax,eax 00401a73 5d pop ebp |
We've just returned from the instruction "call MyApp!AboutDlgProc+0xa6e". So the next instruction to be executed will the one at the address 00401a69 which is just moving the EAX value (which we already know at this point should be 0) to the location of EBP - 4. EBP - 4 is known to be the location of the first local variable of the current function at the call stack, so we're basically copying the EAX value to one or our local variables.
The next instrutcion (jmp MyApp!AboutDlgProc+0xa64) will cause the execution to "jump" to the address 00401a64 which, at this point, is already familiar to us. We know that the instruction at 00401a64 will make a function call to the function MyApp!AboutDlgProc+0xa6e which we analyzed before and know it's just setting the EAX to 0 and returning.
The conclusion now is obvious. We have an infinite loop being executed and this is the reason we have a 100% CPU condition happening. This has been proven to be a bug within the MyApp.exe application. The next step is to contact the MyApp.exe vendor and let them know about the bad news J
Comments
- Anonymous
January 01, 2003
This is another 100% CPU issue. This time I’m not working on any customer case as the issue is happening