Windows Azure Worker Role Showing High CPU
I was recently troubleshooting an issue where a Worker Role was deployed on Windows Azure.The Role had 3 instances and during most of the time all the instances used to run at 99% + CPU. The instances query a database, gets social media posts and send emails. Most of the time all the instances would peg at 99% CPU and won't come down, this was observed from the Windows Azure Management Portal. So I did an RDP to the instance to look at the process which was consuming more CPU. We found that WaWorkerHost process which was consuming 90% to 96% of the CPU. Since there were no errors or exceptions logged in I had no choice but to take a memory dump of the running process for analysis.
With high CPU we generally look for tight loops like a trivial do while or for/foreach.
From the dump files I found the following .
0:015> !threadpool
CPU utilization: 100%
Worker Thread: Total: 2 Running: 0 Idle: 2 MaxLimit: 32767 MinLimit: 1
Work Request in Queue: 0
--------------------------------------
Number of Timers: 6
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 2 CurrentLimit: 1 MaxLimit: 1000 MinLimit: 1
0:015> !clrstack
OS Thread Id: 0x1e0 (15)
Child SP IP Call Site
00000000240ce2a0 000007ff00746a4c System.DateTime.GetDatePart(Int32) [f:\dd\ndp\clr\src\BCL\System\DateTime.cs @ 752]
00000000240ce2e0 000007ff0074694e System.TimeZoneInfo.GetDateTimeNowUtcOffsetFromUtc(System.DateTime, BooleanByRef) [f:\dd\ndp\clr\src\BCL\System\TimeZoneInfo.cs @ 2129]
00000000240ce340 000007ff0074685a System.DateTime.get_Now() [f:\dd\ndp\clr\src\BCL\System\DateTime.cs @ 927]
00000000240ce3a0 000007ff00d8d568 WorkerRole.WorkerRole.Run()
00000000240ce420 000007ff00d8d3c5 Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.StartRoleInternal()
00000000240ce470 000007ff00d8d1c3 Microsoft.WindowsAzure.ServiceRuntime.Implementation.Loader.RoleRuntimeBridge.<StartRole>b__1()
00000000240ce4b0 000007ff0073f12c System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 445]
00000000240ce510 000007ff0073f02b System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback, System.Object) [f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 411]
00000000240ce560 000007ff0073ef2d System.Threading.ThreadHelper.ThreadStart() [f:\dd\ndp\clr\src\BCL\System\Threading\Thread.cs @ 94]
00000000240ce9b8 000007fef7cb44c4 [GCFrame: 00000000240ce9b8]
00000000240ceda0 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:00000000240ceda0]
00000000240cef38 000007fef7cb44c4 [ContextTransitionFrame:00000000240cef38]
00000000240cf160 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:00000000240cf160]
0:014> !threadpool
CPU utilization: 100%
Worker Thread: Total: 3 Running: 0 Idle: 3 MaxLimit: 32767 MinLimit: 1
Work Request in Queue: 0
--------------------------------------
Number of Timers: 6
--------------------------------------
Completion Port Thread:Total: 1 Free: 1 MaxFree: 2 CurrentLimit:1 MaxLimit: 1000 MinLimit: 1
0:014> !dso
OS Thread Id: 0x1e0 (14)
RSP/REG Object Name
rsi 0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst
rdi 0000000002e61038 System.Threading.ContextCallback
00000000240CE420 0000000002e61cb0 WorkerRole.WorkerRole
00000000240CE470 0000000002e5a3b8 System.Collections.Generic.Dictionary`2[[System.String,mscorlib],[Microsoft.WindowsAzure.ServiceRuntime.RoleInstanceEndpoint,Microsoft.WindowsAzure.ServiceRuntime]]
00000000240CE4B0 0000000003097f50 System.IEquatable`1[[Microsoft.CSharp.RuntimeBinder.Semantics.AggregateType,Microsoft.CSharp]]
00000000240CE4B8 0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst
00000000240CE4C0 0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst
00000000240CE500 0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst
00000000240CE510 0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst
00000000240CE550 0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst
00000000240CE560 0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst
00000000240CE950 0000000002f43678 System.Threading.ThreadStart
0:014> !clrstack
OS Thread Id: 0x1e0 (14)
Child SP IP Call Site
00000000240ce3a0 000007ff00d8d57c WorkerRole.WorkerRole.Run()
00000000240ce420 000007ff00d8d3c5 Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.StartRoleInternal()
00000000240ce470 000007ff00d8d1c3 Microsoft.WindowsAzure.ServiceRuntime.Implementation.Loader.RoleRuntimeBridge.<StartRole>b__1()
00000000240ce4b0 000007ff0073f12c System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback,System.Object, Boolean)[f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 445]
00000000240ce510 000007ff0073f02b System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback, System.Object)[f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 411]
00000000240ce560 000007ff0073ef2d System.Threading.ThreadHelper.ThreadStart()[f:\dd\ndp\clr\src\BCL\System\Threading\Thread.cs @ 94]
00000000240ce9b8 000007fef7cb44c4 [GCFrame: 00000000240ce9b8]
00000000240ceda0 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:
00000000240ceda0] 00000000240cef38 000007fef7cb44c4 [ContextTransitionFrame:00000000240cef38]
00000000240cf160 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:00000000240cf160]
As it can be seen at the very top of the stack we can see that WorkerRole.Run() method was called. So I dumped out the WorkerRole.dll which is the binary for the custom code. Using any of the widely available reflector tools we can find the code in any .net dll. So I loaded the dll and looked at the code in the Run() method.
The code on the Run() method was as follows
public override void Run()
{
Utilities.CreateLogEntry("WorkerRole is running", null, LogCategories.Information);
this.StartProcessors();
DateTime time = Convert.ToDateTime("3:00:00 AM");
while (true)
{
DateTime now = DateTime.Now;
if ((DateTime.Compare(time,now) == 0) || (DateTime.Compare(time, now) < 0)) This will evaluate to true only between UTC 0:00:01AM till UTC 2:59:59 AM UTC. Rest of the time it will evaluate to false leading to trivial execution and hence a tight loop.
{
this.UnLoggedUserProcess();
time = time.AddDays(1.0);
}
}
}
I suggested to modify the code to something as below.
public override void Run()
{
Utilities.CreateLogEntry("WorkerRole is running", null, LogCategories.Information);
this.StartProcessors();
DateTime time = Convert.ToDateTime("3:00:00 AM");
while (true)
{
DateTime now = DateTime.Now;
if ((DateTime.Compare(time,now) == 0) || (DateTime.Compare(time, now) < 0))
{
this.UnLoggedUserProcess();
time = time.AddDays(1.0);
}
Thread.Sleep(10000); The thread will wait for 10 seconds before another evaluation of the if statement
Utilities.CreateLogEntry("Working: Waiting for Event", null, LogCategories.Information);
}
}
So the important part is to identify if there are any tight loops due to non trivial work and make sure that we add some delay in the function. With Worker Role Run method its very crucial since it has a While(true), in fact we add this Thread.Sleep in the template of the Worker Role and it's not advisable to remove it. Hope this will help you in identifying or narrowing down the cause of a high CPU issue with your Worker Roles.
Angshuman Nayak , Cloud Integration Engineer
Comments
- Anonymous
March 09, 2015
How did you download the dump file from the remote desktop?