Jaa


Windows Azure Worker Role Showing High CPU

I was recently troubleshooting an issue where a Worker Role was deployed on Windows Azure.The Role had 3 instances and during most of the time all the instances used to run at 99% + CPU. The instances query a database, gets social media posts and send emails. Most of the time all the instances would peg at 99% CPU and won't come down, this was observed from the Windows Azure Management Portal. So I did an RDP to the instance to look at the process which was consuming more CPU. We found that WaWorkerHost process which was consuming 90% to 96% of the CPU. Since there were no errors or exceptions logged in I had no choice but to take a memory dump of the running process for analysis.  

With high CPU we generally look for tight loops like a trivial do while or for/foreach. 

From the dump files I found the following .

 0:015> !threadpool

CPU utilization: 100%

Worker Thread: Total: 2 Running: 0 Idle: 2 MaxLimit: 32767 MinLimit: 1

Work Request in Queue: 0

--------------------------------------

Number of Timers: 6

--------------------------------------

Completion Port Thread:Total: 1 Free: 1 MaxFree: 2 CurrentLimit: 1 MaxLimit: 1000 MinLimit: 1 

0:015> !clrstack

OS Thread Id: 0x1e0 (15)

Child SP IP                                                                                                                                        Call Site

00000000240ce2a0             000007ff00746a4c System.DateTime.GetDatePart(Int32)                                                                                    [f:\dd\ndp\clr\src\BCL\System\DateTime.cs @ 752]

00000000240ce2e0             000007ff0074694e    System.TimeZoneInfo.GetDateTimeNowUtcOffsetFromUtc(System.DateTime, BooleanByRef) [f:\dd\ndp\clr\src\BCL\System\TimeZoneInfo.cs @ 2129]

00000000240ce340             000007ff0074685a System.DateTime.get_Now()                                                                                                [f:\dd\ndp\clr\src\BCL\System\DateTime.cs @ 927]

00000000240ce3a0             000007ff00d8d568 WorkerRole.WorkerRole.Run()

00000000240ce420 000007ff00d8d3c5 Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.StartRoleInternal()

00000000240ce470 000007ff00d8d1c3     Microsoft.WindowsAzure.ServiceRuntime.Implementation.Loader.RoleRuntimeBridge.<StartRole>b__1()

00000000240ce4b0 000007ff0073f12c     System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 445]

00000000240ce510 000007ff0073f02b    System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback, System.Object) [f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 411]

00000000240ce560 000007ff0073ef2d    System.Threading.ThreadHelper.ThreadStart() [f:\dd\ndp\clr\src\BCL\System\Threading\Thread.cs @ 94]

00000000240ce9b8 000007fef7cb44c4 [GCFrame: 00000000240ce9b8]

00000000240ceda0 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:00000000240ceda0]

00000000240cef38 000007fef7cb44c4 [ContextTransitionFrame:00000000240cef38]

00000000240cf160 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:00000000240cf160] 

0:014> !threadpool

CPU utilization: 100%

Worker Thread: Total: 3 Running: 0 Idle: 3 MaxLimit: 32767 MinLimit: 1

Work Request in Queue: 0

--------------------------------------

Number of Timers: 6

--------------------------------------

Completion Port Thread:Total: 1 Free: 1 MaxFree: 2 CurrentLimit:1 MaxLimit: 1000 MinLimit: 1

0:014> !dso

OS Thread Id: 0x1e0 (14)

RSP/REG                  Object Name

rsi                             0000000003097fe8 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst

rdi                             0000000002e61038 System.Threading.ContextCallback

00000000240CE420 0000000002e61cb0 WorkerRole.WorkerRole

00000000240CE470 0000000002e5a3b8                 System.Collections.Generic.Dictionary`2[[System.String,mscorlib],[Microsoft.WindowsAzure.ServiceRuntime.RoleInstanceEndpoint,Microsoft.WindowsAzure.ServiceRuntime]]

00000000240CE4B0 0000000003097f50                  System.IEquatable`1[[Microsoft.CSharp.RuntimeBinder.Semantics.AggregateType,Microsoft.CSharp]]

00000000240CE4B8 0000000003097fe8                  Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst

00000000240CE4C0 0000000003097fe8                  Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst

00000000240CE500 0000000003097fe8                  Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst

00000000240CE510 0000000003097fe8                 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst

00000000240CE550 0000000003097fe8                 Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst

00000000240CE560 0000000003097fe8                  Microsoft.CSharp.RuntimeBinder.Semantics.MethPropWithInst

00000000240CE950 0000000002f43678 System.Threading.ThreadStart

0:014> !clrstack

OS Thread Id: 0x1e0 (14)

Child SP                          IP                            Call Site

00000000240ce3a0       000007ff00d8d57c WorkerRole.WorkerRole.Run()

00000000240ce420       000007ff00d8d3c5         Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.StartRoleInternal()

00000000240ce470       000007ff00d8d1c3 Microsoft.WindowsAzure.ServiceRuntime.Implementation.Loader.RoleRuntimeBridge.<StartRole>b__1()

00000000240ce4b0       000007ff0073f12c          System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback,System.Object, Boolean)[f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 445]

00000000240ce510 000007ff0073f02b          System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext,System.Threading.ContextCallback, System.Object)[f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 411]

00000000240ce560 000007ff0073ef2d          System.Threading.ThreadHelper.ThreadStart()[f:\dd\ndp\clr\src\BCL\System\Threading\Thread.cs @ 94]

00000000240ce9b8 000007fef7cb44c4 [GCFrame: 00000000240ce9b8]

00000000240ceda0 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:
00000000240ceda0]     00000000240cef38 000007fef7cb44c4 [ContextTransitionFrame:00000000240cef38]

00000000240cf160 000007fef7cb44c4 [DebuggerU2MCatchHandlerFrame:00000000240cf160]

 

As it can be seen at the very top of the stack we can see that WorkerRole.Run() method was called. So I dumped out the WorkerRole.dll which is the binary for the custom code. Using any of the widely available reflector tools we can find the code in any .net dll. So I loaded the dll and looked at the code in the Run() method. 

The code on the Run() method was as follows

 public override void Run()

 {         

    Utilities.CreateLogEntry("WorkerRole is running", null, LogCategories.Information);

    this.StartProcessors();

    DateTime time = Convert.ToDateTime("3:00:00 AM");

      while (true)

           {

               DateTime now = DateTime.Now;

               if ((DateTime.Compare(time,now) == 0) || (DateTime.Compare(time, now) < 0))  This will evaluate to true only between UTC 0:00:01AM till UTC 2:59:59 AM UTC. Rest of the time it will evaluate to false leading to trivial execution and hence a tight loop.

                {

                   this.UnLoggedUserProcess();

                   time = time.AddDays(1.0);

               }

            }       

}

 

I suggested to modify the code to something as below. 

public override void Run()

        {

           Utilities.CreateLogEntry("WorkerRole is running", null, LogCategories.Information);

            this.StartProcessors();

            DateTime time = Convert.ToDateTime("3:00:00 AM");

            while (true)

            {

               DateTime now = DateTime.Now;               

              if ((DateTime.Compare(time,now) == 0) || (DateTime.Compare(time, now) < 0))

               
              {

                   this.UnLoggedUserProcess();                   

                   time = time.AddDays(1.0);              

              }

              Thread.Sleep(10000); The thread will wait for 10 seconds before another evaluation of the if statement

               Utilities.CreateLogEntry("Working: Waiting for Event", null, LogCategories.Information);

             }

         }

So the important part is to identify if there are any tight loops due to non trivial work and make sure that we add some delay in the function. With Worker Role Run method its very crucial since it has a While(true), in fact we add this Thread.Sleep in the template of the Worker Role and it's not advisable to remove it. Hope this will help you in identifying or narrowing down the cause of a high CPU issue with your Worker Roles. 

 

Angshuman Nayak , Cloud Integration Engineer

Comments

  • Anonymous
    March 09, 2015
    How did you download the dump file from the remote desktop?