WCF Worker Role Just Recycles on Windows Azure
I was recently working on a customer issue where a WCF Worker Role would not come up. On the Windows Azure Management Portal, all that could be seen was that the role kept recycling.
The following message was displayed continuously:
“Recycling (Recovering role… System is initializing. [2012-12-26T10:56:45Z])”
On the new HTML portal, it looks similar to the screenshot below.
So let me explain it with a bit of architecture thrown in. Below is a diagram (shamelessly borrowed from Kevin Williamson’s blog) of the workflow of a service getting up and running. I highly recommend reading and understanding it.
(Courtesy: Kevin Williamson, Senior Escalation Engineer, Windows Azure)
The yellow part is, more or less, the virtual machine getting prepped and ready to host the role. That part succeeded, since the VM itself was up and we could remote in to it. So let’s see which of the downstream processes is causing the issue. I used Process Explorer to watch them; please keep the architecture diagram handy to follow the flow of events.
In Process Explorer, I saw WaHostBootstrapper starting and then vanishing.
So I had to pay closer attention to the chain of events. This time I kept my eyes fixed on the processes showing up in Process Explorer.
WaAppAgent starts WaHostBootstrapper, which in turn starts the following:
a) Diagnostic Agent
b) Remote Desktop Agent
c) WaWorkerHost
WaWorkerHost (the screenshot shows WaIISHost, but think of it as WaWorkerHost) comes up for a fraction of a second and then vanishes, and everything goes back to the original state.
So it was pretty evident that WaWorkerHost was crashing, WaHostBootstrapper was kicking it off again, and this chain of events kept repeating.
WaWorkerHost.exe is the host process for the role entry point code of worker roles. This process loads the first DLL found that implements the RoleEntryPoint class (this DLL is defined in E:\__entrypoint.txt) and executes the code from that class (OnStart, Run, OnStop). Any RoleEnvironment events (e.g. StatusCheck, Changed, etc.) registered in the RoleEntryPoint class are raised in this process.
Since this was a process crash, the next logical step was to look at the Windows Event Viewer.
Sure enough, Event Viewer showed that WaWorkerHost.exe was crashing regularly. The Application log had repeated .NET Runtime entries (Event ID 1026) with the following exception:
Application: WaWorkerHost.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.BadImageFormatException
Stack:
   at Microsoft.WindowsAzure.ServiceRuntime.Implementation.Loader.RoleRuntimeBridge.<InitializeRole>b__0()
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.ThreadHelper.ThreadStart()
Each was followed by a matching Application Error entry (Event ID 1000):
Faulting application name: WaWorkerHost.exe, version: 2.1.1196.512, time stamp: 0x51ccac26
Faulting module name: KERNELBASE.dll, version: 6.1.7601.17965, time stamp: 0x506dcae6
Exception code: 0xe0434352
Fault offset: 0x000000000000bccd
Faulting process id: 0xad0
Faulting application start time: 0x01cea4101899531f
Faulting application path: F:\base\x64\WaWorkerHost.exe
Faulting module path: D:\Windows\system32\KERNELBASE.dll
Report Id: 57b04337-1003-11e3-8f1f-00155d49592a
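Two details in these entries fit together. The exception code 0xe0434352 is the code the .NET 4 CLR uses when it raises a managed exception as a native SEH exception, and its low three bytes are simply the ASCII characters "CCR". Because the CLR raises it through RaiseException, KERNELBASE.dll is typically what gets reported as the faulting module, even though the real problem is the managed System.BadImageFormatException. The small Python sketch below only decodes those bytes; the function name is illustrative.

```python
EXCEPTION_CODE = 0xE0434352  # "Exception code" from the Application Error event


def seh_code_tag(code: int) -> str:
    """Decode the low three bytes of a 32-bit SEH exception code as ASCII."""
    raw = code.to_bytes(4, "big")
    # raw[0] == 0xE0: error severity with the "customer" bit set, i.e. a
    # software-defined code rather than a system NTSTATUS value.
    return raw[1:].decode("ascii")


if __name__ == "__main__":
    print(f"0x{EXCEPTION_CODE:08x} -> {seh_code_tag(EXCEPTION_CODE)!r}")  # 0xe0434352 -> 'CCR'
```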
The most likely cause of this issue is a 32-bit (x86) DLL being used in the application. Since WaWorkerHost.exe is a 64-bit (x64) process, it cannot load that binary into memory and throws this exception.
To confirm this, we attached WinDbg to the process. The easiest way is to use the diagnostic utility from the Windows Azure Developer Support Team https://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx . It uses WinDbg internally to attach to the process and helps in breaking in. The blog post, again courtesy of Kevin Williamson (Microsoft), details how to attach the debugger to the process.
The crashing thread was thread 7, and the SOS !pe (PrintException) command showed:
0:007> !pe
Exception object: 00000000019741b8
Exception type: System.BadImageFormatException
Message: Could not load file or assembly 'foo.BusninessLayer, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. An attempt was made to load a program with an incorrect format.
InnerException: <none>
StackTrace (generated):
<none>
StackTraceString: <none>
HResult: 8007000b
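The HResult 8007000b can be unpacked to see where it comes from: its facility is 7 (FACILITY_WIN32) and its code is 11, the Win32 error ERROR_BAD_FORMAT, whose text "An attempt was made to load a program with an incorrect format." is exactly what appears in the exception message. A minimal Python sketch of the standard HRESULT bit layout and of the HRESULT_FROM_WIN32 mapping (the helper names are illustrative):

```python
from typing import Tuple

FACILITY_WIN32 = 7
ERROR_BAD_FORMAT = 11  # Win32: "An attempt was made to load a program with an incorrect format."


def hresult_from_win32(err: int) -> int:
    """Mirror the HRESULT_FROM_WIN32 macro for a Win32 error code."""
    if err <= 0:
        return err & 0xFFFFFFFF
    return 0x80000000 | (FACILITY_WIN32 << 16) | (err & 0xFFFF)


def split_hresult(hr: int) -> Tuple[int, int, int]:
    """Return (severity bit, facility, code) of a 32-bit HRESULT."""
    severity = (hr >> 31) & 0x1
    facility = (hr >> 16) & 0x1FFF
    code = hr & 0xFFFF
    return severity, facility, code


if __name__ == "__main__":
    hr = 0x8007000B  # HResult printed by !pe
    print(split_hresult(hr))                           # (1, 7, 11)
    print(hresult_from_win32(ERROR_BAD_FORMAT) == hr)  # True
```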
So I looked into the DLL "foo.BusninessLayer", which turned out to be built for x86 only.
Loading the DLL in ILDASM (found at C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools; the Windows SDK needs to be installed to get ILDASM) and opening its MANIFEST shows the following:
.module gforms.BLL.Facade.dll
// MVID: {BE9AF235-FDAF-43C6-B1D7-AC740A4BB105}
.custom instance void [mscorlib]System.Security.UnverifiableCodeAttribute::.ctor() = ( 01 00 00 00 )
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003 // WINDOWS_CUI
.corflags 0x00000003 // ILONLY 32BITREQUIRED
// Image base: 0x00D70000
The value 0x00000003 (ILONLY | 32BITREQUIRED) indicates that the DLL was built for x86 only. Had it been built for AnyCPU, the value would have been .corflags 0x00000001 // ILONLY.
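The .corflags value is a bit mask of the COMIMAGE_FLAGS_* values defined in corhdr.h in the Windows SDK, so 0x00000003 is ILONLY (0x1) plus 32BITREQUIRED (0x2). A small Python sketch that decodes such a mask; it lists only a subset of the flags, and the helper name is hypothetical:

```python
from typing import List

# Subset of the COMIMAGE_FLAGS_* bits defined in corhdr.h (Windows SDK).
COMIMAGE_FLAGS = {
    0x00000001: "ILONLY",             # image contains only IL code
    0x00000002: "32BITREQUIRED",      # image must be loaded into a 32-bit process
    0x00000008: "STRONGNAMESIGNED",
    0x00000010: "NATIVE_ENTRYPOINT",
    0x00010000: "TRACKDEBUGDATA",
    0x00020000: "32BITPREFERRED",     # .NET 4.5+: with 32BITREQUIRED, means AnyCPU (32-bit preferred)
}


def decode_corflags(flags: int) -> List[str]:
    """Return the names of the flags set in a .corflags value, lowest bit first."""
    return [name for bit, name in sorted(COMIMAGE_FLAGS.items()) if flags & bit]


if __name__ == "__main__":
    print(decode_corflags(0x00000003))  # ['ILONLY', '32BITREQUIRED'] -> x86 only
    print(decode_corflags(0x00000001))  # ['ILONLY'] -> AnyCPU
```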
The host process WaWorkerHost, like the other Windows Azure core processes, is 64-bit, and hence it fails to load this DLL, which is built as 32-bit (x86) only. So the customer rebuilt the binaries used in the project with AnyCPU and re-published the *.cspkg, and this time the WCF Worker Role came up fine.
So if you face a similar bad image issue, the first thing to check is whether any DLL referenced in the project is a 32-bit (x86) only DLL. You can also run corflags.exe from the Visual Studio Command Prompt to find out whether an assembly is compiled as x86, x64, or AnyCPU.
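If you want to check a whole set of binaries at once, the same information corflags.exe reports can be read straight from the file: the optional-header magic distinguishes PE32 from PE32+, and the Flags field of the CLR header (data directory 14) holds the .corflags value. The Python sketch below is a hypothetical helper, not a Microsoft tool; it assumes well-formed PE files, does only minimal validation, and names the result the way corflags.exe output is usually interpreted.

```python
import struct
import sys
from typing import Optional, Tuple

COMIMAGE_FLAGS_ILONLY = 0x00000001
COMIMAGE_FLAGS_32BITREQUIRED = 0x00000002
COMIMAGE_FLAGS_32BITPREFERRED = 0x00020000
CLR_HEADER_DIR_INDEX = 14  # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR
MACHINE_NAMES = {0x8664: "x64", 0xAA64: "ARM64", 0x0200: "Itanium"}


def read_pe_info(data: bytes) -> Tuple[bool, int, Optional[int]]:
    """Return (is_pe32_plus, machine, corflags); corflags is None for native images."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_off + 4
    machine, num_sections = struct.unpack_from("<HH", data, coff)
    opt_size = struct.unpack_from("<H", data, coff + 16)[0]
    opt = coff + 20  # the optional header follows the 20-byte COFF header
    magic = struct.unpack_from("<H", data, opt)[0]
    if magic not in (0x10B, 0x20B):
        raise ValueError(f"unexpected optional header magic 0x{magic:x}")
    pe32_plus = magic == 0x20B
    # NumberOfRvaAndSizes and the data directories sit at different offsets in PE32 and PE32+.
    num_dirs = struct.unpack_from("<I", data, opt + (108 if pe32_plus else 92))[0]
    if num_dirs <= CLR_HEADER_DIR_INDEX:
        return pe32_plus, machine, None
    dirs = opt + (112 if pe32_plus else 96)
    clr_rva, _clr_size = struct.unpack_from("<II", data, dirs + 8 * CLR_HEADER_DIR_INDEX)
    if clr_rva == 0:
        return pe32_plus, machine, None  # no CLR header: not a .NET assembly
    # Translate the CLR header RVA into a file offset using the section table.
    sections = opt + opt_size
    for i in range(num_sections):
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", data, sections + 40 * i + 8)
        if vaddr <= clr_rva < vaddr + max(vsize, raw_size):
            clr_off = raw_ptr + (clr_rva - vaddr)
            # IMAGE_COR20_HEADER: cb (4), runtime version (2 + 2), MetaData (8), then Flags.
            flags = struct.unpack_from("<I", data, clr_off + 16)[0]
            return pe32_plus, machine, flags
    raise ValueError("CLR header RVA is not inside any section")


def classify(pe32_plus: bool, machine: int, flags: Optional[int]) -> str:
    """Name the target platform the way corflags.exe output is usually read."""
    if flags is None:
        return "native (no CLR header)"
    if pe32_plus:
        return MACHINE_NAMES.get(machine, f"64-bit (machine 0x{machine:04x})")
    if not flags & COMIMAGE_FLAGS_ILONLY:
        return "32-bit only (mixed-mode)"
    if flags & COMIMAGE_FLAGS_32BITREQUIRED:
        if flags & COMIMAGE_FLAGS_32BITPREFERRED:
            return "AnyCPU (32-bit preferred)"
        return "x86"  # cannot be loaded by a 64-bit host such as WaWorkerHost.exe
    return "AnyCPU"


def main(paths) -> None:
    for path in paths:
        with open(path, "rb") as f:
            print(f"{path}: {classify(*read_pe_info(f.read()))}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, saving this as pe_bitness.py (a hypothetical name) and running python pe_bitness.py foo.BusninessLayer.dll would be expected to report x86 for the DLL in this case, since its .corflags value is 0x3 in a 32-bit image.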
I hope this helps in resolving role recycle issues caused by a System.BadImageFormatException.