Troubleshooting details: Windows Azure Web Role was stuck due to an expception in IISConfigurator.exe process
Recently I was working with a partner problem in which the Azure web role was stuck and showing “preparing node” status in Windows Azure Management portal. Luckily RDP access to Azure VM was working so investigation to the problem was easier.
After logging to Azure VM over RDP the IISConfigurator.log showed the following error:
IISConfigurator Information: 0 : [11/22/11 10:05:42.49] Started iisconfigurator with args
IISConfigurator Information: 0 : [11/22/11 10:05:42.71] Started iisconfigurator with args /start
IISConfigurator Information: 0 : [11/22/11 10:05:42.72] StartForeground selected. Check if an instance is already running
IISConfigurator Information: 0 : [11/22/11 10:05:42.80] Starting service WAS
IISConfigurator Information: 0 : [11/22/11 10:05:43.28] Starting service w3svc
IISConfigurator Information: 0 : [11/22/11 10:05:43.53] Starting service apphostsvc
IISConfigurator Information: 0 : [11/22/11 10:05:43.96] Attempting to add rewrite module section declarations
IISConfigurator Information: 0 : [11/22/11 10:05:44.00] Section rules already exists
IISConfigurator Information: 0 : [11/22/11 10:05:44.00] Section globalRules already exists
IISConfigurator Information: 0 : [11/22/11 10:05:44.00] Section rewriteMaps already exists
IISConfigurator Information: 0 : [11/22/11 10:05:44.00] Adding rewrite module global module
IISConfigurator Information: 0 : [11/22/11 10:05:44.03] Already exists
IISConfigurator Information: 0 : [11/22/11 10:05:44.05] Enabling rewrite module global module
IISConfigurator Information: 0 : [11/22/11 10:05:44.07] Already exists
IISConfigurator Information: 0 : [11/22/11 10:05:44.07] Skipping Cloud Drive setup.
IISConfigurator Information: 0 : [11/22/11 10:05:44.07] Cleaning All Sites
IISConfigurator Information: 0 : [11/22/11 10:05:44.07] Deleting sites with prefix:
IISConfigurator Information: 0 : [11/22/11 10:05:44.08] Found site:Ayuda.WebDeployHost.Web_IN_0_Web
IISConfigurator Information: 0 : [11/22/11 10:05:44.11] Excecuting process 'D:\Windows\system32\inetsrv\appcmd.exe' with args 'delete site "<sitename>.WebDeployHost.Web_IN_0_Web"'
IISConfigurator Information: 0 : [11/22/11 10:05:44.22] Process exited with code 0
IISConfigurator Information: 0 : [11/22/11 10:05:44.22] Deleting AppPool: <AppPool_GUID>
IISConfigurator Information: 0 : [11/22/11 10:05:44.28] Found site:11600
IISConfigurator Information: 0 : [11/22/11 10:05:44.30] Excecuting process 'D:\Windows\system32\inetsrv\appcmd.exe' with args 'delete site "11600"'
IISConfigurator Information: 0 : [11/22/11 10:05:44.49] Process exited with code 0
IISConfigurator Information: 0 : [11/22/11 10:05:44.49] Deleting AppPool: 11600
IISConfigurator Information: 0 : [11/22/11 10:05:44.52] Deleting AppPool: 11600
IISConfigurator Information: 0 : [11/22/11 10:05:44.55] Unhandled exception: IsTerminating 'True', Message 'System.ArgumentNullException: Value cannot be null.
Parameter name: element
at Microsoft.Web.Administration.ConfigurationElementCollectionBase`1.Remove(T element)
at Microsoft.WindowsAzure.ServiceRuntime.IISConfigurator.WasManager.RemoveAppPool(ServerManager serverManager, String appPoolName)
at Microsoft.WindowsAzure.ServiceRuntime.IISConfigurator.WasManager.TryRemoveSiteAndAppPools(String siteName)
at Microsoft.WindowsAzure.ServiceRuntime.IISConfigurator.WasManager.CleanServer(String prefix)
at Microsoft.WindowsAzure.ServiceRuntime.IISConfigurator.WCFServiceHost.Open()
at Microsoft.WindowsAzure.ServiceRuntime.IISConfigurator.Program.StartForgroundProcess()
at Microsoft.WindowsAzure.ServiceRuntime.IISConfigurator.Program.DoActions(String[] args)
at Microsoft.WindowsAzure.ServiceRuntime.IISConfigurator.Program.Main(String[] args)'
If you study the above IISConfigurator.exe process exception call stack in the log, you will see, prior to the crash the code was trying to delete the AppPool 11600 and the deletion was completed and then there was an exception. After that the role could not start correctly.
During the investigation I found the following details which I decided to share with all of you:
The Windows Azure application has a Web role which was creating using a modified version of the “Windows Azure Accelerator for Web Roles” . This Application was customized in a way that when the role starts, in the role startup code it does the following:
- Creates a few sites in the IIS
- Create appropriate bindings for new sites created in step #1.
So when Windows Azure VM was restarted/rebooted due to any reason (OS–update, manual reboot, etc) the role gets stuck due to IISConfigurator exception.
This is because the IIS was not clean, when machine started so IISConfigurator was removing all previous sites to prepare IIS for Web Role. IISConfigurator process has only 1 minute to perform all the tasks and if IISConfigurator could not finish all task within 1 minute the exception will occur due to 1 minute timeout.
Solution:
Because web role was creating all these sites during role startup to ideal solution was to clean the IIS when the machine shuts down rather then putting burden on IISConfigurator to cleanup IIS during startup.
So the perfect solution, was to cleanup IIS during the RoleEnvironment.Stopping event. So to solve this problem we just added code launching “appcmd” to clean all the sites which are created earlier in RoleEnvironment.Stopping event and made sure IIS is clean.
You can read my blog blow to understand what is Stopping event and how to handle it properly in your code:
Ultimately, the problem was related with IIS having residual setting during machine startup and IISConfigurator could not clean IIS before timeout kicks in, which ultimately cause role startup problems. After adding necessary code in Stopping event to clean IIS, the web role started without any issues on reboot. You can also add a startup task to make sure the IIS is clean prior to role start as a failsafe.