

Why did my Azure VM restart?

 

An unexpected restart of an Azure VM commonly leads a customer to open a support incident to determine the cause. The explanation below describes why an Azure VM may have been restarted.

 

Windows Azure updates the host environment approximately once every 2-3 months to keep the environment secure for all applications and virtual machines running on the platform. This update process may restart your VM, causing downtime for the applications and services hosted on the Virtual Machines feature. There is no option or configuration to avoid these host updates. In addition to platform updates, Windows Azure service healing occurs automatically when a problem with a host server is detected: the VMs running on that server are moved to a different host. When this occurs, you lose connectivity to the VM during the service healing process. After service healing completes and you reconnect to the VM, you will likely find an event log entry indicating that the VM restarted (either gracefully or unexpectedly). Because of this, it is important to configure your VMs to handle these situations in order to avoid downtime for your applications and services.
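As a rough illustration of telling a graceful restart from an unexpected one, the sketch below classifies a list of Windows System log event IDs. The event IDs are standard Windows ones (1074, 6006, 6008, and Kernel-Power 41), but the function name and the idea of passing in a plain list of IDs are assumptions made for this example; it does not read the event log itself and it does not tell you whether the cause was a host update or service healing.

```python
# Well-known Windows System log event IDs related to shutdowns and restarts.
RESTART_EVENTS = {
    1074: "graceful",    # User32: a process or user initiated the shutdown/restart
    6006: "graceful",    # EventLog: the Event Log service was stopped (clean shutdown)
    6008: "unexpected",  # EventLog: the previous system shutdown was unexpected
    41: "unexpected",    # Kernel-Power: rebooted without cleanly shutting down first
}


def classify_restart(event_ids):
    """Return 'unexpected', 'graceful', or 'none' for a sequence of event IDs.

    Hypothetical helper: any unexpected-shutdown event wins over graceful ones,
    because a dirty shutdown is the case that needs follow-up.
    """
    kinds = {RESTART_EVENTS[e] for e in event_ids if e in RESTART_EVENTS}
    if "unexpected" in kinds:
        return "unexpected"
    if "graceful" in kinds:
        return "graceful"
    return "none"


if __name__ == "__main__":
    # Event IDs as they might be collected from the System log after reconnecting.
    print(classify_restart([6005, 41, 6008]))
    print(classify_restart([1074, 6006, 6005]))
```

An "unexpected" result is consistent with the non-graceful restarts described above, while "graceful" suggests the guest OS was shut down cleanly before the VM came back.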

 

To ensure high availability of your applications and services hosted in Windows Azure Virtual Machines, we recommend using multiple VMs with availability sets. VMs in the same availability set are placed in different fault domains and update domains so that planned updates or unexpected failures will not impact all the VMs in that availability set. For example, if you have two VMs and configure them to be part of an availability set, only one VM is brought down at a time when a host is being updated. This provides high availability, since one VM remains available to serve user requests during the host update process. Mark Russinovich has published a blog post that explains Windows Azure host updates in detail. Managing high availability is detailed here.
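The two-VM example above can be sketched as a small simulation. This is only a model of the idea: the round-robin assignment, the default of five update domains, the VM names, and both function names are assumptions for illustration, not a description of how the platform actually places VMs.

```python
from collections import defaultdict


def assign_update_domains(vm_names, update_domain_count=5):
    """Spread VMs across update domains round-robin (assumed placement policy)."""
    return {name: i % update_domain_count for i, name in enumerate(vm_names)}


def rolling_update(placement):
    """Yield (update_domain, vms_down, vms_up) for each step of a host update.

    Only the VMs in one update domain are down at any step; all others keep serving.
    """
    by_domain = defaultdict(list)
    for name, domain in placement.items():
        by_domain[domain].append(name)
    for domain in sorted(by_domain):
        down = by_domain[domain]
        up = [name for name in placement if name not in down]
        yield domain, down, up


# Two hypothetical VMs in one availability set.
placement = assign_update_domains(["web-1", "web-2"])
for domain, down, up in rolling_update(placement):
    print(f"update domain {domain}: down={down}, still serving={up}")
```

With two VMs in the set, every step leaves one VM serving requests. With a single VM, the `up` list is empty during its update step, which is the downtime a single-instance deployment experiences.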

 

While availability sets help provide high availability for your VMs, we recognize that proactive notification of planned maintenance is a much-requested feature, particularly to help prepare in a situation where you have a workload that is running on a single VM and is not configured for high availability. While this type of proactive notification of planned maintenance is not currently provided, we encourage you to provide comments on this topic so we can take the feedback to the product teams.

[Update] Planned maintenance notifications are now being sent for single-instance VMs. However, they may only be reaching account administrators.

 

Keywords: VM, Restart, Shutdown, Unexpected reboot, Windows Azure

Comments

  • Anonymous
    September 23, 2013
    I would like to have an email notification 48 hours before my VM restarts. Thanks.

  • Anonymous
    September 24, 2013
    Also, by default, VMs are set to download and install Windows Update patches automatically. Post-update reboots can also cause downtime. My suggestion is to adjust Windows Update settings to download but not install patches; then when you have a maintenance window, manually install the patches and allow the VM to reboot.

  • Anonymous
    September 24, 2013
    For Linux, will Role.StateConsumer in waagent.conf be triggered when such a reboot is performed? How long will the fabric wait for the script to return?

  • Anonymous
    September 29, 2013
    It would be great if the updates were scheduled and advance notice were given. Having hosts that are already upgraded ahead of time would also be great. The customer could then choose to move to the upgraded hosts themselves at a time that suits their schedule, and thereby avoid the scheduled maintenance window, which may not suit them, or wait and be upgraded automatically during the scheduled upgrade window.

  • Anonymous
    September 29, 2013
    So let me get this straight, it's standard practice to randomly reboot production servers without any notice?

  • Anonymous
    September 29, 2013
    This is the worst business practice I've ever seen.  You will NOT reboot or shutdown any of our servers. If this is an issue, I expect someone to contact me about this immediately.

  • Anonymous
    September 30, 2013
    Is there any equivalent of fault domains / availability sets for Azure Web Sites running in standard mode? I see the recommendation to run multiple VMs for Azure Web Sites to increase availability, but will these automatically get assigned to different fault domains when they're Web Sites VMs? If not, is there a way of exposing the availability set control available to Virtual Machines to Web Sites as well?

  • Anonymous
    November 05, 2013
    Hi guys, why can't you live migrate the VMs off the host and then patch your hosts? This function is there, so please use it; otherwise it makes life difficult for people using Azure when VMs just shut down and restart.

  • Anonymous
    December 06, 2013
    Has anyone at MS seen these comments? Is anything being done? This lack of control is really not acceptable. We just had 2 servers restart back-to-back for about 30 minutes each in the middle of the afternoon. No notice and a generic shutdown message in the event log. Effectively your recommendation is to double the infrastructure cost across the board to get around an update policy you have forced driving random server updates. What a joke.

  • Anonymous
    January 13, 2014
    This rebooting thing is our main reason for not choosing Azure. Actually, it is the ONLY reason. We are a service provider and host several virtual servers for our customers in a third party datacenter. None of our customers would accept a planned reboot during work hours, not even when they are notified in advance.

  • Anonymous
    February 03, 2014
    SQL Azure doesn't meet all the needs so if someone has to deploy their SQL Server in Azure VM and it randomly reboots, who in their right mind would think of using Windows Azure VM's?

  • Anonymous
    February 05, 2014
    This is ridiculous! I can't believe that a service provider would expect their customers to be ok with this process. Rebooting all servers with zero notification and in the middle of the day is unacceptable. Looks like we will have to find a new provider. The escalation desk told me they were unaware of any scheduled maintenance, so I guess they don't even provide internal notification.

  • Anonymous
    February 05, 2014
    This is the first page that should come up when any potential customer searches for Azure hosting or Azure vs Amazon Web Services

  • Anonymous
    February 09, 2014
    Another caveat is that the load balancer is not told that the reboot is about to occur.   Thus, incoming HTTP requests will be routed to the stopped VM for some time.   Even the HA solution suffers from this problem.

  • Anonymous
    February 20, 2014
    Hi guys, why is Azure not using Live Migration to move customer VMs to other nodes when it has to do updates? Also, when a VM is restarted it goes down for 20-30 minutes, not just 2 minutes, which makes it impossible to explain to the customer, especially when your application relies on multiple VMs, which can then cause an outage of a few hours while Microsoft reboots them. The lack of any notification of upgrades taking place is also a huge issue. If we had a notification, at least we could warn our customers. Azure VMs are by no means ready for production use, be warned everyone!

  • Anonymous
    February 27, 2014
    Engineer your product properly and this doesn't become an issue.

  • Anonymous
    March 21, 2014
    This just happened to me twice in one evening. Not only did it take each server much longer than normal to boot, but the essential services (IIS on web services, SQL on SQL services, etc.) did not restart without manual intervention. I have failovers and HA, but they are of no use when I have to babysit all the servers so that they actually come back up before another server restarts. This does not happen when I restart the servers myself, so why does it happen when MS does it? I have been pretty happy with Azure so far, but this is a pretty big thorn in my side.

  • Anonymous
    March 22, 2014
    Nick, did MS support provide any reason why the services failed after the restart? I have gotten nowhere on finding the answer to that question. This really gets me thinking: if I had a couple hundred servers and they began restarting haphazardly, am I to wait for days on end for each server while it restarts? What is the solution? If I find out any information I will be sure to post it back here.

  • Anonymous
    March 23, 2014
    Josh, I have a pending case still. I will let you know when I get an answer. I am also calling our Microsoft TAM on Monday to escalate. Did this occur on your servers on 3/21? Just curious if it was on the same day. Nick

  • Anonymous
    March 25, 2014
    and now the web server... Time to move to AWS

  • Anonymous
    March 25, 2014
    Is it safe to conclude that with 6 months worth of comments, that no MS moderator is responding to customer needs discussed on this post?

  • Anonymous
    March 29, 2014
    I posted about my issue here: social.msdn.microsoft.com/.../vm-machine-automatically-rebooted

  • Anonymous
    April 09, 2014
    Riding a wave of post-BUILD enthusiasm, we were seriously looking at moving some of our stuff into Azure. Now I'm having second thoughts. I can't understand why they let this sort of thread smoulder unattended.

  • Anonymous
    April 11, 2014
    Thank you very much for providing feedback and sharing your experience. The feedback shared on this thread and via support channels has been communicated internally; it has been very valuable, well received, and acknowledged. We recognize that customers experience disruption and application outages when VM infrastructure maintenance causes VM restarts. Teams are working rigorously to minimize the impact on customers during Azure platform maintenance.

  • Anonymous
    April 12, 2014
    This is not acceptable, as there are services running, and stopping them non-gracefully means there is a possibility of data corruption.

  • Anonymous
    April 15, 2014
    Wow, so this is why my VM was restarted at 6 PM on Saturday... could you at least not wait until early morning?! My server is located in Europe, so I think it's safe to assume that restarting at 3 AM CET would be better than 6 PM. That said, I'd rather you didn't restart it at all. Will be moving to a different provider ASAP.

  • Anonymous
    July 30, 2014
    For the past three days, between 8:30AM and 9:30AM, the virtual network connectivity between VM's in our Azure cloud disappeared. Over the past year, I have documented cases of servers being randomly rebooted (healing MS calls it) in the middle of the day. I have documented errors of a SQL Server VM not being able to write to a SQL database file (I/O taking longer than 15s) and then the errors "vanish" after a day or two. I have documented support requests from Microsoft where they admit that you must rewrite applications to actually take advantage of "availability sets" - e.g. any architecture that uses a shared network drive. I have logged at least a dozen support requests, and Microsoft has never once gotten to any conclusion. Microsoft support has admitted (in writing) they have no access to the actual hardware or data center diagnostics. Now, they give me two 12 hour time windows on Friday and Saturday when they're going to reboot my servers again. They can't figure out within 24 hours what they're doing. We have created a new cloud with a new provider and are copying content and database files over to it now. We've had to send a message to our customers coming clean on how bad Azure is, and that we're moving as fast as we can to the new cloud, and that we made a mistake in putting a couple new customers in this environment. I have been writing Web software for well over a decade (and have been programming for 30 years), hosted with many providers, and I have never encountered anything as unstable as Azure - and I have it all documented so this isn't just a rant. It's truly awful. Everything above is true, if you still think Azure is something to consider, do yourself a favor and move on.

  • Anonymous
    July 30, 2014
    And one more important p.s., I have an e-mail exchange from a Microsoft Support person who admits that the health dashboard does not always reflect all outages and they "recognize" that it is at the discretion of Microsoft what they post in the health dashboard.  Pretty easy to tout records of stability when the records aren't correct huh?

  • Anonymous
    August 01, 2014
    A conclusion to the aforementioned case. There was a VM that every day would simply disappear from the network of VMs in my cloud at exactly 8 AM PST. Microsoft confirmed there is a "problem with the host upon which the VM is running". So, just move it, Microsoft, right? And by the way, how come it didn't "heal" itself already? Nope, I have to completely deallocate the VM or resize it to force the VM to jump to a new host. So, no automatic "healing", and they cannot even move VMs off malfunctioning hosts themselves; you have to do it. An entire week of customer downtime, code debugging, diagnostics, etc. all lost to the above.

  • Anonymous
    August 16, 2014
    We just had a restart of our Linux VMs, without prior notice. We need a warning from Azure 48 hours before any server reboot. I can't find any information about the reboot on the dashboard, not even after the fact.

  • Anonymous
    August 22, 2014
    I am terrified reading this thread! I just had a shutdown of our production SQL database, and had to start it up manually from the portal! I opened a support ticket, but from reading these posts I am not too optimistic about this! We just signed up for a prepaid 3-year Enterprise Agreement - what a nightmare

  • Anonymous
    September 27, 2014
    We just finished moving an 80 VM Azure enterprise deployment to AWS for the reasons others have commented on above. When we engaged MS Premium Support to help diagnose why VMs were restarting automatically, they offered to bring an Australian Azure Architect in to discuss the deployment architecture. We took them up on this offer and, surprise, surprise, we were told by the architect that we needed an additional 30 VM DR production deployment to get around various Azure problems. Needless to say, the client wasn't impressed that the architect acknowledged problems and suggested they pay significant extra dollars to work around them! Since moving to AWS we've been satisfied (it has its issues too), but the level of maturity is much higher and the whole offering is a lot more robust. Additionally, their support has been responsive so far and has been helpful without recommending we buy more VMs!

  • Anonymous
    November 03, 2014
    If we cannot do anything about these updates, why are they performed in the European data center during BUSINESS HOURS???? (last incident on a Monday at 3:46 PM) Why do you not shut down the machines cleanly? -> Got a Kernel-Power EventID 41... In our local Hyper-V environment I can move VMs live (VSM) to another Hyper-V host - does Azure not support this functionality????

  • Anonymous
    January 30, 2015
    I've been doing a trial run of Azure and it's pretty stable for the most part, but I get random reboots just like everybody else here.  I'm running a single VM with a non-critical workload, so that's actually okay for me.  My problem is how this always seems to happen in the middle of the day local time.  I mean if I buy a VM hosted in USA East, I would expect that host maintenance would happen at 3AM, not 3PM. I initially suspected the host went down and it was a force failover to a new physical host, but the fact that it's happened about 10 times now, all in the middle of the day, I'm pretty sure I haven't had the bad luck of 10 physical hosts going down over four months. Some of the posters here have unrealistic expectations, but everybody is correct that the current way this is being done is well below par compared to other hosting providers. Personally, I'd be happy with just a few small improvements.

    1. Email notification immediately after a failed physical host and forced migration.
    2. 48 hours notice ahead of time for host updates that will cause a shutdown.

    As others above have mentioned... why the heck isn't live migration being used for scheduled host updates? With Server 2012 R2, in my tiny little 12 physical server setup, I can do shared-nothing live migration between my Hyper-V hosts. How on earth is this not available in Azure?

  • Anonymous
    February 22, 2015
    Alright MS, not to be that whiny kid, but this is a real deal breaker. Like many people point out, you do have live migration, and yes, it might be some work to set it up at first, but come on.. people don't want random server restarts... Honestly I was looking forward to working with and understanding Azure, but this is a real problem..

  • Anonymous
    March 16, 2015
    We have had this in the Australian East data centre right in the middle of the Australian work day (11:30 AM). Completely unexpected, non-graceful reboots. Of the machines in question, one was a Domain Controller and suffered data corruption, obviously NOT ideal. We are highly concerned about putting any more clients in Azure until this is resolved. We just reverted to putting a deal with our old cloud VM provider, and we have a few more deals on the table that we'd love to put through Azure, but we are too concerned about the business risk. Small businesses in Australia (or anywhere in the world) can't justify paying twice (if not more) the amount just to stick with Microsoft - they'll simply go to other providers until this is sorted. Honestly, this sounds more like a money grab (customers need twice the number of services to run properly) than a technical issue, as we all know Microsoft has the technology to ensure that this doesn't happen. Will be keeping a keen eye on this thread as we REALLY want to just sell/support a full Microsoft stack; however, Azure is the one thing stopping us from doing that at the moment (O365, Server, Win8, Office are all amazing and are core to everything else we do, we just can't rely on Azure just yet).

  • Anonymous
    April 30, 2015
    My DS Series VM running SQL Server Enterprise just restarted for the 7th time in the last 2 days. This is insane.

  • Anonymous
    June 02, 2015
    Please, people, if you are affected by this then upvote the suggestion here. It's mind-boggling that this doesn't have more attention, being such a critical flaw: feedback.azure.com/.../7031369-host-reboots-without-vm-reboot

  • Anonymous
    July 31, 2015
    Microsoft finally recognizing the problem...I don't know current status.  up2v.nl/.../microsoft-azure-virtual-machines-in-future-will-remain-active-when-host-reboots-for-planned-maintenance

  • Anonymous
    November 14, 2015
    I would like to be able to stipulate a default time of day for restarts. 03:30 AM for example