Handling planned maintenance notifications

Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets

Azure periodically performs updates to improve the reliability, performance, and security of the host infrastructure for virtual machines. Updates are changes like patching the hosting environment or upgrading and decommissioning hardware. A majority of these updates are completed without any impact to the hosted virtual machines. However, there are cases where updates do have an impact:

  • If the maintenance doesn't require a reboot, Azure pauses the VM for few seconds while the host is updated. These types of maintenance operations are applied fault domain by fault domain. Progress is stopped if any warning health signals are received.

  • If maintenance requires a reboot, you get a notice of when the maintenance is planned. You're given a time window of about 35 days where you can start the maintenance yourself, when it works for you.

Planned maintenance that requires a reboot is scheduled in waves. Each wave has different scope (regions).

  • A wave starts with a notification to customers. Virtual Machine related Maintenance notifications are available under Service Health in the Azure portal. For a small subset of Virtual Machine Planned Maintenance specific scenarios, Azure may also communicate the schedule by sending an additional email to the members of Subscription Owners group. Azure Service Health enables users to configure their own custom alerts for the Planned Maintenance category. With Azure Service Health alerts, you can add more recipients and messaging options like email, SMS, and webhooks using Activity Log Alerts.
  • Except a subset of event types, the Service Health Planned Maintenance Notification now also has an additional tab "Impacted Resources" which lists the VM names in scope of the event. Read more about Impacted Resources Preview Feature.
  • Once a notification goes out, a self-service window is made available. During this window, you can query which of your virtual machines are affected and start maintenance based on your own scheduling needs. The self-service window is typically about 35 days.
  • After the self-service window, a scheduled maintenance window begins. At some point during this window, Azure schedules and applies the required maintenance to your virtual machine.

The goal in having two windows is to give you enough time to start maintenance and reboot your virtual machine while knowing when Azure will automatically start maintenance.

You can use the Azure portal, PowerShell, REST API, and CLI to query for the maintenance windows for your VMs and start self-service maintenance.

Should you start maintenance using during the self-service window?

The following guidelines should help you decide whether to use this capability and start maintenance at your own time.

Note

Self-service maintenance might not be available for all of your VMs. To determine if proactive redeploy is available for your VM, look for the Start now in the maintenance status. Self-service maintenance is currently not available for Cloud Services (Web/Worker Role) and Service Fabric.

Self-service maintenance isn't recommended for deployments using availability sets. Availability sets are already only updated one update domain at a time.

  • Let Azure trigger the maintenance. For maintenance that requires reboot, maintenance will be done update domain by update domain. The update domains don't necessarily receive the maintenance sequentially, and that there's a 30-minute pause between update domains.
  • If a temporary loss of some capacity (1 update domain) is a concern, you can add instances during the maintenance period.
  • For maintenance that doesn't require reboot, updates are applied at the fault domain level.

Don't use self-service maintenance in the following scenarios:

  • If you shut down your VMs frequently, either manually, using DevTest Labs, using auto-shutdown, or following a schedule, it could revert the maintenance status and therefore cause additional downtime.
  • On short-lived VMs that you know will be deleted before the end of the maintenance wave.
  • For workloads with a large state stored in the local (ephemeral) disk that is desired to be maintained upon update.
  • For cases where you resize your VM often, as it could revert the maintenance status.
  • If you have adopted scheduled events that enable proactive failover or graceful shutdown of your workload, 15 minutes before start of maintenance shutdown

Use self-service maintenance, if you're planning to run your VM uninterrupted during the scheduled maintenance phase and none of the counter-indications mentioned above are applicable.

It's best to use self-service maintenance in the following cases:

  • You need to communicate an exact maintenance window to your management or end-customer.
  • You need to complete the maintenance by a given date.
  • You need to control the sequence of maintenance, for example, multi-tier application to guarantee safe recovery.
  • More than 30 minutes of VM recovery time is needed between two update domains (UDs). To control the time between update domains, you must trigger maintenance on your VMs one update domain (UD) at a time.

FAQ

Q: Why do you need to reboot my virtual machines now?

A: While the majority of updates and upgrades to the Azure platform don't impact virtual machine's availability, there are cases where we can't avoid rebooting virtual machines hosted in Azure. We have accumulated several changes that require us to restart our servers that will result in virtual machines reboot.

Q: If I follow your recommendations for High Availability by using an Availability Set, am I safe?

A: Virtual machines deployed in an availability set or virtual machine scale sets have the notion of Update Domains (UD). When performing maintenance, Azure honors the UD constraint and won't reboot virtual machines from different UD (within the same availability set). Azure also waits for at least 30 minutes before moving to the next group of virtual machines.

For more information about high availability, see Availability for virtual machines in Azure.

Q: How do I get notified about planned maintenance?

A: A planned maintenance wave starts by setting a schedule to one or more Azure regions. Virtual Machine related Maintenance notifications are available under Service Health in the Azure portal. For a small subset of Virtual Machine Planned Maintenance specific scenarios, Azure may also communicate the schedule by sending an additional email to the members of Subscription Owners group.

Except a subset of event types, the Service Health Planned Maintenance Notification now also has an additional tab "Impacted Resources" which lists the VM names in scope of the event. Read more about Impacted Resources Preview Feature.

Azure Service Health enables users to configure their own custom alerts for the Planned Maintenance category. With Azure Service Health alerts you can add more recipients and messaging options like email, SMS, and webhooks using Activity Log Alerts.

In case you deploy a virtual machine to a region where planned maintenance is already scheduled, you won't receive the notification but rather need to check the maintenance state of the VM.

Q: I don't see any indication of planned maintenance in the portal, PowerShell, or CLI. What is wrong?

A: Information related to planned maintenance is available during a planned maintenance wave only for the VMs that are going to be impacted by it. In other words, if you see not data, it could be that the maintenance wave has already completed (or not started) or that your virtual machine is already hosted in an updated server.

Q: Is there a way to know exactly when my virtual machine will be impacted?

A: When setting the schedule, we define a time window of several days. However, the exact sequencing of servers (and VMs) within this window is unknown. Customers who would like to know the exact time for their VMs can use scheduled events and query from within the virtual machine and receive a 15-minute notification before a VM reboot.

Q: How long will it take you to reboot my virtual machine?

A: Depending on the size of your VM, reboot may take up to several minutes during the self-service maintenance window. During the Azure initiated reboots in the scheduled maintenance window, the reboot will typically take about 25 minutes. Note that in case you use Cloud Services (Web/Worker Role), Virtual Machine Scale Sets, or availability sets, you'll be given 30 minutes between each group of VMs (UD) during the scheduled maintenance window.

Q: What is the experience in the case of Virtual Machine Scale Sets?

A: Planned maintenance is now available for Virtual Machine Scale Sets. For instructions on how to initiate self-service maintenance refer planned maintenance for virtual machine scale sets document.

Q: What is the experience in the case of Cloud Services (Web/Worker Role) and Service Fabric?

A: While these platforms are impacted by planned maintenance, customers using these platforms are considered safe given that only VMs in a single Upgrade Domain (UD) will be impacted at any given time. Self-service maintenance is currently not available for Cloud Services (Web/Worker Role) and Service Fabric.

Q: I don’t see any maintenance information on my VMs. What went wrong?

A: There are several reasons why you’re not seeing any maintenance information on your VMs:

  1. You're using a subscription marked as Microsoft internal.
  2. Your VMs aren't scheduled for maintenance. It could be that the maintenance wave has ended, canceled, or modified so that your VMs are no longer impacted by it.
  3. You have deallocated VM and then started it. This can cause VM to move to a location which doesn't have planned maintenance wave scheduled. So the VM won't show maintenance information any more.
  4. You don’t have the Maintenance column added to your VM list view. While we have added this column to the default view, customers who configured to see non-default columns must manually add the Maintenance column to their VM list view.

Q: My VM is scheduled for maintenance for the second time. Why?

A: There are several use cases where you'll see your VM scheduled for maintenance after you have already completed your maintenance-redeploy:

  1. We have canceled the maintenance wave and restarted it with a different payload. It could be that we've detected faulted payload and we simply need to deploy an additional payload.
  2. Your VM was service healed to another node due to a hardware fault.
  3. You have selected to stop (deallocate) and restart the VM.
  4. You have auto shutdown turned on for the VM.

Next steps

You can handle planned maintenance using the Azure CLI, Azure PowerShell or portal.