Configure rolling upgrades on Virtual Machine Scale Sets

Using a rolling upgrade policy is the safest way to apply updates to instances in a Virtual Machine Scale Set. Performing updates in batches ensures that your scale set maintains a set number of instances available to take traffic, meaning you don't need to take down your entire workload to make a change.

Rolling upgrade policy is best suited for production workloads.

Note

To configure MaxSurge upgrades, register the following feature flag:
Register-AzProviderFeature -FeatureName MaxSurgeRollingUpgrade -ProviderNamespace Microsoft.Compute

To update the image reference version during an upgrade, register the following feature flag:
Register-AzProviderFeature -FeatureName ImageReferenceUpgradeForVmoVMs -ProviderNamespace Microsoft.Compute

Requirements

  • When using a rolling upgrade policy on Virtual Machine Scale Sets with Flexible Orchestration, the scale set must also use the application health extension to monitor application health.

  • When using a rolling upgrade policy on Virtual Machine Scale Sets with Uniform Orchestration, the scale set must also have a health probe or use the application health extension to monitor application health.

  • If attaching a virtual machine to a Virtual Machine Scale Set, the attached virtual machine should also have an application health extension. If an attached virtual machine does not have a health extension configured and reporting application health, the rolling upgrade may be impacted.

Concepts

Upgrade policy mode vs rolling upgrade policy

The upgrade policy mode and the rolling upgrade policy are nested, but separate settings of the upgrade policy. This means that while your scale set may use an automatic or manual upgrade policy mode, there is still a rolling upgrade policy that is associated. This rolling upgrade policy comes with default settings but can also be configured to meet your business needs.

"properties": {
        "upgradePolicy": {
            "mode": "<Automatic, Manual, Rolling>",
            "rollingUpgradePolicy": {
            "maxBatchInstancePercent": 20,
            "maxUnhealthyInstancePercent": 20,
            "maxUnhealthyUpgradedInstancePercent": 20,
            "pauseTimeBetweenBatches": "PT2S",
            "MaxSurge": "false"
            }
        }
    }

For scale set model updates, the upgrade policy mode will determine how the instances in the scale set are brought up to date.

Important

If you are using automatic OS image upgrades or automatic extension upgrades, these feature don't use the upgrade policy mode. Instead these features use the rolling upgrade policy configuration settings to perform rolling upgrades.

Rolling upgrade policy configuration settings

Setting Description
Rolling upgrade batch size % Specifies how many of the total instances of your scale set you want to be upgraded at one time.

Example: A batch size of 20% when you have 10 instances in your scale set results in upgrade batches with two instances each.
Pause time between batches (sec) Specifies how long you want your scale set to wait between finishing an upgrade batch and starting a new one.

Example: A pause time of 10 seconds means that once a batch is successfully completed, the scale set will wait 10 seconds before moving onto the next batch.
Max unhealthy instance % Specifies the total number of instances allowed to be marked as unhealthy before and during the rolling upgrade.

Example: A max unhealthy instance % of 20 means if you have a scale set of 10 instances and more than two instances in the entire scale set report back as unhealthy, the rolling upgrade stops.
Max unhealthy upgrade % Specifies the total number of instances allowed to be marked as unhealthy after being upgraded.

Example: A max unhealthy upgrade % of 20 means if you have a scale set of 10 instances and more than two instances in the entire scale set report back as unhealthy after being upgraded, the rolling upgrade is canceled.
Prioritize unhealthy instances Tells the scale set to upgrade instances reporting as unhealthy before upgrading instances reporting as healthy.

Example: If some instances in your scale are failed or unhealthy when a rolling upgrade begins, the scale set updates those instances first.
Enable cross-zone upgrade Allows the scale set to ignore Availability Zone boundaries when determining batches. This essentially lets the rolling upgrade treat your scale set as a regional deployment instead of a zonal deployment.
MaxSurge New instances are created in batches using the latest scale model. Once the batch of new instances is successfully created and marked as healthy, the scale set then deletes instances in batches matching the old scale set model. This continues until all instances are brought up-to-date. rolling upgrades with MaxSurge can help improve service uptime during upgrade events.

For more information see MaxSurge rolling upgrades.
Rollback Failed Instances On Policy Breach (Uniform Orchestration only) Rollback failed instances to previous model if the rolling upgrade policy is violated.

Setting or updating the rolling upgrade policy

Rolling upgrade policy can be configured during scale set creation. Because rolling upgrade policy requires successfully monitoring application health and there are specific settings that determine how upgrades are completed, it's suggested to first create your scale set using manual upgrade mode. Once you have confirmed the application health is being successfully reported, update your upgrade policy from manual to rolling.

Select the Virtual Machine Scale Set you want to change the upgrade policy for. In the menu under Settings, select Upgrade Policy and from the drop-down menu, select Rolling - Upgrades roll out in batches with optional pause.

Screenshot showing changing the upgrade policy and enabling MaxSurge in the Azure portal.

Get rolling upgrade Status

When a rolling upgrade is triggered in the Azure portal, a banner appears at the top of your scale set resource informing you that a rolling upgrade is in progress. You can click on view details to see the status of the rolling upgrade. When all updates are complete, the banner disappears.

Screenshot showing banner when a rolling upgrade is taking place.

Screenshot showing details of the rolling upgrade in the Azure portal.

Additionally, you can view exactly what changes are being rolled out in the Activity Log. The rolling upgrade event is under Create or Update Virtual Machine Scale Set. Select Change History to review what is being updated.

Screenshot showing the rolling upgrade details in the Activity Log.

Cancel a rolling upgrade

You can cancel a rolling upgrade in progress using the Azure portal by selecting the view details in the banner above your scale set. In the pop-up window, you can view the current status and at the bottom is a cancel upgrade option.

Screenshot showing the rolling upgrade details in the Activity Log.

Restart a rolling upgrade

If you decide to cancel a rolling upgrade or the upgrade has stopped due to any policy breach, any more changes that result in another scale set model change trigger a new rolling upgrade. If you want to restart a rolling upgrade, trigger a generic model update. This tells the scale set to check if all the instances are up to date with the latest model.

To restart a rolling upgrade after it has been canceled, trigger the scale set to check if the instances in the scale set are up to date with the latest scale set model. You can do this by running az vmss update.

az vmss update \
    --name myScaleSet \
    --resource-group myResourceGroup

Next steps

Learn how to perform manual upgrades on Virtual Machine Scale Sets.