Reliability in Azure Storage Mover
This article describes reliability support in Azure Storage Mover and covers both intra-regional resiliency with availability zones and cross-region disaster recovery and business continuity. For a more detailed overview of reliability principles in Azure, see Azure reliability.
Availability zone support
Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.
For more information on availability zones in Azure, see What are availability zones?
Azure Storage Mover supports a zone-redundant deployment model.
When you deploy an Azure Storage Mover resource, you must select a particular region in which the resource's instance metadata is stored.
If the region supports availability zones, the instance metadata is automatically replicated across multiple availability zones within that region.
Important
Azure Storage Mover instance metadata includes projects, endpoints, agents, job definitions, and job run history, but doesn't include the actual data to be migrated. Azure storage accounts that are used as migration targets have their own reliability support.
Prerequisites
To deploy with availability zone support, you must choose a region that supports availability zones. To see which regions support availability zones, see the list of supported regions.
(Optional) If your target storage account doesn't support availability zones, and you would like to migrate the account to AZ support, see Migrate Azure Storage accounts to availability zone support.
Zone down experience
No action is required from you during a zone-wide outage or during zone recovery. Azure Storage Mover is designed to self-heal and rebalance itself automatically to take advantage of the healthy zones.
Any migration target storage account may require its own recovery steps. This requirement depends on the redundancy options chosen for each storage account. See the storage account disaster recovery guide to determine whether more steps are necessary.
If locally redundant storage was chosen in lieu of other redundancy options, you may need to create a new storage account for use in migrations during the outage.
Cross-region disaster recovery and business continuity
Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see Recommendations for designing a disaster recovery strategy.
When it comes to DR, Microsoft uses the shared responsibility model. In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fall back from a failed region to cross-replicate to another enabled region. For those services, you're responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR, and you can use these service-specific features to support fast recovery as you develop your DR plan.
When a Storage Mover agent is registered, it connects to the region in which the Storage Mover resource is registered. If an agent's Azure region experiences an outage, the agent itself isn't affected, but management operations that rely on Azure may be unable to complete. In addition, any active data migrations to storage accounts located within the affected region may fail.
Storage Mover supports two forms of disaster recovery: Azure initiated and customer initiated.
Important
Disaster recovery for on-premises data sources is the responsibility of the customer.
Azure initiated disaster recovery
Azure initiated disaster recovery is only applicable to those regions that have region pairs. When cross-region replication is utilized, instance metadata is replicated to each region, but is never permitted to leave the geography.
Azure Storage Mover uses Azure Cosmos DB to store instance metadata. Data loss may occur only with an unrecoverable disaster in Azure Cosmos DB. For more information, see Region outages. Azure initiated recovery is active-passive, and full recovery of a region may take up to 24 hours.
Customer initiated disaster recovery
Customer initiated disaster recovery isn't restricted to paired regions.
Before a regional outage occurs:
Deploy a zone-redundant Storage Mover by creating Storage Mover resources in a region that supports availability zones.
Periodically, either on a schedule or after you make substantial changes, take a snapshot of your Storage Mover resources. Keeping the snapshots in a version control system is a good way to store them and track their history. You'll use the last good snapshot in the event of a disaster where you need to recover your resources in a new region.
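As one possible way to take such a snapshot, the following sketch exports the resource group that contains the Storage Mover resources as an ARM template by using the Export-AzResourceGroup cmdlet from the Az.Resources module. The resource group name and the snapshot folder are placeholders, and the folder is assumed to be tracked by your version control system.

```powershell
## Export the current state of a resource group as an ARM template snapshot
$resourceGroupName = "[Your resource group name]"
$snapshotFolder    = "[Path to a folder tracked by your version control system]"

# Writes a template file for the resource group under the snapshot folder.
# -Force overwrites an earlier export without prompting; version control
# is assumed to keep the history of previous snapshots.
Export-AzResourceGroup `
    -ResourceGroupName $resourceGroupName `
    -Path $snapshotFolder `
    -Force
```

Committing the exported file after each run gives you a dated history from which to pick the last good snapshot.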
During a regional outage:
You can do one of two things:
- Choose to wait for Azure to recover the region.
- Minimize downtime by redeploying your resources to a different region. Since access to your resources may be impacted during an outage, you'll want to use the last good snapshot of your resources.
Tip
Either strategy may still require you to take further steps before a disaster occurs, so be sure to review and plan accordingly.
Deploy resources to a different region
See the documentation on exporting templates for further instructions on exporting resources as an Azure Resource Manager (ARM) template.
If your Storage Mover and related resources reside in a resource group that contains no other resources, you should perform a Resource Group export to capture the current state. However, if your resource group contains unrelated resources, you may need to remove or otherwise exclude those resources from the template.
Existing agents can't be redeployed to a different region. If the region in which they were originally configured experiences an outage, it may not be possible to completely unregister and re-register the agent. This document assumes that new agents are registered within a new region.
To use the exported template for disaster recovery, a few changes to the template are required.
- First, remove any Microsoft.StorageMover/agents and Microsoft.HybridCompute/machines resources from the template. Be sure to remove any dependency references to these resources as well.
- Next, remove the agentResourceId property from all job definitions. You'll need to assign them to a new agent after deployment.
- After removing all references to agent and Hybrid Compute machine resources, update the location property of the top level Storage Mover resource. Replace the name of the currently deployed region with the name of the new region.
- Finally, determine whether to keep the existing storage account resource ID. If necessary, replace it with a different storage account.
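The template changes listed above can also be scripted. The following is a minimal sketch rather than an official tool: Convert-StorageMoverTemplate is a hypothetical helper name, the file paths and region in the usage comments are placeholders, and the type patterns assume that agent and job definition resources appear in the exported template as child types under Microsoft.StorageMover. Review the output before you deploy it, and replace the storage account resource ID by hand if needed.

```powershell
## Hypothetical helper: prepare an exported template for deployment to a new region
function Convert-StorageMoverTemplate {
    param(
        [Parameter(Mandatory)] [string] $TemplateJson,
        [Parameter(Mandatory)] [string] $NewLocation
    )

    # Matches agent and Hybrid Compute machine types, either as a resource type
    # or inside a dependsOn expression. The -match operator is case-insensitive.
    $agentPattern = 'Microsoft\.StorageMover/([^''"]+/)?agents\b|Microsoft\.HybridCompute/machines\b'

    $template = $TemplateJson | ConvertFrom-Json

    # Remove the agent and Hybrid Compute machine resources.
    $template.resources = @($template.resources |
        Where-Object { $_.type -notmatch $agentPattern })

    foreach ($resource in $template.resources) {
        # Remove dependency references to the removed resources.
        if ($resource.PSObject.Properties['dependsOn']) {
            $resource.dependsOn = @($resource.dependsOn |
                Where-Object { $_ -notmatch $agentPattern })
        }

        # Remove the agentResourceId property from job definitions.
        if ($resource.type -match '/jobDefinitions$' -and
            $resource.PSObject.Properties['properties']) {
            $resource.properties.PSObject.Properties.Remove('agentResourceId')
        }

        # Point the top level Storage Mover resource at the new region.
        if ($resource.type -eq 'Microsoft.StorageMover/storageMovers' -and
            $resource.PSObject.Properties['location']) {
            $resource.location = $NewLocation
        }
    }

    # -Depth 100 keeps deeply nested template properties from being truncated.
    $template | ConvertTo-Json -Depth 100
}

# Example usage (placeholder paths and region):
# $edited = Convert-StorageMoverTemplate `
#     -TemplateJson (Get-Content -Raw "./template.json") `
#     -NewLocation "[Name of the new region]"
# Set-Content -Path "./template-dr.json" -Value $edited
```

Keeping the file reads and writes outside the function makes it easy to try the transformation on a copy of the template first and compare the result with the original.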
After completing the previous steps and verifying that the template parameters are correct, the template is ready for deployment to a new region. You should deploy the template to a new resource group that has the same default region as the location property in the template.
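For illustration, a deployment along these lines could look like the following sketch, which uses the New-AzResourceGroup and New-AzResourceGroupDeployment cmdlets from the Az.Resources module. All values are placeholders. If your template defines parameters, you may also need to supply them, for example with a parameter file.

```powershell
## Deploy the edited template to a new resource group in the new region
$newResourceGroupName = "[Name of the new resource group]"
$newLocation          = "[Name of the new region, matching the template's location property]"
$templateFile         = "[Path to the edited template file]"

# Create the resource group in the same region as the template's location property.
New-AzResourceGroup `
    -Name $newResourceGroupName `
    -Location $newLocation

# Deploy the edited template into the new resource group.
New-AzResourceGroupDeployment `
    -ResourceGroupName $newResourceGroupName `
    -TemplateFile $templateFile
```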
Registering the new agent
Follow the steps within the deploy an Azure Storage Mover agent article to register a new agent in the new Storage Mover resource.
Assigning the agent to job definitions
After the new agent has been registered and reports as online, use the Azure portal or PowerShell to associate the existing job definitions with the new agent. The following PowerShell example is provided for convenience.
See the define a new migration job article for guidance on how to access the job definitions for your project.
## Update the agent in a job definition resource
$resourceGroupName = "[Your resource group name]"
$storageMoverName = "[Your storage mover name]"
$projectName = "[Your project name]"
$jobDefName = "[Your job definition name]"
$agentName = "[The name of an agent previously registered to the same storage mover resource]"
Update-AzStorageMoverJobDefinition `
-ResourceGroupName $resourceGroupName `
-StorageMoverName $storageMoverName `
-ProjectName $projectName `
-Name $jobDefName `
-AgentName $agentName
Granting agent access to the target storage container
To successfully perform a migration job, the agent's managed identity needs the data contributor role on the migration target. Grant the Hybrid Compute resource's system-assigned managed identity access to the target storage account resource. The assign a managed identity access to a resource article provides guidance on how to grant access to the target resource.
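As a rough sketch, the role assignment can be made with the New-AzRoleAssignment cmdlet. The example assumes that the target is a blob container and that Storage Blob Data Contributor is the appropriate built-in role. Both are assumptions to verify for your target type. All names are placeholders, and the object ID of the managed identity is supplied by you.

```powershell
## Grant the agent's managed identity access to the target blob container
$storageResourceGroupName = "[Resource group of the target storage account]"
$storageAccountName       = "[Target storage account name]"
$containerName            = "[Target container name]"
$principalId              = "[Object ID of the agent's system-assigned managed identity]"

# Look up the storage account to build the container-level scope.
$storageAccount = Get-AzStorageAccount `
    -ResourceGroupName $storageResourceGroupName `
    -Name $storageAccountName
$scope = "$($storageAccount.Id)/blobServices/default/containers/$containerName"

# Assumption: Storage Blob Data Contributor is the role required for a blob container target.
New-AzRoleAssignment `
    -ObjectId $principalId `
    -RoleDefinitionName "Storage Blob Data Contributor" `
    -Scope $scope
```

Scoping the assignment to the container rather than the whole storage account limits the agent's access to the migration target.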
You're now ready to start migration jobs using the newly deployed Storage Mover resources.