Reliability in Azure Device Registry
This article describes reliability support in Azure Device Registry. It covers both intra-regional resiliency with availability zones and information on multi-region deployments.
Because resiliency is a shared responsibility between you and Microsoft, this article also covers ways for you to build a resilient solution that meets your needs.
Transient faults
Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. They correct themselves after a short period of time. It's important that your applications handle transient faults, usually by retrying affected requests.
All cloud-hosted applications should follow Azure's transient fault handling guidance when communicating with any cloud-hosted APIs, databases, and other components. To learn more about handling transient faults, see Recommendations for handing transient faults.
Availability zone support
Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.
For more information on availability zones in Azure, see What are availability zones?.
Azure Device Registry is zone-redundant, which means that it automatically replicates across multiple availability zones. This setup enhances the resiliency of the service by providing high availability. If there's a failure in one zone, the service can continue to operate seamlessly from another zone.
Microsoft manages setup and configuration for zone redundancy in Azure Device Registry. You don't need to perform any more configuration to enable this zone redundancy. Microsoft ensures that the service is configured to provide the highest level of availability and reliability.
Regions supported
The following list of regions support availability zones in Azure Device Registry:
Americas | Europe | Middle East | Africa | Asia Pacific |
---|---|---|---|---|
East US | North Europe | |||
East US 2 | West Europe | |||
West US 2 | ||||
West US 3 |
Cost
There's no extra cost to use zone redundancy for Azure Device Registry.
Configure availability zone support
New resources: When you create an Azure Device Registry resource in Azure IoT Operations, it automatically includes zone-redundancy by default. There's no need for you to perform any more configuration.
Zone-down experience
During a zone-wide outage, you don't need to take any action to fail over to a healthy zone. The service automatically self-heals and rebalances itself to take advantage of the healthy zone automatically.
Detection and response: Because Azure Device Registry detects and responds automatically to failures in an availability zone, you don't need to do anything to initiate an availability zone failover.
Multi-region support
Azure Device Registry is a regional service with automatic geographical data replication. In a region-wide outage, Microsoft initiates compute failover from one region to another. If Azure Device Registry fails over, it continues to support its primary region, and no more actions by you're required.
When using Azure IoT Operations (Azure IoT Operations), Azure Device Registry projects assets as Azure resources in the cloud within a single registry. The single registry is a source of truth for asset metadata and asset management capabilities. However, Azure IoT Operations includes various other components beyond Azure Device Registry. For detailed information on the high availability and zero data loss features of Azure IoT Operations components, refer to Azure IoT Operations frequently asked questions.
Region down experience
During a region outage, Microsoft adheres to the Recovery Time Objective (RTO) to recover the service. During this time, the customer can expect some service interruption until the service is fully recovered.
In a complete region loss scenario, you can expect a manual recovery from Microsoft.
For Azure Device Registry, Recovery Time Objective (RTO) is approximately 24 hours. For Recovery Point Objective (RPO), you can expect less than 15 minutes.
Service-level agreement (SLA)
The service-level agreement (SLA) for Azure Device Registry describes the expected availability of the service, and the conditions that must be met to achieve that availability expectation. To understand those conditions, it's important that you review the Service Level Agreements (SLA) for Online Services.