Reliability in Azure Event Hubs

This article describes reliability support in Azure Event Hubs, and covers both intra-regional resiliency with availability zones and cross-region disaster recovery and business continuity. For a more detailed overview of reliability principles in Azure, see Azure reliability.

Availability zone support

Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.

For more information on availability zones in Azure, see What are availability zones?.

Event Hubs implements transparent failure detection and failover mechanisms so that, when failure occurs, the service continues to operate within the assured service-levels and without noticeable interruptions. If you create an Event Hubs namespace in a region that supports availability zones, zone redundancy is automatically enabled. With zone-redundancy, fault tolerance is increased and the service has enough capacity reserves to cope with the outage of an entire facility. Both metadata and data (events) are replicated across data centers in each zone.

Prerequisites

Availability zone support is only available in Azure regions with availability zones.

Create a resource with availability zones enabled

When you use the Azure portal, zone redundancy is automatically enabled. When you create a namespace, you see the following highlighted message when you select a region that supports availability zones.

Screenshot showing the Create Namespace page with a region that has availability zones.

Disable availability zones

The Azure portal doesn't support disabling availability zones. To disable availability zones, use one of the following methods:

Availability zone migration

When you create availability zones in a region that supports them, availability zones are automatically enabled. If you wish to learn how to move your Event Hubs namespace to a new region that supports availability zones, see Relocate Event Hubs to another region.

Cross-region disaster recovery and business continuity

Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR. Before you begin to think about creating your disaster recovery plan, see Recommendations for designing a disaster recovery strategy.

When it comes to DR, Microsoft uses the shared responsibility model. In a shared responsibility model, Microsoft ensures that the baseline infrastructure and platform services are available. At the same time, many Azure services don't automatically replicate data or fall back from a failed region to cross-replicate to another enabled region. For those services, you're responsible for setting up a disaster recovery plan that works for your workload. Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR and you can use service-specific features to support fast recovery to help develop your DR plan.

The all-active Azure Event Hubs cluster model with availability zone support provides resiliency against hardware and datacenter outages. However, if a disaster where an entire region and all zones are unavailable, you can use Geo-disaster recovery to recover your workload and application configuration.

There are two features that provide geo-disaster recovery in Azure Event Hubs.

  • Geo-disaster recovery (Metadata DR), which just provides replication of only metadata.

    Geo-Disaster recovery ensures that the entire configuration of a namespace (Event Hubs, Consumer Groups, and settings) is continuously replicated from a primary namespace to a secondary namespace when paired.

    The Geo-disaster recovery feature of Azure Event Hubs is a disaster recovery solution. The concepts and workflow described in this article apply to disaster scenarios, and not to temporary outages. For a detailed discussion of disaster recovery in Microsoft Azure, see this article.

    With Geo-Disaster recovery, you can initiate a once-only failover move from the primary to the secondary at any time. The failover move points the chosen alias name for the namespace to the secondary namespace. After the move, the pairing is then removed. The failover is nearly instantaneous once initiated.

    For detailed information, samples, and further documentation, on Geo-Disaster recovery in Event Hubs, see Azure Event Hubs - Geo-disaster recovery.

  • Geo-replication (public preview), which provides replication of both metadata and data, replicates configuration information and all of the data from a primary namespace to one, or more secondary namespaces. When a failover is performed, the selected secondary becomes the primary and the previous primary becomes a secondary. Users can perform a failover back to the original primary when desired.

    For detailed information, samples, and further documentation, on Geo-replication in Event Hubs, see Geo-replication .

Next steps