Reliability in Azure Logic Apps

This article describes reliability support in Azure Logic Apps, covering intra-regional resiliency via availability zones and multi-region deployments.

Resiliency is a shared responsibility between you and Microsoft, and so this article also covers ways for you to create a resilient solution that meets your needs.

Logic app workflows help you more easily integrate and orchestrate data between apps, cloud services, and on-premises systems by reducing how much code you have to write. When you plan for resiliency, make sure that you consider not just your logic apps, but also these Azure resources that you use with your logic apps:

Multitenant Azure Logic Apps automatically manages the compute infrastructure and resources for Consumption workflows. You don't need to configure or manage any virtual machines (VMs). Consumption workflows share compute infrastructure between many customers.

Single-tenant Azure Logic Apps runs Standard workflows on compute resources that are dedicated to you, called plans. Each plan can have multiple instances, and those instances can optionally be spread across multiple availability zones. Your workflows run on the instances of your plan.

Production deployment recommendations

For enterprise and secure workflows with isolation or network security requirements, we recommend that you create and run Standard workflows in single-tenant Azure Logic Apps, rather than Consumption workflows in multitenant Azure Logic Apps. For more information, see Create and deploy to different environments.

For production deployments with single-tenant Azure Logic Apps, you should enable zone redundancy to spread your logic app resources across multiple availability zones.

Transient faults

Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. They correct themselves after a short period of time. It's important that your applications handle transient faults, usually by retrying affected requests.

All cloud-hosted applications should follow Azure's transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. To learn more about handling transient faults, see Recommendations for handling transient faults.

In Azure Logic Apps, many triggers and actions automatically support retry policies, which automatically retry requests that fail due to transient faults. To learn how to change or disable retry policies for your logic app, see Handle errors and exceptions in Azure Logic Apps.
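To make the retry policy concrete, the following Python sketch builds the JSON shape of an HTTP action whose `inputs` carry a `retryPolicy` object, as it would appear in a workflow definition. The action itself, the URI, and the specific values (four retries, a 7-second base interval, and the minimum and maximum bounds) are illustrative assumptions rather than recommended settings. Check the allowed policy types and value limits in Handle errors and exceptions in Azure Logic Apps before you use them.

```python
import json


def http_action_with_retry(uri: str) -> dict:
    """Return a hypothetical HTTP action definition with an explicit retry policy."""
    return {
        "type": "Http",
        "inputs": {
            "method": "GET",
            "uri": uri,  # hypothetical endpoint
            # The retry policy lives inside the action's "inputs".
            "retryPolicy": {
                "type": "exponential",      # other types include "fixed" and "none"
                "count": 4,                 # number of retry attempts (illustrative)
                "interval": "PT7S",         # ISO 8601 duration used to grow the wait
                "minimumInterval": "PT5S",  # lower bound on each wait (illustrative)
                "maximumInterval": "PT1H",  # upper bound on each wait (illustrative)
            },
        },
        "runAfter": {},
    }


if __name__ == "__main__":
    print(json.dumps(http_action_with_retry("https://example.com/api"), indent=2))
```

Setting `"type": "none"` in the same place turns off retries for that action. Turning retries off can be useful when the called API isn't idempotent.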

If an action fails, you can customize the behavior of subsequent actions. You can also create scopes to group related actions that might fail or succeed together.
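The following Python sketch shows the shape of these two features in a workflow definition. A `Scope` action groups related actions, and a later action uses `runAfter` to run only when the scope ends in `Failed` or `TimedOut`. All action names and the URI are hypothetical. The small `runs_after` helper exists only to show how the `runAfter` status list is read. It isn't part of Azure Logic Apps.

```python
# Hypothetical workflow "actions" section, expressed as a Python dict.
workflow_actions = {
    "Process_order": {
        "type": "Scope",  # groups actions that succeed or fail together
        "actions": {
            "Call_inventory_API": {
                "type": "Http",
                "inputs": {"method": "POST", "uri": "https://example.com/inventory"},
                "runAfter": {},
            },
        },
        "runAfter": {},
    },
    "Handle_failure": {
        "type": "Compose",
        # result() returns the outcomes of the actions inside the scope.
        "inputs": "@result('Process_order')",
        # Run only if the scope failed or timed out.
        "runAfter": {"Process_order": ["Failed", "TimedOut"]},
    },
}


def runs_after(actions: dict, name: str, predecessor: str, status: str) -> bool:
    """Return True if `name` is configured to run when `predecessor` ends with `status`."""
    return status in actions[name]["runAfter"].get(predecessor, [])


if __name__ == "__main__":
    print(runs_after(workflow_actions, "Handle_failure", "Process_order", "Failed"))
```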

For more information on fault handling in Azure Logic Apps, see Handle errors and exceptions in Azure Logic Apps.

Availability zone support

Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.

For more information on availability zones in Azure, see What are availability zones?

Azure Logic Apps supports zone redundancy, which spreads compute resources across multiple availability zones. When you distribute logic app workload resources across availability zones, you improve resiliency and reliability for your production logic app workloads.

New and existing Consumption logic app workflows in multitenant Azure Logic Apps automatically have zone redundancy enabled.

For Standard workflows with the Workflow Service Plan hosting option in single-tenant Azure Logic Apps, you can optionally enable zone redundancy.

For Standard workflows with the App Service Environment v3 hosting option, you can optionally enable zone redundancy. For more information on how App Service Environment v3 supports availability zones, see Reliability in App Service.

Regions supported

Consumption logic apps that are deployed in any region that supports availability zones are automatically zone redundant. The exception is Japan West, which currently doesn't support zone-redundant logic apps because some dependency services don't yet support zone redundancy.

You can deploy zone-redundant Standard logic apps with Workflow Service Plans in any region that supports availability zones for Azure App Service. The exception is Japan West, which currently doesn't support zone-redundant logic apps. For more information, see Reliability in Azure App Service.

To see which regions support availability zones for App Service Environment v3, see Regions.

Requirements

To distribute your Workflow Service Plan across availability zones, you must deploy at least three instances of the plan. Each instance roughly corresponds to one VM.

Considerations

  • Storage: When you configure external storage for stateful Standard workflows, you must configure your storage account for zone redundancy. For more information, see Storage considerations for Azure Functions.
  • Connectors: Built-in connectors are automatically zone redundant when your logic app is zone redundant.
  • Integration accounts: Premium SKU integration accounts are zone redundant by default.

Cost

Zone redundancy is enabled automatically, at no additional cost, for new and existing Consumption workflows in multitenant Azure Logic Apps.

When you have Standard workflows with the Workflow Service Plan in single-tenant Azure Logic Apps, no additional cost applies to enabling availability zones as long as you have three or more instances of the plan. You are charged based on your plan SKU, the specified capacity, and any instances that you scale up or down, based on your autoscale criteria. If you enable availability zones but specify a capacity of fewer than three instances, the platform enforces the minimum three instances and charges you for these three instances.
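The minimum-instance rule above amounts to a simple calculation. The following Python sketch computes how many instances you're billed for under that rule. It models only the instance count, not prices or SKUs, and the function name is hypothetical.

```python
MIN_ZONE_REDUNDANT_INSTANCES = 3  # minimum that the platform enforces when zones are enabled


def billed_instance_count(capacity: int, zone_redundant: bool) -> int:
    """Return the number of plan instances billed for a requested capacity (sketch)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if zone_redundant:
        # The platform raises any smaller capacity to the three-instance minimum.
        return max(capacity, MIN_ZONE_REDUNDANT_INSTANCES)
    return capacity
```

For example, a zone-redundant plan with a requested capacity of 2 is billed as 3 instances, while a capacity of 5 is billed as 5.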

App Service Environment v3 has a specific pricing model for zone redundancy. For pricing information for App Service Environment v3, see Pricing.

Configure availability zone support

Consumption logic app workflows automatically support zone redundancy, so no configuration is required. For Standard logic app workflows, use the following guidance:

  • Create a new workflow with zone redundancy

    To enable zone redundancy for Standard logic app workflows, see Enable zone redundancy for your logic app.

  • Migration

    You can't enable zone redundancy after you create a Workflow Service Plan. Instead, you need to create a new plan with zone redundancy enabled and delete the old one.

  • Disable zone redundancy

    You can't disable zone redundancy after you create a Workflow Service Plan. Instead, you need to create a new plan with zone redundancy disabled and delete the old one.
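Because zone redundancy is fixed when you create the plan, migration means declaring a new plan with the setting turned on. The following Python sketch generates an ARM template resource for such a plan. The `zoneRedundant` property and the `WS1` / `WorkflowStandard` SKU reflect how Workflow Service Plans are commonly declared. The `apiVersion` value and the `"kind": "elastic"` field are assumptions taken from typical exported templates, so verify both against current ARM documentation. The plan name and location are placeholders.

```python
import json


def zone_redundant_plan(name: str, location: str, capacity: int = 3) -> dict:
    """Return a sketch of an ARM resource for a zone-redundant Workflow Service Plan."""
    if capacity < 3:
        # Zone redundancy requires at least three instances.
        raise ValueError("zone-redundant plans need at least three instances")
    return {
        "type": "Microsoft.Web/serverfarms",
        "apiVersion": "2022-03-01",  # assumption: replace with a current API version
        "name": name,                # placeholder plan name
        "location": location,        # must be a region that supports availability zones
        "kind": "elastic",           # assumption: as seen in typical exported templates
        "sku": {"name": "WS1", "tier": "WorkflowStandard", "capacity": capacity},
        "properties": {"zoneRedundant": True},  # can only be set when the plan is created
    }


if __name__ == "__main__":
    print(json.dumps(zone_redundant_plan("my-zr-plan", "eastus"), indent=2))
```

After you deploy the new plan and move your logic app to it, you can delete the old plan.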

Capacity planning and management

To prepare for availability zone failure, consider over-provisioning the capacity of your service. Over-provisioning allows the solution to tolerate some degree of capacity loss and still continue to function without degraded performance.

Because the platform spreads instances across multiple zones, your over-provisioned capacity must, at minimum, absorb the loss of one entire zone.

Follow these steps to find out the total number of instances you should provision:

  1. Determine the number of instances that your peak workload requires. This example uses two scenarios: one with 3 instances and one with 4.
  2. Calculate the over-provisioned instance count by multiplying the peak workload instance count by the factor zones / (zones - 1).
  3. Round the result up to the next whole number.

Note

The following table assumes that you're using three availability zones. If you use a different number of availability zones, adjust the formula accordingly.

| Peak workload instance count | Factor zones / (zones - 1) | Formula | Instances to provision (rounded up) |
|---|---|---|---|
| 3 | 3/2 = 1.5 | 3 × 1.5 = 4.5 | 5 |
| 4 | 3/2 = 1.5 | 4 × 1.5 = 6 | 6 |
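The steps and table above reduce to ceil(peak × zones / (zones - 1)). The following Python sketch uses integer arithmetic to avoid floating-point rounding surprises. As an extra assumption for Workflow Service Plans, it also never returns fewer than the three instances that zone redundancy requires. The function name is hypothetical.

```python
MIN_ZONE_REDUNDANT_INSTANCES = 3  # from the Requirements section


def instances_to_provision(peak_instances: int, zones: int = 3) -> int:
    """Return ceil(peak * zones / (zones - 1)), floored at the zone-redundancy minimum."""
    if zones < 2:
        raise ValueError("need at least two zones to tolerate losing one")
    if peak_instances < 1:
        raise ValueError("peak_instances must be at least 1")
    numerator = peak_instances * zones
    # Ceiling division with integers: -(-a // b) == ceil(a / b) for positive b.
    needed = -(-numerator // (zones - 1))
    return max(needed, MIN_ZONE_REDUNDANT_INSTANCES)
```

With three zones, a peak of 3 instances gives 5 and a peak of 4 gives 6, which matches the table. If you lose one zone of a 5-instance plan, at most 2 instances go offline, which leaves the 3 instances that the peak requires.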

Traffic routing between zones

During normal operations, Consumption workflow invocations can use compute resources in any of the availability zones within the region.

For Standard workflows, invocations are spread among all your available plan instances across all availability zones.

Zone-down experience

Detection and response: The Azure Logic Apps platform is responsible for detecting a failure in an availability zone. You don't need to do anything to initiate a zone failover.

Active requests: If an availability zone becomes unavailable, any in-progress workflow executions that run on a VM in the faulty availability zone are terminated. The Azure Logic Apps platform automatically resumes the workflow on another VM in a different availability zone. Due to this behavior, active workflows might experience some transient faults or higher latency as new VMs are added to the remaining availability zones.

Failback

When the availability zone recovers, Azure Logic Apps automatically restores instances in the availability zone, removes any temporary instances created in the other availability zones, and reroutes traffic between your instances as normal.

Testing for zone failures

The Azure Logic Apps platform manages traffic routing, failover, and failback for zone-redundant logic app resources, so you don't need to initiate anything. Because this feature is fully managed, you don't need to validate availability zone failure processes.

Multi-region support

Each logic app is deployed into a single Azure region. If the region becomes unavailable, your logic app is also unavailable.

Alternative multi-region approaches

For higher resiliency, you can deploy a standby or backup logic app in a secondary region and fail over to that other region if the primary region is unavailable. To enable this capability, complete the following tasks:

  • Deploy your logic app in both primary and secondary regions.
  • Reconfigure connections to resources as needed.
  • Configure load balancing and failover policies.
  • Plan to monitor the primary instance health and initiate failover.
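One way to picture the monitoring and failover task is a probe loop that switches traffic to the secondary region after several consecutive health-check failures. The following Python sketch is hypothetical throughout. The `FailoverMonitor` class, the probe callable, and the threshold of three failures are illustrative. In a real deployment, a load balancer's health probes or your own monitoring typically fill this role. In this sketch, failing back to the primary region is deliberately left as a manual decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    """Hypothetical monitor that routes to the secondary region after repeated probe failures."""

    probe_primary: Callable[[], bool]  # returns True when the primary region looks healthy
    threshold: int = 3                 # consecutive failures before failing over
    consecutive_failures: int = 0
    active_region: str = "primary"

    def check(self) -> str:
        """Run one probe and return the region that should receive traffic."""
        if self.active_region == "secondary":
            # Failback is a deliberate, manual step in this sketch.
            return self.active_region
        if self.probe_primary():
            self.consecutive_failures = 0  # a healthy probe resets the counter
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.active_region = "secondary"
        return self.active_region
```

Requiring several consecutive failures before failing over keeps a single transient fault from triggering a region switch.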

For more information on multi-region deployments for your logic app workflows, see the following documentation:

Service-level agreement

The service-level agreement (SLA) for Azure Logic Apps describes the expected availability of the service and the conditions that you must meet to achieve that expectation. To understand these conditions, make sure that you review the Service Level Agreements (SLA) for Online Services.