Dapr component resiliency (preview)

Resiliency policies proactively prevent, detect, and recover from your container app failures. In this article, you learn how to apply resiliency policies for applications that use Dapr to integrate with different cloud services, like state stores, pub/sub message brokers, secret stores, and more.

You can configure resiliency policies like retries, timeouts, and circuit breakers for the following outbound and inbound operation directions via a Dapr component:

  • Outbound operations: Calls from the Dapr sidecar to a component, such as:
    • Persisting or retrieving state
    • Publishing a message
    • Invoking an output binding
  • Inbound operations: Calls from the Dapr sidecar to your container app, such as:
    • Subscriptions when delivering a message
    • Input bindings delivering an event

The following screenshot shows how an application uses a retry policy to attempt to recover from failed requests.

Diagram demonstrating resiliency for container apps with Dapr components.

Supported resiliency policies

Configure resiliency policies

You can choose whether to create resiliency policies using Bicep, the CLI, or the Azure portal.

The following resiliency example demonstrates all of the available configurations.

resource myPolicyDoc 'Microsoft.App/managedEnvironments/daprComponents/resiliencyPolicies@2023-11-02-preview' = {
  name: 'my-component-resiliency-policies'
  parent: '${componentName}'
  properties: {
    outboundPolicy: {
      timeoutPolicy: {
          responseTimeoutInSeconds: 15
      }
      httpRetryPolicy: {
          maxRetries: 5
          retryBackOff: {
            initialDelayInMilliseconds: 1000
            maxIntervalInMilliseconds: 10000
          }
      }
      circuitBreakerPolicy: {  
          intervalInSeconds: 15
          consecutiveErrors: 10
          timeoutInSeconds: 5     
      }  
    } 
    inboundPolicy: {
      timeoutPolicy: {
        responseTimeoutInSeconds: 15
      }
      httpRetryPolicy: {
        maxRetries: 5
        retryBackOff: {
          initialDelayInMilliseconds: 1000
          maxIntervalInMilliseconds: 10000
        }
      }
      circuitBreakerPolicy: {  
          intervalInSeconds: 15
          consecutiveErrors: 10
          timeoutInSeconds: 5     
      }  
    }
  }
}

Important

Once you've applied all the resiliency policies, you need to restart your Dapr applications.

Policy specifications

Timeouts

Timeouts are used to early-terminate long-running operations. The timeout policy includes the following properties.

properties: {
  outbound: {
    timeoutPolicy: {
        responseTimeoutInSeconds: 15
    }
  }
  inbound: {
    timeoutPolicy: {
        responseTimeoutInSeconds: 15
    }
  }
}
Metadata Required Description Example
responseTimeoutInSeconds Yes Timeout waiting for a response from the Dapr component. 15

Retries

Define an httpRetryPolicy strategy for failed operations. The retry policy includes the following configurations.

properties: {
  outbound: {
    httpRetryPolicy: {
        maxRetries: 5
        retryBackOff: {
          initialDelayInMilliseconds: 1000
          maxIntervalInMilliseconds: 10000
        }
    }
  }
  inbound: {
    httpRetryPolicy: {
        maxRetries: 5
        retryBackOff: {
          initialDelayInMilliseconds: 1000
          maxIntervalInMilliseconds: 10000
        }
    }
  } 
}
Metadata Required Description Example
maxRetries Yes Maximum retries to be executed for a failed http-request. 5
retryBackOff Yes Monitor the requests and shut off all traffic to the impacted service when timeout and retry criteria are met. N/A
retryBackOff.initialDelayInMilliseconds Yes Delay between first error and first retry. 1000
retryBackOff.maxIntervalInMilliseconds Yes Maximum delay between retries. 10000

Circuit breakers

Define a circuitBreakerPolicy to monitor requests causing elevated failure rates and shut off all traffic to the impacted service when a certain criteria is met.

properties: {  
  outbound: {  
    circuitBreakerPolicy: {  
        intervalInSeconds: 15
        consecutiveErrors: 10
        timeoutInSeconds: 5     
    }  
  },  
  inbound: {  
    circuitBreakerPolicy: {  
        intervalInSeconds: 15
        consecutiveErrors: 10
        timeoutInSeconds: 5     
    }  
  }  
}
Metadata Required Description Example
intervalInSeconds No Cyclical period of time (in seconds) used by the circuit breaker to clear its internal counts. If not provided, the interval is set to the same value as provided for timeoutInSeconds. 15
consecutiveErrors Yes Number of request errors allowed to occur before the circuit trips and opens. 10
timeoutInSeconds Yes Time period (in seconds) of open state, directly after failure. 5

Circuit breaker process

Specifying consecutiveErrors (the circuit trip condition as consecutiveFailures > $(consecutiveErrors)-1) sets the number of errors allowed to occur before the circuit trips and opens halfway.

The circuit waits half-open for the timeoutInSeconds amount of time, during which the consecutiveErrors number of requests must consecutively succeed.

  • If the requests succeed, the circuit closes.
  • If the requests fail, the circuit remains in a half-opened state.

If you didn't set any intervalInSeconds value, the circuit resets to a closed state after the amount of time you set for timeoutInSeconds, regardless of consecutive request success or failure. If you set intervalInSeconds to 0, the circuit never automatically resets, only moving from half-open to closed state by successfully completing consecutiveErrors requests in a row.

If you did set an intervalInSeconds value, that determines the amount of time before the circuit is reset to closed state, independent of whether the requests sent in half-opened state succeeded or not.

Resiliency logs

From the Monitoring section of your container app, select Logs.

Screenshot demonstrating where to find the logs for your container app using Dapr component resiliency.

In the Logs pane, write and run a query to find resiliency via your container app system logs. For example, to find whether a resiliency policy was loaded:

ContainerAppConsoleLogs_CL
| where ContainerName_s == "daprd"
| where Log_s contains "Loading Resiliency configuration:"
| project time_t, Category, ContainerAppName_s, Log_s
| order by time_t desc

Click Run to run the query and view the result with the log message indicating the policy is loading.

Screenshot showing resiliency query results based on provided query example for checking if resiliency policy has loaded.

Or, you can find the actual resiliency policy by enabling debug logs on your container app and querying to see if a resiliency resource is loaded.

Screenshot demonstrating how to enable debug logs on your container app via the portal.

Once debug logs are enabled, use a query similar to the following:

ContainerAppConsoleLogs_CL
| where ContainerName_s == "daprd"
| where Log_s contains "Resiliency configuration ("
| project time_t, Category, ContainerAppName_s, Log_s
| order by time_t desc

Click Run to run the query and view the resulting log message with the policy configuration.

Screenshot showing resiliency query results based on provided query example for finding the actual resiliency policy.

See how resiliency works for Service to service communication using Azure Container Apps built in service discovery