Dapr component resiliency (preview)
Resiliency policies proactively prevent, detect, and recover from your container app failures. In this article, you learn how to apply resiliency policies for applications that use Dapr to integrate with different cloud services, like state stores, pub/sub message brokers, secret stores, and more.
You can configure resiliency policies like retries, timeouts, and circuit breakers for the following outbound and inbound operation directions via a Dapr component:
- Outbound operations: Calls from the Dapr sidecar to a component, such as:
- Persisting or retrieving state
- Publishing a message
- Invoking an output binding
- Inbound operations: Calls from the Dapr sidecar to your container app, such as:
- Subscriptions when delivering a message
- Input bindings delivering an event
The following screenshot shows how an application uses a retry policy to attempt to recover from failed requests.
Supported resiliency policies
- Timeouts
- Retries (HTTP)
- Circuit breakers
Configure resiliency policies
You can choose whether to create resiliency policies using Bicep, the CLI, or the Azure portal.
The following resiliency example demonstrates all of the available configurations.
resource myPolicyDoc 'Microsoft.App/managedEnvironments/daprComponents/resiliencyPolicies@2023-11-02-preview' = {
  name: 'my-component-resiliency-policies'
  // parent takes the symbolic name of the Dapr component resource.
  // daprComponent is a placeholder for a resource declared elsewhere in the template.
  parent: daprComponent
  properties: {
    outboundPolicy: {
      timeoutPolicy: {
        responseTimeoutInSeconds: 15
      }
      httpRetryPolicy: {
        maxRetries: 5
        retryBackOff: {
          initialDelayInMilliseconds: 1000
          maxIntervalInMilliseconds: 10000
        }
      }
      circuitBreakerPolicy: {
        intervalInSeconds: 15
        consecutiveErrors: 10
        timeoutInSeconds: 5
      }
    }
    inboundPolicy: {
      timeoutPolicy: {
        responseTimeoutInSeconds: 15
      }
      httpRetryPolicy: {
        maxRetries: 5
        retryBackOff: {
          initialDelayInMilliseconds: 1000
          maxIntervalInMilliseconds: 10000
        }
      }
      circuitBreakerPolicy: {
        intervalInSeconds: 15
        consecutiveErrors: 10
        timeoutInSeconds: 5
      }
    }
  }
}
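In Bicep, the parent property expects the symbolic name of a declared resource rather than a string. The following is a minimal sketch of how the surrounding declarations might look, assuming the environment and the Dapr component already exist. The parameter names environmentName and componentName, the symbolic names managedEnv, daprComponent, and timeoutOnlyPolicy, the policy name, and the API version on the existing declarations are illustrative assumptions, not values taken from this article.

```bicep
// Hypothetical parameter names, used only for this sketch.
param environmentName string
param componentName string

// Reference the Container Apps environment that is assumed to exist already.
resource managedEnv 'Microsoft.App/managedEnvironments@2023-11-02-preview' existing = {
  name: environmentName
}

// Reference the Dapr component inside that environment (also assumed to exist).
resource daprComponent 'Microsoft.App/managedEnvironments/daprComponents@2023-11-02-preview' existing = {
  parent: managedEnv
  name: componentName
}

// Attach a resiliency policy to the component through its symbolic name.
resource timeoutOnlyPolicy 'Microsoft.App/managedEnvironments/daprComponents/resiliencyPolicies@2023-11-02-preview' = {
  parent: daprComponent
  name: 'my-timeout-resiliency-policy'
  properties: {
    outboundPolicy: {
      timeoutPolicy: {
        responseTimeoutInSeconds: 15
      }
    }
    inboundPolicy: {
      timeoutPolicy: {
        responseTimeoutInSeconds: 15
      }
    }
  }
}
```

Declaring the parents with the existing keyword lets the template reference them without redeploying them. This sketch sets only a timeout policy in each direction to stay short; retries and circuit breakers would be added alongside it in the same way.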
Important
Once you've applied all the resiliency policies, you need to restart your Dapr applications.
Policy specifications
Timeouts
Timeouts are used to early-terminate long-running operations. The timeout policy includes the following properties.
properties: {
  outboundPolicy: {
    timeoutPolicy: {
      responseTimeoutInSeconds: 15
    }
  }
  inboundPolicy: {
    timeoutPolicy: {
      responseTimeoutInSeconds: 15
    }
  }
}
Metadata | Required | Description | Example |
---|---|---|---|
responseTimeoutInSeconds | Yes | Timeout waiting for a response from the Dapr component. | 15 |
Retries
Define an httpRetryPolicy strategy for failed operations. The retry policy includes the following configurations.
properties: {
  outboundPolicy: {
    httpRetryPolicy: {
      maxRetries: 5
      retryBackOff: {
        initialDelayInMilliseconds: 1000
        maxIntervalInMilliseconds: 10000
      }
    }
  }
  inboundPolicy: {
    httpRetryPolicy: {
      maxRetries: 5
      retryBackOff: {
        initialDelayInMilliseconds: 1000
        maxIntervalInMilliseconds: 10000
      }
    }
  }
}
Metadata | Required | Description | Example |
---|---|---|---|
maxRetries | Yes | Maximum number of retries to execute for a failed HTTP request. | 5 |
retryBackOff | Yes | Backoff settings that control the delay between retry attempts. | N/A |
retryBackOff.initialDelayInMilliseconds | Yes | Delay between the first error and the first retry. | 1000 |
retryBackOff.maxIntervalInMilliseconds | Yes | Maximum delay between retries. | 10000 |
Circuit breakers
Define a circuitBreakerPolicy to monitor requests causing elevated failure rates and shut off all traffic to the impacted service when certain criteria are met.
properties: {
  outboundPolicy: {
    circuitBreakerPolicy: {
      intervalInSeconds: 15
      consecutiveErrors: 10
      timeoutInSeconds: 5
    }
  }
  inboundPolicy: {
    circuitBreakerPolicy: {
      intervalInSeconds: 15
      consecutiveErrors: 10
      timeoutInSeconds: 5
    }
  }
}
Metadata | Required | Description | Example |
---|---|---|---|
intervalInSeconds | No | Cyclical period of time (in seconds) used by the circuit breaker to clear its internal counts. If not provided, the interval is set to the same value as provided for timeoutInSeconds. | 15 |
consecutiveErrors | Yes | Number of request errors allowed to occur before the circuit trips and opens. | 10 |
timeoutInSeconds | Yes | Time period (in seconds) of open state, directly after failure. | 5 |
Circuit breaker process
Specifying consecutiveErrors (the circuit trip condition as consecutiveFailures > $(consecutiveErrors)-1) sets the number of errors allowed to occur before the circuit trips and opens halfway.
The circuit waits half-open for the timeoutInSeconds amount of time, during which the consecutiveErrors number of requests must consecutively succeed.
- If the requests succeed, the circuit closes.
- If the requests fail, the circuit remains in a half-opened state.
If you didn't set any intervalInSeconds value, the circuit resets to a closed state after the amount of time you set for timeoutInSeconds, regardless of consecutive request success or failure. If you set intervalInSeconds to 0, the circuit never automatically resets, only moving from half-open to closed state by successfully completing consecutiveErrors requests in a row.
If you did set an intervalInSeconds value, that determines the amount of time before the circuit is reset to closed state, independent of whether the requests sent in half-opened state succeeded or not.
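The three intervalInSeconds cases described above can be written side by side. This is a minimal sketch that reuses the example values from this article. The variable names are hypothetical, and each object is meant to be assigned to circuitBreakerPolicy under outboundPolicy or inboundPolicy.

```bicep
// Case 1: intervalInSeconds omitted.
// Per the description above, the circuit resets to closed after
// timeoutInSeconds (5 seconds here), regardless of request outcomes.
var breakerDefaultInterval = {
  consecutiveErrors: 10
  timeoutInSeconds: 5
}

// Case 2: intervalInSeconds set to 0.
// The circuit never resets automatically; it closes only after
// consecutiveErrors (10) requests succeed in a row.
var breakerNoAutoReset = {
  intervalInSeconds: 0
  consecutiveErrors: 10
  timeoutInSeconds: 5
}

// Case 3: intervalInSeconds set explicitly.
// The circuit resets to closed after 15 seconds, independent of whether
// the requests sent in the half-open state succeeded.
var breakerExplicitInterval = {
  intervalInSeconds: 15
  consecutiveErrors: 10
  timeoutInSeconds: 5
}
```

Which variant fits depends on whether recovery should be driven by elapsed time (cases 1 and 3) or only by observed successes (case 2).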
Resiliency logs
From the Monitoring section of your container app, select Logs.
In the Logs pane, write and run a query to find resiliency entries in your container app system logs. For example, to find whether a resiliency policy was loaded:
ContainerAppConsoleLogs_CL
| where ContainerName_s == "daprd"
| where Log_s contains "Loading Resiliency configuration:"
| project time_t, Category, ContainerAppName_s, Log_s
| order by time_t desc
Click Run to run the query and view the result with the log message indicating the policy is loading.
Or, you can find the actual resiliency policy by enabling debug logs on your container app and querying to see if a resiliency resource is loaded.
Once debug logs are enabled, use a query similar to the following:
ContainerAppConsoleLogs_CL
| where ContainerName_s == "daprd"
| where Log_s contains "Resiliency configuration ("
| project time_t, Category, ContainerAppName_s, Log_s
| order by time_t desc
Click Run to run the query and view the resulting log message with the policy configuration.
Related content
See how resiliency works for service-to-service communication using Azure Container Apps built-in service discovery.