Retry Logic for Transient Failures in Windows Azure SQL Database

Article
1/17/2024

This tutorial shows how to handle transient connection failures in Windows Azure SQL Database.

Note

If you wish to contribute to this page, use the Edit tab at the top (sign-in required). If you wish to provide feedback for this documentation please either send e-mail to azuredocs@microsoft.com or use the Comment field at the bottom of this page (sign-in required).

SQL Database queries can fail for various reasons – a malformed query, network issues, and so on. Some errors are transient, meaning the problem often goes away by itself. For this subset of errors, it makes sense to retry the query after a short interval. If the query still fails after several retries, you would report an error. Of course, not all errors are transient. SQL Error 102, “Incorrect syntax,” won’t go away no matter how many times you submit the same query. In short, implementing robust retry logic requires some thought.

The Windows Azure AppFabric Customer Advisory Team (CAT) together with the patterns & practices team has developed a library that encapsulates retry logic for SQL Database, Windows Azure storage, Azure Caching, and Service Bus. Using this library, you decide which errors should trigger a retry attempt, and the library handles the retry logic. This tutorial will show how to use the library with a SQL Database.

For additional guidance on using the Transient Fault Handling Application block, see this chapter and the hands-on labs.

Testing retry logic is an interesting problem, because you need to trigger transient errors in a repeatable way. Of course, you could just unplug your network cable, or block port 1433. (SQL Database uses TCP over port 1433.) But for this tutorial, I’ve opted for something that’s easier to code: Before submitting a query, hold a table-wide lock, which causes a deadlock or a timeout. When the lock is released, the original query can be retried.

Prerequisites

This tutorial assumes you have a Windows Azure Platform subscription. For more information on creating a subscription, see [[articles:Getting Started with SQL Database using the Windows Azure Platform Management Portal]].

This tutorial requires Visual Studio 2010 Professional or higher.

Download the project files.

Lesson 1: Create the Example Database

This tutorial uses a trivial database for illustration. The sample project includes a SQL script to create the database. The script file is located in the root folder of the project file, and is named CustomerOrders.sql.

To create the database on SQL Database, perform the following steps:

Open the Azure management portal (https://windows.azure.com/) and log in using your account credentials.
Select Database.
Under Server, click Create.
Go through the wizard to select a region and create a database administrator login.
Add a firewall rule with your IP address. The wizard will display your current IP address. You can enter this address for both the IP range start and IP range end.
Note the fully qualified server name for the new server.
Under Database, click Create. Name the database CustomerOrders.
Under Subscriptions, expand the node and select the CustomerOrders database.
Under Database, select Manage.
Click Open Query.
Open the file CustomerOrders.sql.
Click Execute.

Lesson 2: Add Queries

In this lesson, you will add some basic SQL queries and then trigger a transient failure. In the next lesson, you will add retry logic.

Note: If you want to get directly to the retry logic, you can skip this lesson and go to lesson 3

Set the Connection String

Download the project files and open the solution named RetryLogicTutorial_Skeleton.sln. Right-click Form1.cs and select View Code. Then add a SqlConnectionStringBuilder member variable to the Form1 class.

SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();

In the Form1 constructor, initialize the SqlConnectionStringBuilder for the CusomerOrders database:

builder.DataSource = "xxxxxxxxxx.database.windows.net";
builder.InitialCatalog = "CustomerOrders";
builder.Encrypt = true;
builder.TrustServerCertificate = false;
builder.UserID = "xxxxxxxx";
builder.Password = "xxxxxxxx";

Substitute the actual server name, user ID, and password for your SQL Database

server.

Add an ADO.Net Query

Locate the function named AdoQueryWorker_DoWork and add the following code:

try
{
    using (SqlConnection connection = new SqlConnection(builder.ConnectionString))
    {
        connection.Open();

        SqlCommand command = new SqlCommand("select product_name from products with (READCOMMITTEDLOCK)");
        command.Connection = connection;
        command.CommandTimeout = 3;

        SqlDataReader reader = command.ExecuteReader();

        while (reader.Read())
        {
            (sender as BackgroundWorker).ReportProgress(0, reader["product_name"]);
        }
    }
}
catch (SqlException ex)
{
    MessageBox.Show(ex.Message, "SqlException");
}

This function is called from a background worker thread. The query returns a list of product names, which are passed to the UI thread by calling BackgroundWorker.ReportProgress. The READCOMMITTEDLOCK clause blocks the query if the table is locked.

Add a LINQ Query

Now we’ll add a second query using LINQ. First, create a LINQ to SQL data model for the CustomerOrders database, as follows:

Add a new data connection:

If the Server Explorer window is not visible, on the View menu, click Other Windows, then click Server Explorer.
In the Server Explorer window, right-click Data Connections and select Add Connection.
In the Add Connection dialog, enter the fully qualified name of your SQL Database server (xxxxxxxxxx.database.windows.net) in the Server name field.
Select SQL Server Authentication.
Enter your login name and password.
Under Select or enter a database name, enter "CustomerOrders".

Add a LINQ to SQL class:

In the Solution Explorer window, right-click the project and select Add New Item.
Under Data, select LINQ to SQL Classes. Name the file CustomerOrders.dbml.
In the Server Explorer window, expand the Data Connections node and locate the CustomerOrders database.
Expand the database node and then expand the Tables node.
Drag all four tables (customers, order_items, orders, and products) to the designer.

Next, locate the function named LinqQueryWorker_DoWork and add the following code:

try
{
    Deadlock();

    CustomerOrdersDataContext ctx = new CustomerOrdersDataContext();
    ctx.Connection.ConnectionString = builder.ConnectionString;

    var results = from c in ctx.customers
                    from o in c.orders
                    from i in o.order_items
                    select new { c.lname, c.fname, i.product.product_name, i.quantity };

    e.Result = results.ToList();

}
catch (SqlException ex)
{
    MessageBox.Show(ex.Message, "SqlException");
}

This code creates a LINQ query that returns all of the customer orders.

Run the application and click the “ADO.NET Query” button and the “LINQ Query” button. You should see the query results populated in the UI.

Add a Blocking Transaction

One thing that’s tricky about retry logic is actually inducing a transient error for testing. For this purpose, we will intentionally cause a deadlock. Locate the function named longTransaction_DoWork and add the following code:

System.Diagnostics.Debug.WriteLine("Starting long transaction");
try
{
    using (SqlConnection con = new SqlConnection(builder.ConnectionString))
    {
        con.Open();

        SqlTransaction transaction = con.BeginTransaction();

        SqlCommand cmd = new SqlCommand("UPDATE products WITH (TABLOCKX) SET product_name = 'x'");
        cmd.Connection = con;
        cmd.Transaction = transaction;
        cmd.ExecuteNonQuery();

        for (int i = 1; i <= LongTransactionTime; i++)
        {
            Thread.Sleep(1000);
            longTransactionWorker.ReportProgress((int)(i * 100) / LongTransactionTime);
        }

        cmd = new SqlCommand("SELECT * FROM customers with (READCOMMITTEDLOCK)");
        cmd.Connection = con;
        cmd.Transaction = transaction;
        cmd.ExecuteNonQuery();

        System.Diagnostics.Debug.WriteLine("Roll back long transaction");
        transaction.Rollback();
    }
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
}

This function starts a new SQL transaction and holds an exclusive lock on the Products table. It sleeps for 30 seconds before rolling back the transaction. Next, locate the function named Deadlock and add the following code.

using (SqlConnection connection = new SqlConnection(builder.ConnectionString))
{
    connection.Open();
 
    SqlTransaction transaction = connection.BeginTransaction(System.Data.IsolationLevel.Serializable);
 
    SqlCommand command = new SqlCommand(
        "UPDATE customers WITH (TABLOCKX) SET lname = 'x'; " +
        "SELECT * FROM PRODUCTS with (READCOMMITTEDLOCK)"
        );
    command.Connection = connection;
    command.Transaction = transaction;
    command.ExecuteNonQuery();
 
    transaction.Rollback();
}

Run the application and click the “Deadlock” button. While the progress bar advances, click the “ADO.Net Query” or “LINQ Query” button. You should see an error message, SQL Error -2 (timeout) or SQL Error 1205 (deadlock).

Lesson 3: Add Retry Logic

In this lesson, you will modify the SQL queries from Lesson 2 to include retry logic.

Note: If you skipped lesson 2, download the tutorial files and open the Visual Studio solution named RetryLogicTutorial_Queries.sln.

First, add a reference to the CAT retry library:

Download the library from http://appfabriccat.com/2011/02/transient-fault-handling-framework/
Build the library.
In the application, go to project properties and set Target Framework to .Net Framework 4
Add a reference to Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.dll

Next, add the following using statements to Form.cs:

#using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling
#using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.SqlAzure

Add a new class named MyRetryStrategy that implements the ITransientErrorDetectionStrategy interface. This interface has a single method, IsTransient, which takes an Exception object and returns true if the exception represents a transient error:

using System;
using System.Data.SqlClient;
using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling;
using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.SqlAzure;
 
namespace RetryLogicTutorial
{
    class MyRetryStrategy : ITransientErrorDetectionStrategy 
    {
        public bool IsTransient(Exception ex)
        {
            if (ex != null && ex is SqlException)
            {
                foreach (SqlError error in (ex as SqlException).Errors)
                { 
                    switch (error.Number)
                    {
                        case 1205:
                            System.Diagnostics.Debug.WriteLine("SQL Error: Deadlock condition. Retrying...");
                            return true;
 
                        case -2:
                            System.Diagnostics.Debug.WriteLine("SQL Error: Timeout expired. Retrying...");
                            return true;
                    }
                }
            }
 
            // For all others, do not retry.
            return false;
        }
    }
}

The implementation shown here is meant only to illustrate the use of the library. But the basic pattern is typical: First, filter for exceptions of type SqlException. Then look at the error numbers in the SqlException.Errors collection. Return true for any error that should trigger a retry, and false for all other errors.

Next, we'll modify the two queries to use MyRetryStrategy.

Add Retry Logic for ADO.NET

Modify the AdoQueryWorker_DoWork function as follows:

At the start of the function, create a new instance of the MyRetryPolicy class.
Change the SqlConnection.Open call to SqlConnection.OpenWithRetry.
Change the SqlCommand.ExecuteReader call to SqlCommand.ExecuteReaderWithRetry.

Here is the modified function:

void AdoQueryWorker_DoWork(object sender, DoWorkEventArgs e)
{
    RetryPolicy retry = new RetryPolicy<MyRetryStrategy>(5, new TimeSpan(0, 0, 5));
 
    try
    {
        using (SqlConnection connection = new SqlConnection(builder.ConnectionString))
        {
            connection.OpenWithRetry(retry);
 
            SqlCommand command = new SqlCommand("select product_name from products");
            command.Connection = connection;
            command.CommandTimeout = CommandTimeout;
 
            SqlDataReader reader = command..ExecuteReaderWithRetry(retry);
 
            while (reader.Read())
            {
                (sender as BackgroundWorker).ReportProgress(0, reader["product_name"]);
            }
        }
    }
    catch (SqlException ex)
    {
        MessageBox.Show(ex.Message, "SqlException");
    }
}

The RetryPolicy<T> class implements the retry logic. The parameter T must be a type that implements ITransientErrorDetectionStrategy. In the RetryPolicy constructor, you set the maximum number of retry attempts. Optionally, you can also set the retry interval, or use an exponential backoff.

The OpenWithRetry method is an extension method, defined in the CAT library, that adds retry logic to the standard ADO.NET SqlConnection.Open method. If an exception is thrown while opening the connection, the RetryPolicy object passes the exception to ITransientErrorDetectionStrategy::IsTransient. If IsTransient returns true, the RetryPolicy waits for the interval and then retries, up to the maximum retries. Note that this blocks the calling thread, so avoid doing this on a UI thread.

Similarly, ExecuteReaderWithRetry adds retry logic to the SqlCommand.ExecuteReader method. Extension methods are also defined for ExecuteNonQuery, ExecuteScalar, and ExecuteXmlReader. When using these methods, you should always consider whether a particular SQL command is safe to retry.

Add Retry Logic for LINQ to SQL

Direct calls to ADO.NET are fairly straightforward. But many applications use frameworks such as WCF or LINQ to SQL, which abstract the underlying database calls. For this type of scenario, the CAT retry library provides a way to wrap a block of code into a scope that can be retried as a unit. To see how this works, modify the LinqQueryWorker_DoWork functions as follows:

void LinqQueryWorker_DoWork(object sender, DoWorkEventArgs e)
{
    RetryPolicy retry = new RetryPolicy<MyRetryStrategy>(5, TimeSpan.FromSeconds(5));

    try
    {
        e.Result = retry.ExecuteAction(() =>
            {
                Deadlock(); // Artificially create a deadlock condition

                CustomerOrdersDataContext ctx = new CustomerOrdersDataContext();
                ctx.Connection.ConnectionString = builder.ConnectionString;
                ctx.CommandTimeout = 3;

                var results = from c in ctx.customers
                                from o in c.orders
                                from i in o.order_items
                                select new { c.lname, c.fname, i.product.product_name, i.quantity };

                return results.ToList();
            });
    }
    catch (SqlException ex)
    {
        MessageBox.Show(ex.Message, "SqlException");
    }
}

The RetryPolicy.ExecuteAction method takes a lambda expression. The code in the lambda expression is executed at least once. If a transient error occurs, the RetryPolicy object retries the entire code block. You should place exception handlers outside of the ExecuteAction statement, and not inside the statement, so that the retry policy gets the first look at any exceptions.

Replace MyRetryStrategy

The implementation of MyRetryStrategy that was shown here is only to demonstrate the retry API. Here are the main transient errors that you should consider in production code:

Error Number	Description
20	The instance of SQL Server does not support encryption.
64	An error occurred during login.
233	Connection initialization error.
10053	A transport-level error occurred when receiving results from the server.
10054	A transport-level error occurred when sending the request to the server.
10060	Network or instance-specific error.
40143	Connection could not be initialized.
40197	The service encountered an error processing your request.
40501	The server is busy.
40613	The database is currently unavailable.

The CAT retry library contains a class named SqlAzureTransientErrorDetectionStrategy that you can use, either as-is, or as a starting point for your own implementation of the ITransientErrorDetectionStrategy interface.

For the complete list of SQL Database errors, see Error Messages (Windows Azure SQL Database).

Share via