SaveChangesWithRetries and Batch Option

The issue is resolved in the Windows Azure SDK 1.3 release.

We recently found a bug in the SaveChangesWithRetries overload in our Storage Client Library that takes a SaveChangesOptions argument:

 public DataServiceResponse SaveChangesWithRetries(SaveChangesOptions options)

The problem is that SaveChangesWithRetries does not pass the SaveChangesOptions through to the OData client's SaveChanges method. As a result, each entity is sent in its own request, one after another. This can cause real problems for clients because it loses transaction semantics: with the Batch option, the operations in an entity group either all succeed or all fail, whereas with separate requests some operations can succeed while others fail. This will be fixed in the next release of our SDK, but until then we wanted to provide a workaround.

The goal of this post is to provide a workaround for the bug above so that "Entity Group Transactions" (EGT) can be used with retries. If your application already handles retries in its own way, you can use the OData client library directly to issue the EGT operation; this works as expected, with no known issues:

 context.SaveChanges(SaveChangesOptions.Batch); 
 

We will now describe a BatchWithRetries method that calls the OData API "SaveChanges" and passes it the required Batch option. As you can see below, most of the code handles exceptions and retry logic. But before we go into the details of BatchWithRetries, let us quickly see how it is used.

  
// accountName, key, tableName and partitionKey are placeholders for your own values
StorageCredentialsAccountAndKey credentials = 
       new StorageCredentialsAccountAndKey(accountName, key);
CloudStorageAccount account = 
       new CloudStorageAccount(credentials, false); // false: use HTTP rather than HTTPS
CloudTableClient tableClient = new CloudTableClient(
       account.TableEndpoint.AbsoluteUri, account.Credentials);

// Create the table if it does not exist
tableClient.CreateTableIfNotExist(tableName);

// Get the context and add entities (use UpdateObject/DeleteObject if 
// you want to update/delete entities instead).
// Entity stands for your own entity class (for example, one deriving from TableServiceEntity).
TableServiceContext context = tableClient.GetDataServiceContext();
for (int i = 0; i < 5; i++)
{
    // All entities in an entity group transaction must share the same partition key
    context.AddObject(tableName, new Entity()
    {
        PartitionKey = partitionKey,
        RowKey = Guid.NewGuid().ToString()
    });
}

try
{
    // Use the routine below with our own custom retry policy, 
    // which retries only on certain exceptions
    context.BatchWithRetries(TableExtensions.RetryExponential());
}
catch (Exception e)
{
    // Handle the exception as your application scenario requires.
    // Intermittent errors have already been retried, so this exception means either:
    //  (a) the retry count was exceeded, or
    //  (b) the request failed because of an application error such as
    //      Conflict or Precondition Failed.
} 
 

BatchWithRetries takes a RetryPolicy, which decides whether to retry after an error and how long to wait before the next attempt. We also provide a RetryPolicy, RetryExponential, which reuses the exponential backoff strategy that the StorageClient uses between retries and, in addition, retries only on certain errors. All of the exceptions below derive from InvalidOperationException, which is the type BatchWithRetries catches. We handle the following exceptions:

  • DataServiceRequestException: its response can contain a list of operation responses. We use the status code of the operation response; if there is none, we use the batch status code.
  • DataServiceClientException: we use the status code in the exception.
  • WebException: we use the HTTP status code if a response was received. If there is no response, the request most likely never went out, so we assume 502 Bad Gateway, which is a status we retry on.
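To make the precedence in this list concrete, here is a minimal sketch that reduces each case to the status code the retry logic would look at. It is an illustration only, not part of the Storage Client Library: the class StatusCodeSketch and its methods are hypothetical, and plain integers stand in for the information carried by the real DataServiceRequestException and WebException objects so the logic can run without any Azure references.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only: plain integers stand in for the
// status information carried by the real exception types.
public static class StatusCodeSketch
{
    // Substituted when a WebException carries no HTTP response (502 Bad Gateway).
    public const int BadGateway = 502;

    // DataServiceRequestException case: the first operation response wins;
    // if there is none, fall back to the status code of the batch itself.
    public static int FromBatch(IEnumerable<int> operationStatusCodes, int batchStatusCode)
    {
        foreach (int code in operationStatusCodes)
        {
            return code;
        }
        return batchStatusCode;
    }

    // WebException case: use the HTTP status code if a response arrived, else assume 502.
    public static int FromWeb(int? httpStatusCode)
    {
        return httpStatusCode ?? BadGateway;
    }
}
```

A DataServiceClientException needs no mapping, since its own status code is used directly, and in the full code below any other exception type maps to -1 so that it is never retried.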

The rule for deciding whether to retry is simple: we do not retry on any 2xx, 3xx, 4xx, 501 (Not Implemented) or 505 (HTTP Version Not Supported) status, on any unrecognized exception, or once the retry count reaches RetryPolicies.DefaultClientRetryCount. 2xx is a special case for batch requests. Normally a 2xx status means success and no exception is thrown; however, batch requests return 202 "Accepted" for certain failures (for example, a bad request). That is why 2xx is in the list of statuses we do not retry on.
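The decision rule and the backoff used by ShouldRetryOnException can be exercised on their own. The sketch below is a self-contained restatement, not the library's code: the class RetrySketch is hypothetical, and the constants (3 retries, 3 s minimum, 30 s base and 90 s maximum backoff) are our assumption of the values behind RetryPolicies.DefaultClientRetryCount, DefaultMinBackoff, DefaultClientBackoff and DefaultMaxBackoff, so please check them against your SDK version. The random jitter is passed in as a factor so the arithmetic can be checked deterministically.

```csharp
using System;

// Hypothetical, self-contained restatement of the retry rule described above.
// The constants are assumptions standing in for the RetryPolicies defaults.
public static class RetrySketch
{
    public const int MaxRetries = 3;                                         // assumed DefaultClientRetryCount
    public static readonly TimeSpan MinBackoff = TimeSpan.FromSeconds(3);    // assumed DefaultMinBackoff
    public static readonly TimeSpan BaseBackoff = TimeSpan.FromSeconds(30);  // assumed DefaultClientBackoff
    public static readonly TimeSpan MaxBackoff = TimeSpan.FromSeconds(90);   // assumed DefaultMaxBackoff

    // Retry only when the budget is not spent and the status is not -1 (unknown
    // exception), 2xx (a failed batch can return 202), 3xx, 4xx, 501 or 505.
    public static bool ShouldRetry(int retryCount, int statusCode)
    {
        if (retryCount >= MaxRetries) return false;
        if (statusCode == -1) return false;
        if (statusCode >= 200 && statusCode < 500) return false;
        if (statusCode == 501 || statusCode == 505) return false;
        return true;
    }

    // Exponential backoff: minimum + (2^retryCount - 1) * base * jitter, capped at
    // the maximum. jitter plays the role of the random factor in [0.8, 1.2).
    public static TimeSpan Delay(int retryCount, double jitter)
    {
        double incrementMs = (Math.Pow(2, retryCount) - 1) * BaseBackoff.TotalMilliseconds * jitter;
        double delayMs = Math.Min(MinBackoff.TotalMilliseconds + incrementMs, MaxBackoff.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(delayMs);
    }
}
```

With these assumed defaults, BatchWithRetries would wait 3 seconds before the first retry, roughly 27 to 39 seconds before the second, and 75 to 90 seconds before the third; a fourth failed attempt is not retried.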

Here is the complete code for using EGT with retries. Please note that this bug will be fixed in our next SDK release. As always, please send us your feedback.

Jai Haridas

 

// NOTE: You will need to add System.Data.Services.Client and 
// Microsoft.WindowsAzure.StorageClient to your project references
using System;
using System.Collections.Generic;
using System.Data.Services.Client;
using System.Linq;
using System.Net;
using Microsoft.WindowsAzure.StorageClient;

public static class TableExtensions
{
    /// <summary>
    /// Extension method that invokes SaveChanges with the Batch option and retries on
    /// intermittent errors.
    /// Please note that a retry can fail with errors indicating that:
    ///  1. The entity already exists (for inserts)
    ///  2. The entity does not exist (for deletes)
    ///  3. The ETag does not match (for updates)
    /// because the earlier attempt that appeared to fail may actually have succeeded
    /// on the server, in which case the retry fails with a 4xx error.
    /// </summary>
    /// <param name="context">The context whose pending changes are sent as one batch</param>
    /// <param name="retryPolicy">The retry policy to use</param>
    public static void BatchWithRetries(this TableServiceContext context, 
           RetryPolicy retryPolicy)
    {
        if (context == null)
        {
            throw new ArgumentNullException("context");
        }

        if (retryPolicy == null)
        {
            throw new ArgumentNullException("retryPolicy");
        }

        ShouldRetry shouldRetry = retryPolicy();

        // Wait at most 40 seconds per request (DataServiceContext.Timeout is in seconds),
        // since an Azure request can take at most 30 seconds.
        // The original timeout is restored before this method exits.
        int oldTimeout = context.Timeout;
        context.Timeout = 40;

        int currentRetryCount = -1;
        TimeSpan delay;
        try
        {
            for (; ; )
            {
                currentRetryCount++;

                try
                {
                    // Call the OData client's SaveChanges API directly with the Batch option
                    context.SaveChanges(SaveChangesOptions.Batch);
                    break;
                }
                catch (InvalidOperationException e)
                {
                    // TODO: Log the exception here for debugging

                    // Check if we need to retry using the required policy
                    if (!shouldRetry(currentRetryCount, e, out delay))
                    {
                        throw;
                    }

                    System.Threading.Thread.Sleep((int)delay.TotalMilliseconds);
                }

            }
        }
        finally
        {
            context.Timeout = oldTimeout;
        }
    }

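    /// <summary>
    /// A ShouldRetry delegate with the same signature that StorageClient uses. It can easily be
    /// wrapped in a RetryPolicy, as RetryExponential does, and used wherever a retry policy is required.
    /// </summary>
    /// <param name="currentRetryCount">Number of retries already made (0 after the first failed attempt)</param>
    /// <param name="lastException">The exception thrown by the last attempt</param>
    /// <param name="retryInterval">How long to wait before the next attempt</param>
    /// <returns>true if the operation should be retried; otherwise false</returns>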
    public static bool ShouldRetryOnException(int currentRetryCount, Exception lastException, 
                                              out TimeSpan retryInterval)
    {
        int statusCode = TableExtensions.GetStatusCodeFromException(lastException);

        // Do not retry on 2xx, 3xx, 4xx, 501 or 505, on unrecognized exceptions (-1),
        // or once the retry count is exhausted. 2xx is included because a failed
        // batch request can return 202 Accepted.
        if (currentRetryCount >= RetryPolicies.DefaultClientRetryCount
            || statusCode == -1
            || (statusCode >= 200 && statusCode < 500)
            || statusCode == (int)HttpStatusCode.NotImplemented
            || statusCode == (int)HttpStatusCode.HttpVersionNotSupported)
        {
            retryInterval = TimeSpan.Zero;
            return false;
        }

        // The following is an exponential backoff strategy
        Random r = new Random();
        int increment = (int)(
            (Math.Pow(2, currentRetryCount) - 1) * 
             r.Next((int)(RetryPolicies.DefaultClientBackoff.TotalMilliseconds * 0.8),
                    (int)(RetryPolicies.DefaultClientBackoff.TotalMilliseconds * 1.2)));

        int timeToSleepMsec = (int)Math.Min(
            RetryPolicies.DefaultMinBackoff.TotalMilliseconds + increment,
            RetryPolicies.DefaultMaxBackoff.TotalMilliseconds);

        retryInterval = TimeSpan.FromMilliseconds(timeToSleepMsec);
        return true;
    }

    /// <summary>
    /// This retry policy follows an exponential backoff and in addition retries 
    /// only on required HTTP status codes 
    /// </summary>
    public static RetryPolicy RetryExponential()
    {
        return () =>
        {
            return ShouldRetryOnException;
        };
    }

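    /// <summary>
    /// Gets the HTTP status code carried by an exception, or -1 if the exception
    /// type is not recognized
    /// </summary>
    /// <param name="e">The exception thrown by SaveChanges</param>
    /// <returns>The HTTP status code, or -1</returns>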
    private static int GetStatusCodeFromException(Exception e)
    {
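        DataServiceRequestException dsre = e as DataServiceRequestException;
        // Guard against a missing response; such an exception falls through and maps to -1
        if (dsre != null && dsre.Response != null)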
        {
            // Retrieve the status code:
            //  - if we have an operation response, then it is the status code of that response. 
            //     We can only have one response on failure and we can ignore the batch status
            //  - otherwise it is the batch status code 
            OperationResponse opResponse = dsre.Response.FirstOrDefault();
            if (opResponse != null)
            {
                return opResponse.StatusCode;
            }

            return dsre.Response.BatchStatusCode;
        }

        DataServiceClientException dsce = e as DataServiceClientException;
        if (dsce != null)
        {
            return dsce.StatusCode;
        }

        WebException we = e as WebException;
        if (we != null)
        {
            HttpWebResponse response = we.Response as HttpWebResponse;

            // if we do not get a response, we will assume bad gateway. 
            // This is not completely true, but since it is better to retry on such errors, 
            // we make up an error code here
            return response != null ? (int)response.StatusCode : (int)HttpStatusCode.BadGateway;
        }

        // let us not retry on any other exceptions
        return -1;
    }
} 
 

Comments

  • Anonymous
    May 06, 2010
    Will this bug be fixed in Azure StorageClient SDK 1.2?

  • Anonymous
    May 06, 2010
    Yes, it will be fixed in SDK 1.2.