Protecting Your Blobs Against Application Errors

A question we are often asked is: do applications need to back up data stored in Windows Azure Storage if Windows Azure Storage already stores multiple replicas of the data? For business continuity, it can be important to protect the data against errors in the application, which may erroneously modify the data.

The replication of data in Windows Azure Storage will not protect against application errors, since these are problems at the application layer that get committed to the replicas that Windows Azure Storage maintains. Currently, many application developers implement their own backup strategy. The purpose of this post is to briefly cover a few strategies that one can use to back up data. We will cover backup strategies for the blob service here, and for the table service in a later post.

Backing up Blob Data

To create backups in the blob service, one first creates a snapshot of the blob. This creates a read-only version of the blob. The snapshot can be read, copied or deleted, but never modified. The great news here is that for a snapshot, you will be charged only for the unique blocks or pages that differ from the base blob (i.e. the blob to which the snapshot belongs). This implies that if the base blob is never modified, you will be charged only for a single copy of the blocks/pages.

The following code shows how to create a snapshot of a blob:

 CloudBlobClient blobClient = new CloudBlobClient(baseUri, credentials);
CloudBlobContainer cloudContainer = blobClient.GetContainerReference(containerName);
CloudBlob cloudBlob = cloudContainer.GetBlobReference("docs/mix2010.ppt");
CloudBlob backupSnapshot = cloudBlob.CreateSnapshot(); 
 
 

 

When you create a snapshot, the snapshot blob is assigned a unique timestamp. This timestamp is returned in the x-ms-snapshot response header and identifies the snapshot. The snapshot has the same name as the original blob, extended with the snapshot datetime. For example, the following Uri can be used to address the snapshot:

https://account.blob.core.windows.net/container/docs/mix2010.ppt?snapshot=2010-03-07T00%3A12%3A14.1496934Z
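
As a minimal sketch of how such an address is formed, the helper below appends the snapshot timestamp to a base blob Uri as a percent-encoded snapshot query parameter. The SnapshotUriSketch class and BuildSnapshotUri method are hypothetical names used only for illustration and are not part of the Storage Client library. The timestamp format (UTC with seven fractional digits) is an assumption taken from the example Uri above, and the base Uri is assumed to carry no query string of its own.

```csharp
using System;
using System.Globalization;

public static class SnapshotUriSketch
{
    // Hypothetical helper (not a Storage Client API): builds the address of a
    // snapshot from the base blob Uri and the timestamp returned in x-ms-snapshot.
    public static Uri BuildSnapshotUri(Uri baseBlobUri, DateTime snapshotTimeUtc)
    {
        if (baseBlobUri == null)
        {
            throw new ArgumentNullException("baseBlobUri");
        }

        // Assumed format, inferred from the example Uri: UTC, seven fractional digits, trailing 'Z'.
        string timestamp = snapshotTimeUtc.ToString(
            "yyyy-MM-dd'T'HH:mm:ss.fffffff'Z'", CultureInfo.InvariantCulture);

        // EscapeDataString percent-encodes the colons in the timestamp as %3A.
        // Assumes the base blob Uri has no query string.
        return new Uri(baseBlobUri.AbsoluteUri + "?snapshot=" + Uri.EscapeDataString(timestamp));
    }
}
```

Called with the base blob Uri and the timestamp from the example above, this helper should produce the same snapshot address.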

When creating a snapshot, you can store new metadata with the snapshot blob at the time the snapshot is created, but after the snapshot is created you can modify neither the metadata nor the blob. If no metadata is provided when taking the snapshot, the metadata from the base blob is copied over to the snapshot blob.

To make a snapshot the current writable version of the blob, copy the snapshot over the base blob using “CopyBlob”:

 CloudBlobClient blobClient = new CloudBlobClient(baseUri, credentials);
CloudBlobContainer cloudContainer = blobClient.GetContainerReference(containerName);
CloudBlob baseBlob = cloudContainer.GetBlobReference(baseBlobName);
CloudBlob backupSnapshot = cloudContainer.GetBlobReference(snapshotBlobName);
baseBlob.CopyFromBlob(backupSnapshot); 
 
 

To delete a blob that has snapshots, you must also delete all of its snapshots. A single delete request can delete the blob and its snapshots by using the x-ms-delete-snapshots header, which can be set to:

  • include - delete the base blob and snapshots
  • only - delete only the snapshots

If the blob has snapshots and you try to delete the blob without the “x-ms-delete-snapshots: include” option, the delete will fail. By default, the Storage Client library does not send this header, which results in the error being returned. If you want to delete a blob along with its snapshots, specify the DeleteSnapshotsOption in BlobRequestOptions as follows:

 blob.Delete(new BlobRequestOptions()
{
DeleteSnapshotsOption = DeleteSnapshotsOption.IncludeSnapshots
}); 
 

Since snapshots cannot be retained when the container or base blob is deleted, a backup strategy that uses snapshots must take care not to unintentionally issue a delete container request, or a delete blob request that specifies the x-ms-delete-snapshots header.

Blob Backup Strategy

Let us now go over one of the strategies that can be used to create backups. Please note that this is just a simple strategy and you may want to tweak it as per your needs.

We will assume that every container that needs to be backed up is marked with the metadata “shouldbackup”. The backup job iterates through the containers and, for each container that has the metadata set, snapshots all base blobs that have changed since the last snapshot. We will also ensure that at most N snapshots are maintained and that any snapshot more than X days old is deleted, unless it is the only backup snapshot (N and X are configurable when running the backup). To differentiate the snapshots that the backup process creates from those the user may create, we add a metadata field “isbackedup” to the snapshot when it is created. This is a simple strategy that ensures a snapshot is available for recovery.

We will break down the above strategy into a few crucial modules. Some are extensions to what the current Storage Client provides:

  1. List all containers - this utilizes the listing capability that the Storage client provides. One can improve speed by processing containers in parallel, which we will leave as an exercise.
  2. Create a snapshot for backup - this provides all the metadata that the snapshot should have. The CreateSnapshot API in the StorageClient does not send metadata information, so all metadata from the base blob is copied over to the snapshot; however, we want to add the new metadata “isbackedup” to the snapshot.
  3. Group snapshots with base blob - provide a structure that maintains the relationship between the base blob and its snapshots so that we can enforce rules such as retaining at most N backups. This involves listing all blobs and grouping each base blob with its snapshots. The current library allows you to list snapshots and base blobs but does not provide a relationship between a base blob and its snapshots. We need to list them and build the relation as we iterate.
  4. Garbage collect backup snapshots - delete the oldest snapshot when the snapshot count exceeds N, or when the snapshot is older than X days and is not the only snapshot.
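
The garbage collection rule in step 4 can be sketched independently of any storage calls. The following is a minimal illustration, not the implementation used later in this post: RetentionSketch and SelectSnapshotsToDelete are hypothetical names, and the function only decides which snapshot timestamps would be deleted for a given N (maxSnapshots) and X (maxAge).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetentionSketch
{
    // Hypothetical helper: returns the snapshot times that the retention rules
    // would delete, oldest first. It does not talk to the storage service.
    public static IList<DateTime> SelectSnapshotsToDelete(
        IEnumerable<DateTime> snapshotTimesUtc, int maxSnapshots, TimeSpan maxAge, DateTime nowUtc)
    {
        if (snapshotTimesUtc == null)
        {
            throw new ArgumentNullException("snapshotTimesUtc");
        }
        if (maxSnapshots < 1)
        {
            throw new ArgumentOutOfRangeException("maxSnapshots");
        }

        // Work on a copy sorted oldest first.
        List<DateTime> remaining = snapshotTimesUtc.OrderBy(t => t).ToList();
        List<DateTime> toDelete = new List<DateTime>();

        // Rule 1: keep at most N snapshots by dropping the oldest ones.
        while (remaining.Count > maxSnapshots)
        {
            toDelete.Add(remaining[0]);
            remaining.RemoveAt(0);
        }

        // Rule 2: drop snapshots that are X or more old, but never the last remaining one.
        while (remaining.Count > 1 && nowUtc - remaining[0] >= maxAge)
        {
            toDelete.Add(remaining[0]);
            remaining.RemoveAt(0);
        }

        return toDelete;
    }
}
```

Keeping the decision separate from the delete calls makes the rule easy to test with fixed timestamps before it is wired to real snapshots.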

Code for container listing:

 /// <summary>
/// Lists all containers and backs up all the base blobs in each container
/// </summary>
/// <param name="blobClient"></param>
public static void BackupAllBlobs(CloudBlobClient blobClient)
{
    if (blobClient == null)
    {
        throw new ArgumentNullException("blobClient");
    }    
    try
    {
        // list containers and back up each container
        ResultContinuation continuation = null;
        do
        {
            ResultSegment<CloudBlobContainer> containers = blobClient.ListContainersSegmented(
                null, ContainerListingDetails.Metadata, 0, continuation);
            continuation = containers.ContinuationToken;

            foreach (CloudBlobContainer container in containers.Results)
            {
                if (!string.IsNullOrEmpty(container.Attributes.Metadata["shouldbackup"]))
                {
                    BackupBlobsInContainer(container);
                }
            }
        } while (continuation != null);
    }
    catch (Exception e)
    {
        // log exception details for debugging
        throw;
    }
} 
 

The following code lists all the blobs in the container and builds the BlobSnapshotInfo structure which maintains the relationship between the base blob and all its snapshots. When we list the blobs in the container, the snapshots precede the base blob and the snapshots are listed in the ascending order of the snapshot timestamp. We use this listing characteristic to build the BlobSnapshotInfo structure.

 /// <summary>
/// Lists all blobs in a container and creates a snapshot if required. It builds a BlobSnapshotInfo structure
/// to maintain the relationship between base blob and snapshot. 
/// </summary>
/// <param name="container">The container whose blobs are backed up</param>
private static void BackupBlobsInContainer(CloudBlobContainer container)
{
    // list all blobs and snapshots here
    BlobRequestOptions options = new BlobRequestOptions()
        {
            BlobListingDetails = BlobListingDetails.Snapshots | BlobListingDetails.Metadata,
            UseFlatBlobListing = true
        };

    ResultContinuation continuation = null;

   // We will maintain a BlobSnapshotInfo structure here and make use of the fact that snapshots and blobs 
   // share the same name and hence they will be grouped together in the listing. Snapshots precede base blob in the
   // listing. We therefore create a dummy base blob and then set it when we get the real base blob.
   // It also tracks those snapshots that have the isbackedup metadata set.
    BlobSnapshotInfo currentBackupInfo = null;
    do
    {
        ResultSegment<IListBlobItem> blobListSegment = container.ListBlobsSegmented(0, continuation, options);
        continuation = blobListSegment.ContinuationToken;

        foreach (CloudBlob blob in blobListSegment.Results)
        {
            // if this blob does not belong to current backup info, it implies
            // that the base blob may be deleted. 
            if (currentBackupInfo != null && !Uri.Equals(currentBackupInfo.BaseBlob.Uri, blob.Uri))
            {
                throw new InvalidOperationException(string.Format(
                    CultureInfo.InvariantCulture,  "The current blob '{0}' does not match current blob info '{1}'",
                    blob.Uri.AbsolutePath,  currentBackupInfo.BaseBlob.Uri.AbsolutePath));
            }
            if (currentBackupInfo == null)
            {
                // we do not have a backup info yet. Let us create one by using a dummy base blob
                // if this is not a base blob
                CloudBlob baseBlob = blob;
                if (blob.SnapshotTime.HasValue)
                {
                    Uri uri = new Uri(blob.ServiceClient.BaseUri, blob.Uri.AbsolutePath);
                    baseBlob = new CloudBlob(uri.AbsoluteUri);
                }

                currentBackupInfo = new BlobSnapshotInfo(baseBlob);
            }
            if (blob.SnapshotTime.HasValue)
            {
                string isBackup = blob.Attributes.Metadata[BlobSnapshotInfo.IsBackupMetadata];
                if (string.IsNullOrEmpty(isBackup))
                {
                    // this is one of the user's snapshots and not a backup snapshot
                    continue;
                }

                currentBackupInfo.AddSnapshot(blob);
            }
            else
            {
                // the blob is always listed after all snapshots - so we can now process it to 
                // see if snapshot is required and then set it back to null
                currentBackupInfo.BaseBlob = blob;
                currentBackupInfo.SnapshotIfRequired();
                currentBackupInfo = null;
            }
        }
    } while (continuation != null);
} 
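
The grouping idea above can also be looked at in isolation from the storage calls. The sketch below assumes, as the listing code does, that the snapshots of a blob are listed immediately before their base blob. ListedBlob, SnapshotGroupingSketch and GroupByBaseBlob are hypothetical names used for illustration only; ListedBlob stands in for one entry of a flat blob listing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one entry of a flat blob listing.
public sealed class ListedBlob
{
    public ListedBlob(string name, DateTime? snapshotTime)
    {
        Name = name;
        SnapshotTime = snapshotTime;
    }

    public string Name { get; private set; }

    // Null for a base blob, set for a snapshot.
    public DateTime? SnapshotTime { get; private set; }
}

public static class SnapshotGroupingSketch
{
    // Groups snapshot times under their base blob name, relying on the assumed
    // listing order: snapshots first, then the base blob they belong to.
    public static IDictionary<string, List<DateTime>> GroupByBaseBlob(IEnumerable<ListedBlob> listing)
    {
        var groups = new Dictionary<string, List<DateTime>>();
        string pendingName = null;
        var pendingSnapshots = new List<DateTime>();

        foreach (ListedBlob entry in listing)
        {
            if (pendingName != null && pendingName != entry.Name)
            {
                // Snapshots were seen without their base blob, which breaks the ordering assumption.
                throw new InvalidOperationException(
                    "Entry '" + entry.Name + "' does not match pending blob '" + pendingName + "'.");
            }

            pendingName = entry.Name;
            if (entry.SnapshotTime.HasValue)
            {
                pendingSnapshots.Add(entry.SnapshotTime.Value);
            }
            else
            {
                // The base blob closes the group; start a fresh one for the next blob.
                groups[entry.Name] = pendingSnapshots;
                pendingName = null;
                pendingSnapshots = new List<DateTime>();
            }
        }

        return groups;
    }
}
```

Like the code above, this sketch throws when an entry does not match the pending group. Snapshots left over at the very end of the listing are simply not returned.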
 

We shall now describe the BlobSnapshotInfo structure in detail. This class is used to maintain the relationship between a base blob and its snapshots. It maintains at most N snapshots, and when a further snapshot is added it deletes the oldest one. It also deletes any snapshots older than X days, unless the snapshot is the only one available. It uses a blob extension method to create the snapshot with the required metadata. The CreateSnapshot implementation is described a little later.

 /// <summary>
/// A class to maintain relationships between base blobs and their snapshots. 
/// It also does the bookkeeping such that we do not have more than N snapshots (i.e. MaxSnapshots) and
/// we do not keep snapshots older than X days (i.e. MaxDaysToRetainDeletedBlobSnapshots)  
/// unless it is the last snapshot
/// </summary>
internal class BlobSnapshotInfo
{
    /// <summary>
    /// metadata maintained on the snapshot to indicate that it was created by the backup
    /// </summary>
    internal const string IsBackupMetadata = "isbackedup";
    internal const int MaxSnapshots = 3;
    internal readonly static TimeSpan MaxDaysToRetainDeletedBlobSnapshots = TimeSpan.FromDays(2);

    /// <summary>
    /// We will maintain at most <see cref="BlobSnapshotInfo.MaxSnapshots"/> snapshots. 
    /// This can be configured as required by apps
    /// </summary>
    private SortedList<DateTime, CloudBlob> snapshotList = 
            new SortedList<DateTime, CloudBlob>(BlobSnapshotInfo.MaxSnapshots);
    private CloudBlob baseBlob;

    /// <summary>
    /// Constructor 
    /// </summary>
    /// <param name="baseBlob"></param>
    internal BlobSnapshotInfo(CloudBlob baseBlob)
    {
        if (baseBlob == null || baseBlob.SnapshotTime != null)
        {
            throw new ArgumentNullException("baseBlob");
        }
        this.baseBlob = baseBlob;
    }
    /// <summary>
    /// Returns all snapshots in sorted order in ascending order of the snapshot time
    /// </summary>
    internal IList<CloudBlob> SnapshotList
    {
        get
        {
            return this.snapshotList.Values;
        }
    }
    /// <summary>
    /// Gets or sets the base blob. The setter only accepts a base blob (not a snapshot)
    /// that has the same Uri as the current base blob.
    /// </summary>
    internal CloudBlob BaseBlob
    {
        get
        {
            return this.baseBlob;
        }
        set
        {
            if (value == null)
            {
                throw new ArgumentNullException("baseBlob");
            }
            if (value.SnapshotTime.HasValue)
            {
                throw new InvalidOperationException(string.Format(
                    CultureInfo.InvariantCulture,  "Blob '{0}' is a snapshot and not a base blob", value.Uri.AbsoluteUri));
            }
            if (!Uri.Equals(value.Uri, this.baseBlob.Uri))
            {
                throw new InvalidOperationException(string.Format(
                    CultureInfo.InvariantCulture, "Blob '{0}' is not the same as current base blob '{1}'",
                    value.Uri.AbsoluteUri, this.baseBlob.Uri.AbsoluteUri));
            }

            this.baseBlob = value;
        }
    }
    /// <summary>
    /// Create a snapshot only if the blob has been modified since last snapshot.
    /// Maintain at most N snapshots and delete a snapshot if it is older than X days.
    /// </summary>
    internal void SnapshotIfRequired()
    {
        if (this.baseBlob == null)
        {
            throw new InvalidOperationException("Base blob should be set for creating snapshot.");
        }
        CloudBlob lastSnapshot = this.snapshotList.LastOrDefault().Value;
        BlobRequestOptions snapshotRequestOption = new BlobRequestOptions();

        // we will retry 10 times and this would wait sufficiently large period
        snapshotRequestOption.RetryPolicy = RetryPolicies.RetryExponential(10, RetryPolicies.DefaultClientBackoff);

        if (lastSnapshot == null ||
            this.baseBlob.Properties.LastModifiedUtc > lastSnapshot.SnapshotTime.Value)
        {
            try
            {
                // fetch the metadata on base blob which we will copy over to snapshot
                this.baseBlob.FetchAttributes();

                // We will add our backup specific metadata to the ones already existing on the blob
                NameValueCollection metadata = new NameValueCollection(this.baseBlob.Attributes.Metadata);
                metadata.Add(BlobSnapshotInfo.IsBackupMetadata, "true");

                // Our extension method that allows creating snapshots using the metadata
                CloudBlob snapShot = this.baseBlob.CreateSnapshot(metadata, snapshotRequestOption);

                // Add the snapshot; this removes the oldest snapshot if required
                this.AddSnapshotImpl(snapShot);
            }
            catch (Exception e)
            {
                // log this error for debugging            
                throw;
            }
        }
        // We will delete a snapshot if it is older than X days unless it is the only backup we have. 
        while (snapshotList.Count > 1)    
        {
            CloudBlob snapshot = this.snapshotList.First().Value;
            TimeSpan diff = DateTime.UtcNow.Subtract(snapshot.SnapshotTime.Value);
            if (diff < BlobSnapshotInfo.MaxDaysToRetainDeletedBlobSnapshots)
            {
                 break;
            }          
            DeleteOldestSnapshot();
        }
    }
    /// <summary>
    /// Add snapshot to the list of snapshots and delete if the max count has been exceeded
    /// </summary>
    /// <param name="snapshot"></param>
    internal void AddSnapshot(CloudBlob snapshot)
    {
        if (snapshot == null || !snapshot.SnapshotTime.HasValue)
        {
            throw new ArgumentNullException("snapshot");
        }
        if (!IsSnapshotOfBaseBlob(this.baseBlob, snapshot))
        {
            throw new InvalidOperationException(string.Format(
                CultureInfo.InvariantCulture,
                "The snapshot '{0}' does not belong to base blob '{1}'.", snapshot.Uri, this.BaseBlob.Uri));
        }

        AddSnapshotImpl(snapshot);
    }

    private void AddSnapshotImpl(CloudBlob snapshot)
    {
        // add the snapshot
        this.snapshotList.Add(snapshot.SnapshotTime.Value, snapshot);

        // if we have more than required snapshots, remove the first snapshot. We assume that 
        // only a single thread works on a given base blob at a time, so we do not acquire locks.
        if (this.snapshotList.Count > BlobSnapshotInfo.MaxSnapshots)
        {
            DeleteOldestSnapshot();
        }
    }
    /// <summary>
    /// Delete the oldest snapshot. The oldest snapshot should be the first 
    /// in the list as it is sorted by datetime.
    /// </summary>
    private void DeleteOldestSnapshot()
    {
        CloudBlob snapshot = snapshotList.First().Value;       
        try
        {
            // we use DeleteIfExists so that a successful retry will not throw an exception. 
            // Also, we will ignore any exception since it will be cleaned up next time
            snapshot.DeleteIfExists();
        }
        catch (Exception e)
        {
            // log this error and do nothing since the next backup process will try deleting this again
        }
        // remove the snapshot always even if we failed to remove it from the store
        snapshotList.RemoveAt(0);
    }
    private static bool IsSnapshotOfBaseBlob(CloudBlob baseBlob, CloudBlob snapshot)
    {
        // the uri segment should match if the snapshot belongs to base blob
        return Uri.Equals(baseBlob.Uri, snapshot.Uri);
    }
} 
 

The snapshot creation process here is extended to take a collection of metadata to set on the snapshot blob. Since a snapshot blob is read-only, this can only be done at the time the snapshot is created. When metadata is provided during creation, the service does not copy any existing metadata, so it is the responsibility of the caller to retrieve the metadata from the base blob and then add any additional metadata that is needed on the snapshot.
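
Before the full extension method, the metadata handling on its own can be sketched as follows. This is an illustration under assumptions, not part of the Storage Client library: SnapshotMetadataSketch, BuildSnapshotMetadata and ToMetadataHeaders are hypothetical names. The sketch copies the base blob metadata, adds the backup marker, and maps each entry to an x-ms-meta- header name.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

public static class SnapshotMetadataSketch
{
    // Hypothetical helper: returns a copy of the base blob metadata with the
    // backup marker added, leaving the original collection untouched.
    public static NameValueCollection BuildSnapshotMetadata(
        NameValueCollection baseBlobMetadata, string markerName, string markerValue)
    {
        NameValueCollection merged = baseBlobMetadata == null
            ? new NameValueCollection()
            : new NameValueCollection(baseBlobMetadata);

        // Set replaces an existing value; Add would append a second,
        // comma-separated value if the key were already present.
        merged.Set(markerName, markerValue);
        return merged;
    }

    // Hypothetical helper: maps metadata entries to the x-ms-meta- header names
    // that carry them on the snapshot request.
    public static IDictionary<string, string> ToMetadataHeaders(NameValueCollection metadata)
    {
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string key in metadata.AllKeys)
        {
            headers["x-ms-meta-" + key] = metadata[key];
        }
        return headers;
    }
}
```

Using Set rather than Add is a design choice of this sketch: it keeps the marker single-valued even if the base blob already carries metadata with the same name.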

  /// <summary>
 /// Create a snapshot and provide all the metadata during creation 
 /// </summary>
 /// <param name="blob"></param>
 /// <param name="metadata"></param>
 /// <param name="options"></param>
 /// <returns></returns>
 public static CloudBlob CreateSnapshot(this CloudBlob blob, NameValueCollection metadata, BlobRequestOptions options)
 {
     if (blob == null)
     {
         throw new ArgumentNullException("blob");
     }

     if (blob.SnapshotTime.HasValue)
     {
         throw new InvalidOperationException("Cannot snapshot a snapshot blob.");
     }

     ShouldRetry shouldRetry = options.RetryPolicy();

     int currentRetryCount = -1;
     TimeSpan delay;
     for (; ; )
     {
         currentRetryCount++;

         try
         {
             TimeSpan timeout = options.Timeout.HasValue ? options.Timeout.Value : TimeSpan.FromSeconds(30);
             return blob.CreateSnapshot(metadata, timeout);
         }
         catch (InvalidOperationException e)
         {
             // TODO: Log the exception here for debugging

             // Check if we need to retry using the required policy
             if (!IsExceptionRetryable(e) || !shouldRetry(currentRetryCount, e, out delay))
             {
                 throw;
             }

             System.Threading.Thread.Sleep(delay);
         }
     }
 }

 private static CloudBlob CreateSnapshot(this CloudBlob blob, NameValueCollection metadata, TimeSpan timeout)
 {
     HttpWebRequest request = BlobRequest.Snapshot(blob.Uri, (int)timeout.TotalSeconds);
     request.Headers["If-Match"] = blob.Attributes.Properties.ETag;
     request.Timeout = (int)timeout.TotalMilliseconds + 2000;
     foreach (string key in metadata.Keys)
     {
        // Add all the metadata and prefix it with x-ms-meta
         request.Headers.Add("x-ms-meta-" + key, metadata[key]);
     }
     
     blob.ServiceClient.Credentials.SignRequest(request);
     
     using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())
     {
         // Fetch attributes after the response is received
         string time = BlobResponse.GetSnapshotTime(response);
         string snapshotAddress =  string.Format("{0}?snapshot={1}", blob.Uri.AbsoluteUri, time);
         CloudBlob snapshotBlob = blob.Container.GetBlobReference(snapshotAddress);
         snapshotBlob.FetchAttributes();
         return snapshotBlob;
     }
}

 /// <summary>
 /// Retry on all exceptions except those with 2xx, 3xx, 4xx, 501 and 505 status codes.
 /// The 2xx is there for Table batch operations in which the status code can be 202 even when 
 /// the call failed
 /// </summary>
 /// <param name="lastException"></param>
 /// <returns></returns>
 public static bool IsExceptionRetryable(Exception lastException)
 {
     int statusCode = GetStatusCodeFromException(lastException);

     // Let us not retry if 2xx, 3xx, 4xx, 501 and 505 errors 
     if (statusCode == -1
         || (statusCode >= 200 && statusCode < 500)
         || statusCode == (int)HttpStatusCode.NotImplemented
         || statusCode == (int)HttpStatusCode.HttpVersionNotSupported)
     {
         return false;
     }

     return true;
 }

 /// <summary>
 /// Get the status code from exception. This can be used for Table, Queue and Blob service
 /// </summary>
 /// <param name="e"></param>
 /// <returns></returns>
 private static int GetStatusCodeFromException(Exception e)
 {
    // Handle DataService request and client exceptions for Table service
     DataServiceRequestException dsre = e as DataServiceRequestException;
     if (dsre != null)
     {
         // Retrieve the status code:
         //  - if we have an operation response, then it is the status code of that operation response. 
         //     We can only have one response on failure and we can ignore the batch status
         //  - otherwise it is the batch status code 
         OperationResponse opResponse = dsre.Response.FirstOrDefault();
         if (opResponse != null)
         {
             return opResponse.StatusCode;
         }

         return dsre.Response.BatchStatusCode;
     }

     DataServiceClientException dsce = e as DataServiceClientException;
     if (dsce != null)
     {
         return dsce.StatusCode;
     }

     WebException we = e as WebException;
     if (we != null)
     {
         HttpWebResponse response = we.Response as HttpWebResponse;

         // if we do not get a response, we will assume bad gateway. This is not completely true, but since it 
         // is better to retry on such errors, we make up an error code here
         return response != null ? (int)response.StatusCode : (int)HttpStatusCode.BadGateway;
     }

     // let us not retry on any other exceptions
     return -1;
 } 
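
The retry decision above boils down to a status code check plus a wait that grows with each attempt. The sketch below restates the status code rule on plain integers and adds a simple deterministic doubling delay. RetrySketch, IsStatusCodeRetryable and BackoffDelay are hypothetical names, and the delay calculation is an illustrative assumption, not the algorithm used by RetryPolicies.RetryExponential.

```csharp
using System;

public static class RetrySketch
{
    // Same rule as IsExceptionRetryable above, expressed on a plain status code:
    // unknown (-1), 2xx, 3xx, 4xx, 501 and 505 are not retried.
    public static bool IsStatusCodeRetryable(int statusCode)
    {
        if (statusCode == -1
            || (statusCode >= 200 && statusCode < 500)
            || statusCode == 501
            || statusCode == 505)
        {
            return false;
        }

        return true;
    }

    // Illustrative backoff (an assumption, not the library's policy): the delay
    // doubles with every retry and is capped at maxDelay.
    public static TimeSpan BackoffDelay(int retryCount, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (retryCount < 0)
        {
            throw new ArgumentOutOfRangeException("retryCount");
        }

        double milliseconds = baseDelay.TotalMilliseconds * Math.Pow(2, retryCount);
        return TimeSpan.FromMilliseconds(Math.Min(milliseconds, maxDelay.TotalMilliseconds));
    }
}
```

A production policy would usually add some randomness to the delay so that many clients do not retry at the same instant.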
 

 
Blob Restore

Now that we have the backup process in place, it is only fair to show how one can use these backups for recovery. For brevity, we will skip the container and blob iteration and just show a code example for recovering a blob by restoring it to the latest available snapshot. If a snapshot exists, we just use the latest one. One can easily extend the code below to add a date time check, for example to revert to the latest snapshot that was taken on 2010-04-22 or earlier.

 void RevertToLatestSnapshot(CloudBlob blob)
{
      // GetSnapshotsForBlob (shown below) does not take a retry policy
      IEnumerable<BlobEntry> snapshots = blob.GetSnapshotsForBlob();
      BlobEntry latestSnapshot = snapshots.LastOrDefault();
      if (latestSnapshot == null)
      {
           throw new InvalidOperationException("There are no snapshots for the blob.");
      }
 
       // we will revert to the latest snapshot irrespective of whether it was created by the backup process
      CloudBlob snapshotBlob = blob.Container.GetBlobReference(latestSnapshot.Attributes.Uri.AbsoluteUri);
      blob.CopyFromBlob(snapshotBlob);
 }

 public static IEnumerable<BlobEntry> GetSnapshotsForBlob(this CloudBlob blob)
 {
        SortedList<DateTime, BlobEntry> snapshots = new SortedList<DateTime, BlobEntry>();

        // we need the prefix of the blob excluding the container name. We will use this prefix for listing
        // blobs
        Uri parentUri = new Uri(blob.Container.Uri.AbsoluteUri + "/");
        string prefix = parentUri.MakeRelativeUri(blob.Uri).ToString();
        
        BlobListingContext context = new BlobListingContext(prefix, null, null, BlobListingDetails.Snapshots);

        HttpWebRequest request = BlobRequest.List(blob.Container.Uri, 30, context);
        request.Timeout = (int)TimeSpan.FromSeconds(40).TotalMilliseconds;
        blob.ServiceClient.Credentials.SignRequest(request);

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            ListBlobsResponse listResponse = BlobResponse.List(response);
            foreach (BlobEntry entry in listResponse.Blobs)
            {
                if (entry.Attributes.Snapshot.HasValue)
                {
                    snapshots.Add(entry.Attributes.Snapshot.Value, entry);
                }
            }
        }

        return snapshots.Values.AsEnumerable();
 } 
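
The date time check mentioned earlier can be sketched as a small selection function over snapshot timestamps. SnapshotSelectionSketch and LatestAtOrBefore are hypothetical names used for illustration; the caller is assumed to pass the snapshot times it obtained from the listing and a cutoff in UTC.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotSelectionSketch
{
    // Hypothetical helper: returns the most recent snapshot time that is not
    // later than the cutoff, or null if no snapshot qualifies.
    public static DateTime? LatestAtOrBefore(IEnumerable<DateTime> snapshotTimesUtc, DateTime cutoffUtc)
    {
        if (snapshotTimesUtc == null)
        {
            throw new ArgumentNullException("snapshotTimesUtc");
        }

        DateTime? best = null;
        foreach (DateTime time in snapshotTimesUtc)
        {
            // Keep the candidate only if it is within the cutoff and newer than the current best.
            if (time <= cutoffUtc && (!best.HasValue || time > best.Value))
            {
                best = time;
            }
        }

        return best;
    }
}
```

To revert to a snapshot taken on 2010-04-22 or earlier, the cutoff would be the end of that day in UTC, and a null result means there is nothing to restore from.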
 

Some improvements that one can make to the above code are:

1. Process blobs and containers in parallel for faster backups and restore.

2. The above code does not continue after an error. It would be good to store progress information about the blobs/containers so that the backup can continue if the role fails, and to log any issues that come up during backup. You may also define your own exception type to capture more information rather than using InvalidOperationException, etc.

3. GetSnapshotsForBlob should handle continuation token and retry on intermittent errors.
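
For the third improvement, the general shape of a continuation loop can be sketched without any storage types. PageSketch and PagingSketch are hypothetical names; the fetchPage delegate stands in for a single listing request that accepts the previous continuation token (null for the first request) and returns one page of results together with the next token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one page of listing results.
public sealed class PageSketch<T>
{
    public PageSketch(IList<T> items, string nextToken)
    {
        Items = items;
        NextToken = nextToken;
    }

    public IList<T> Items { get; private set; }

    // Null when there are no more pages.
    public string NextToken { get; private set; }
}

public static class PagingSketch
{
    // Calls fetchPage repeatedly, passing the token of the previous page,
    // until a page comes back without a continuation token.
    public static List<T> ReadAllPages<T>(Func<string, PageSketch<T>> fetchPage)
    {
        if (fetchPage == null)
        {
            throw new ArgumentNullException("fetchPage");
        }

        var all = new List<T>();
        string token = null;
        do
        {
            PageSketch<T> page = fetchPage(token);
            all.AddRange(page.Items);
            token = page.NextToken;
        } while (token != null);

        return all;
    }
}
```

Retrying on intermittent errors would then wrap each fetchPage call rather than the whole loop, so that pages already read are not requested again.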

Jai Haridas