Using Windows Azure Page Blobs and How to Efficiently Upload and Download Page Blobs

This post refers to the Storage Client Library shipped in SDK 1.2. Windows Azure SDK 1.3 provides additional Page Blob functionality via the CloudPageBlob class. The current release can be downloaded here.

We introduced Page Blobs at PDC 2009 as a type of blob for Windows Azure Storage.   With the introduction of Page Blobs, Windows Azure Storage now supports the following two blob types:

  1. Block Blobs (introduced PDC 2008) –  targeted at streaming workloads. 
    • Each blob consists of a sequence/list of blocks. The following are properties of Block Blobs:
      • Each Block has a unique ID, scoped by the Blob Name
      • Blocks can be up to 4MBs in size, and the blocks in a Blob do not have to be the same size
      • A Block blob can consist of up to 50,000 blocks
      • Max block blob size is 200GB
    • Commit-based Update Semantics – Modifying a block blob is a two-phase update process.  First, the blocks to add or modify are uploaded as uncommitted blocks for the blob.  Then, after they are all uploaded, the blocks to add/change/remove are committed via a PutBlockList to create a new readable version of the blob.  In other words, you upload all changes first and then commit them atomically.
    • Range reads can be from any byte offset in the blob.
  2. Page Blobs (introduced PDC 2009) – targeted at random write workloads. 
    • Each blob consists of an array/index of pages. The following are properties of Page Blobs:
      • Each page is of size 512 bytes, so all writes must be 512 byte aligned, and the blob size must be a multiple of 512 bytes. 
      • Writes have a starting offset and can write up to 4MBs worth of pages at a time.  These are range-based writes that consist of a sequential range of pages.
      • Max page blob size is 1TB
    • Immediate Update Semantics – As soon as a write request for a sequential set of pages succeeds in the blob service, the write has committed, and success is returned back to the client.  The update is immediate, so there is no commit step as there is for block blobs.
    • Range reads can be done from any byte offset in the blob.
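The limits listed above can be checked with a little arithmetic. The following is a minimal sketch, assuming binary units (1 MB = 2^20 bytes, 1 TB = 2^40 bytes); the BlobLimits class and its member names are hypothetical and not part of the Storage Client Library.

```csharp
using System;

// Hypothetical helper, for illustration only: restates the limits from the list above.
public static class BlobLimits
{
    public const long PageSize = 512;
    public const long MaxWriteSize = 4L * 1024 * 1024;              // 4 MB per block or per page write
    public const long MaxBlocksPerBlob = 50000;
    public const long MaxPageBlobSize = 1024L * 1024 * 1024 * 1024; // 1 TB (assumed to mean 2^40 bytes)

    // Largest size reachable with 50,000 blocks of 4 MB each.
    public static long MaxBytesFromBlocks()
    {
        return MaxBlocksPerBlob * MaxWriteSize;
    }

    // Number of 512-byte pages addressable in a page blob of the given size.
    public static long PageCount(long blobSize)
    {
        if (blobSize < 0 || blobSize % PageSize != 0)
        {
            throw new ArgumentOutOfRangeException("blobSize", "must be a non-negative multiple of 512");
        }
        return blobSize / PageSize;
    }
}
```

Under these assumptions, 50,000 blocks of 4 MB come to 209,715,200,000 bytes (about 195 GB), which fits under the 200GB block blob limit. A 1TB page blob has 2,147,483,648 pages, one more than int.MaxValue, so page counts and byte offsets are best kept in long variables.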

Unique Characteristics of Page Blobs

We created Page Blobs out of a need to have a cloud storage data abstraction for files that supports:

  1. Fast range-based reads and writes – need a data abstraction with single update writes, to provide a fast update alternative to the two-phase update of Block Blobs.
  2. Index-based data structure – need a data abstraction that supports index-based access, in comparison to the list-based approach of block blobs.
  3. Efficient sparse data structure – since the data object can represent a large sparse index, we wanted to create an efficient way to manage and avoid charging for empty pages.  The goal is to not charge for parts of the index that do not have any data pages stored in them.

Uses for Page Blobs

The following are some of the scenarios Page Blobs are being used for:

  • Windows Azure Drives - One of the key scenarios for Page Blobs was to support Windows Azure Drives. Windows Azure Drives allows Windows Azure cloud applications to mount a network attached durable drive, which is actually a Page Blob (see prior post).
  • Files with Range-Based Updates – An application can treat a Page Blob as a file, updating just the parts of the file/blob that have changed using ranged writes.   In addition, to deal with concurrency, the application can obtain and renew a Blob Lease to maintain an exclusive write lease on the Page Blob for updating.
  • Logging - Applications can also use Page Blobs for custom logging.  For example, when a given role instance starts up, it can create a Page Blob of some MaxSize, which is the maximum amount of log space the role wants to use for a day.   The role instance can then write its logs using range-based writes of up to 4MB, where a header provides metadata such as the size of the log entry, a timestamp, etc.   When the Page Blob fills up, the application can either treat it as a circular buffer and start writing from the beginning again, or create a new page blob, depending upon how it wants to manage the log files (blobs).   With this approach you can have a different Page Blob for each role instance, so that there is just a single writer to each page blob for logging.  To avoid having to work out where to resume writing after a role failover, the application can simply create a new Page Blob when a role restarts, and garbage collect the older Page Blobs after a given number of hours or days.  Since you are not charged for pages that are empty, it doesn’t matter if you don’t fill the page blob up.
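For the logging scenario, the offset bookkeeping might look like the following minimal sketch. It assumes each log entry (header plus payload) is padded up to whole 512 byte pages, and that an entry which does not fit before MaxSize wraps to offset 0. PageBlobLog, PaddedLength and NextWriteOffset are hypothetical names, not library APIs.

```csharp
using System;

// Hypothetical helper for the circular-buffer logging idea described above.
public static class PageBlobLog
{
    public const int PageSize = 512;

    // Rounds a log entry (header plus payload) up to a whole number of pages.
    public static long PaddedLength(long entryBytes)
    {
        return (entryBytes + PageSize - 1) / PageSize * PageSize;
    }

    // Returns the offset at which to write the next entry, wrapping to 0
    // when the padded entry would run past maxSize.
    public static long NextWriteOffset(long currentOffset, long entryBytes, long maxSize)
    {
        long padded = PaddedLength(entryBytes);
        if (padded > maxSize)
        {
            throw new ArgumentOutOfRangeException("entryBytes", "entry is larger than the log blob");
        }
        return (currentOffset + padded > maxSize) ? 0 : currentOffset;
    }
}
```

The caller would write the padded entry at the returned offset and then advance its current offset by PaddedLength(entryBytes).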

Using Storage Client Library to Access Page Blobs

We’ll now walk through how to use the Windows Azure Storage Client library to create, update and read Page Blobs.

  • Creating a Page Blob and Page Blob Size

To create a Page Blob, we first create a CloudBlobClient object, with the base Uri for accessing the blob storage for your storage account along with the StorageCredentialsAccountAndKey object as shown below.   This gives a CloudBlobClient object you can then use to derive all of your requests to the blob service for that storage account.    The example then shows creating a reference to a CloudBlobContainer object, and then creating the container if it doesn’t already exist.  Then from the CloudBlobContainer object we can create a reference to a CloudPageBlob object by specifying the page blob name we want to access.

 using Microsoft.WindowsAzure.StorageClient;
StorageCredentialsAccountAndKey creds = new StorageCredentialsAccountAndKey(accountName, key);
string baseUri = string.Format("https://{0}.blob.core.windows.net", accountName);
CloudBlobClient blobStorage = new CloudBlobClient(baseUri, creds);
CloudBlobContainer container = blobStorage.GetContainerReference(containerName);
container.CreateIfNotExist();
CloudPageBlob pageBlob = container.GetPageBlobReference(blobName);
pageBlob.Create(blobSize); 
 

Then to create the page blob we call CloudPageBlob.Create, passing in the max size for the blob we want to create.  Note that blobSize has to be a multiple of 512 bytes.

Right after the page blob is created, no pages are actually stored, but you can read from any page range within the blob, and will get back zeros.  This is because “empty pages” are treated by the page blob as if they were filled with zeros when trying to read those pages.  This also means that after creating a blob, you are not charged for any pages even if you specify a 1TB page blob.   You are only charged for pages that have data stored in them.
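To make the sparse behavior concrete, here is a toy in-memory model. It is only a sketch of the semantics described above, not how the blob service is implemented; SparsePageModel and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory model of a sparse page blob; illustrative only.
public class SparsePageModel
{
    public const int PageSize = 512;
    private readonly long size;
    private readonly Dictionary<long, byte[]> pages = new Dictionary<long, byte[]>();

    public SparsePageModel(long size)
    {
        if (size < 0 || size % PageSize != 0) throw new ArgumentOutOfRangeException("size");
        this.size = size;
    }

    // Bytes held in stored pages, i.e. the part the post says you are charged for.
    public long StoredBytes
    {
        get { return (long)pages.Count * PageSize; }
    }

    // Stores every page in 'data', including pages that are all zeros.
    public void WritePages(byte[] data, long offset)
    {
        if (offset < 0 || offset % PageSize != 0 || data.Length % PageSize != 0
            || offset + data.Length > size)
        {
            throw new ArgumentOutOfRangeException("offset");
        }
        for (int i = 0; i < data.Length; i += PageSize)
        {
            byte[] page = new byte[PageSize];
            Array.Copy(data, i, page, 0, PageSize);
            pages[(offset + i) / PageSize] = page;
        }
    }

    // Reads from any byte offset; bytes in pages that were never written come back as 0.
    public byte[] Read(long offset, int count)
    {
        if (offset < 0 || count < 0 || offset + count > size)
        {
            throw new ArgumentOutOfRangeException("offset");
        }
        byte[] result = new byte[count]; // zero-filled by default
        for (int i = 0; i < count; i++)
        {
            long pos = offset + i;
            byte[] page;
            if (pages.TryGetValue(pos / PageSize, out page))
            {
                result[i] = page[pos % PageSize];
            }
        }
        return result;
    }
}
```

In this model a freshly created 1TB blob has StoredBytes equal to 0, and writing a page of zeros still adds 512 stored bytes, which is why the upload code should skip all-zero pages.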

Make sure when uploading a blob that you don’t upload pages that are full of zeros, and instead skip over those pages leaving them empty.   This will ensure that you aren’t charged for those empty pages.  See the example below for uploading VHDs to page blobs, where we only upload pages that are non-zero.  Similarly, when reading a page blob, if you have a lot of empty pages, you may want to first get the valid page ranges with GetPageRanges, and then download just those pages.  This is used in the downloading VHD example below.

  • Writing Pages

To write pages you use the CloudPageBlob.WritePages method.  This allows you to write a sequential set of pages of up to 4MBs in a single call.  The range being written must start on a 512 byte boundary (startingOffset % 512 == 0) and end one byte before a 512 byte boundary ((endOffset + 1) % 512 == 0).  The below shows an example of calling WritePages for a blob object we are accessing:

 CloudPageBlob pageBlob = container.GetPageBlobReference(blobName);
pageBlob.WritePages(dataStream, startingOffset); 
 

In the above example, if the dataStream is larger than 4MBs or it does not end aligned to 512 bytes then an exception is thrown.
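If you would rather catch a bad range before sending the request, the rules can be checked locally. The following is a minimal sketch; PageWriteCheck and ValidateWrite are hypothetical names, and the checks simply restate the constraints described above.

```csharp
using System;

// Hypothetical pre-flight check for a page write; restates the rules from the text.
public static class PageWriteCheck
{
    public const int PageSize = 512;
    public const int MaxWriteBytes = 4 * 1024 * 1024;

    // Returns the inclusive end offset of a valid write, or throws if a rule is broken.
    public static long ValidateWrite(long startingOffset, long length)
    {
        if (startingOffset < 0 || startingOffset % PageSize != 0)
        {
            throw new ArgumentException("start must be a non-negative multiple of 512", "startingOffset");
        }
        if (length <= 0 || length % PageSize != 0)
        {
            throw new ArgumentException("length must be a positive multiple of 512", "length");
        }
        if (length > MaxWriteBytes)
        {
            throw new ArgumentException("a single write is limited to 4 MB", "length");
        }
        return startingOffset + length - 1;
    }
}
```

For example, ValidateWrite(0, 512) returns 511, the inclusive end offset of the first page.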

A word of caution here is that if you get a “500” or “504” error (e.g., timeout, connection closed, etc) back for a WritePages request, then this means the write may or may not have succeeded on the server.  In this case, it is best to retry the WritePages to make sure the contents are updated.
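Retrying is safe here because writing the same bytes to the same offset again leaves the blob with the same contents, provided there is a single writer. A minimal sketch of such a retry loop follows. RetryHelper and Run are hypothetical names, the 3s, 9s, 27s backoff is borrowed from the ClearPageWithRetries routine later in this post, and the sleep is injected so the loop can be exercised without waiting.

```csharp
using System;

// Hypothetical retry wrapper; not part of the Storage Client Library.
public static class RetryHelper
{
    // Runs 'operation' until it succeeds, 'isRetryable' rejects the error,
    // or maxAttempts is reached. Returns the number of attempts used.
    public static int Run(Action operation, int maxAttempts,
                          Func<Exception, bool> isRetryable, Action<TimeSpan> sleep)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return attempt;
            }
            catch (Exception e)
            {
                if (attempt >= maxAttempts || !isRetryable(e))
                {
                    throw;
                }
            }

            // Backoff: 3s, 9s, 27s ...
            sleep(TimeSpan.FromSeconds(Math.Pow(3, attempt)));
        }
    }
}
```

When wrapping WritePages this way, remember to reset the position of the data stream to its start before each attempt, since a failed attempt may already have consumed part of it.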

  • Reading Pages

To read pages you use the CloudPageBlob.OpenRead method to get a BlobStream, and then call BlobStream.Read to read the pages.  This allows you to stream the full blob or a range of pages from any offset in the blob. Ranged reads can start and end at any byte offset (they do not have to be 512 byte aligned like writes).

 CloudPageBlob pageBlob = container.GetPageBlobReference(blobName);
BlobStream blobStream = pageBlob.OpenRead();
byte[] buffer = new byte[rangeSize];
blobStream.Seek(blobOffset, SeekOrigin.Begin); 
int numBytesRead = blobStream.Read(buffer, bufferOffset, rangeSize); 
 

In the above, we use CloudPageBlob.OpenRead (inherited from CloudBlob) to get a BlobStream for reading the contents of the blob.   When the blob stream is created, it is positioned at the start of the blob. To start reading at a different byte offset, call blobStream.Seek with that offset. The read will then download rangeSize bytes of the page blob, storing them into the buffer at bufferOffset.   Remember that if you read over pages without any data stored in them, the blob service will return 0s for those pages.
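The general Stream.Read contract in .NET allows a read to return fewer bytes than requested. Whether BlobStream does so in practice is not stated here, so the following helper is a defensive sketch under that assumption; StreamReadHelper and ReadFully are hypothetical names and work on any System.IO.Stream.

```csharp
using System;
using System.IO;

// Hypothetical helper: keeps reading until 'count' bytes arrive or the stream ends.
public static class StreamReadHelper
{
    // Returns the number of bytes actually read, which is less than 'count'
    // only if the end of the stream was reached first.
    public static int ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, offset + total, count - total);
            if (n == 0)
            {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }
}
```

In the example above, this would replace the direct call to blobStream.Read, with numBytesRead compared against rangeSize afterwards.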

One of the key concepts we talked about earlier is that if you have a sparsely populated blob you may want to download just the valid page regions.  To do this you can use CloudPageBlob.GetPageRanges, which returns an enumerable of PageRange objects describing the valid page range regions of the page blob.  You can then enumerate these, and download just the pages with data in them.  The below is an example of doing this:

 // baseUri and creds are built as in the "Creating a Page Blob" example above
CloudBlobClient blobStorage = new CloudBlobClient(baseUri, creds);
blobStorage.ReadAheadInBytes = 0;
CloudBlobContainer container = blobStorage.GetContainerReference(containerName);
CloudPageBlob pageBlob = container.GetPageBlobReference(blobName);
IEnumerable<PageRange> pageRanges = pageBlob.GetPageRanges();
BlobStream blobStream = pageBlob.OpenRead();
 
foreach (PageRange range in pageRanges)
{
    // EndOffset is inclusive... so need to add 1
    int rangeSize = (int)(range.EndOffset + 1 - range.StartOffset);

    // Seek to the correct starting offset in the page blob stream
    blobStream.Seek(range.StartOffset, SeekOrigin.Begin);

    // Read the next page range into the buffer
    byte[] buffer = new byte[rangeSize];
    blobStream.Read(buffer, 0, rangeSize);

    // Then use the buffer for the page range just read
} 
 

The above example gets the list of valid page ranges, then reads each valid page range into a local buffer to be used by the application as it sees fit.  An important step here is the “blobStream.Seek”, which moves the blob stream to the correct starting position (offset) for the next valid page range.  

One thing to realize when using GetPageRanges is that you get back a list of contiguous ranges describing the current valid regions in the page blob.    You do not get back the regions in the granularity or order in which you wrote them.   For example, assume you wrote pages in the following order: [512-2048), then [4096-5120), then [2048-2560), and then [0-1024).  Calling GetPageRanges would then return the two ranges [0-2560) and [4096-5120).
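The coalescing in this example can be reproduced locally. The sketch below is only a model of the observable result, not the service's implementation; RangeMerge and Coalesce are hypothetical names, and each range is a two-element array holding the start offset and the exclusive end offset.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of how written ranges appear as contiguous valid ranges.
public static class RangeMerge
{
    // Each range is { start, endExclusive }. Returns sorted ranges with
    // overlapping or adjacent ranges merged together.
    public static List<long[]> Coalesce(IEnumerable<long[]> writes)
    {
        List<long[]> sorted = new List<long[]>(writes);
        sorted.Sort((a, b) => a[0].CompareTo(b[0]));

        List<long[]> merged = new List<long[]>();
        foreach (long[] w in sorted)
        {
            if (merged.Count > 0 && w[0] <= merged[merged.Count - 1][1])
            {
                // Overlaps or touches the previous range: extend it.
                long[] last = merged[merged.Count - 1];
                last[1] = Math.Max(last[1], w[1]);
            }
            else
            {
                merged.Add(new long[] { w[0], w[1] });
            }
        }
        return merged;
    }
}
```

Feeding in the four writes from the example, in the order they were written, yields the two ranges [0-2560) and [4096-5120).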

Note that in the above code we set CloudBlobClient.ReadAheadInBytes to 0.  If we did not do this, then the code would read ahead from the blob service over the page ranges without any data in them when doing the blobStream.Read.   Setting the read ahead to zero therefore ensures that we download only the exact page ranges we want (the pages with data in them).

Advanced Functionality – Clearing Pages and Changing Page Blob Size

There are a few advanced Page Blob features that are not exposed at the StorageClient level, but are accessible at the REST or StorageClient.Protocol level.  We’ll briefly touch on two of them here -- clearing pages and changing the size of the page blob.

If you need to delete or zero out a set of pages in a Page Blob, writing zeros to those pages will result in data pages (full of zeros) being stored, and there would be a charge for them.   Instead, it is beneficial to call Put Page with the header x-ms-page-write: clear in the REST APIs.  This will clear the set of pages from the page blob, removing them from the set of pages being charged.  The following is an example ClearPages routine from Jai Haridas to use until we add support for clear pages at the Storage Client level:

 /// Jai Haridas, Microsoft 2010
using System;
using System.Net;
using Microsoft.WindowsAzure.StorageClient;
using Microsoft.WindowsAzure.StorageClient.Protocol;
public static void ClearPages(CloudPageBlob pageBlob, int timeoutInSeconds, 
                   long start, long end, string leaseId)
{
    if (start % 512 != 0 || start >= end)
    {
        throw new ArgumentOutOfRangeException("start");
    }
    if ((end + 1) % 512 != 0)
    {
        throw new ArgumentOutOfRangeException("end");
    }
    if (pageBlob == null)
    {
        throw new ArgumentNullException("pageBlob");
    }
    if (timeoutInSeconds <= 0)
    {
        throw new ArgumentOutOfRangeException("timeoutInSeconds");
    }
    UriBuilder uriBuilder = new UriBuilder(pageBlob.Uri);
    uriBuilder.Query = string.Format("comp=page&timeout={0}", timeoutInSeconds);
    Uri requestUri = uriBuilder.Uri;

    // Take care of SAS query parameters if required
    if (pageBlob.ServiceClient.Credentials.NeedsTransformUri)
    {
        requestUri = new Uri(pageBlob.ServiceClient.Credentials.TransformUri(requestUri.ToString()));
    }

    // Create the request and set all the required headers
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUri);
    request.Method = "PUT";
    // Let the HTTP web request time out 30s after the server timeout we pass to Azure Storage
    request.Timeout = (int)Math.Ceiling(TimeSpan.FromSeconds(timeoutInSeconds + 30).TotalMilliseconds);

    request.ContentLength = 0;
    request.Headers.Add("x-ms-version", "2009-09-19");
    request.Headers.Add("x-ms-page-write", "Clear");
    request.Headers.Add("x-ms-range", string.Format("bytes={0}-{1}", start, end));
    
    if (!string.IsNullOrEmpty(leaseId))
    {
        request.Headers.Add("x-ms-lease-id", leaseId);
    }
    // We have all the headers in place- let us add auth and date header
    pageBlob.ServiceClient.Credentials.SignRequest(request);
    using (HttpWebResponse clearResponse = (HttpWebResponse)request.GetResponse())
    {
        // Add your own logging here as the call is successful if 
        // clearResponse.StatusCode == HttpStatusCode.Created
    }
}

public static void ClearPageWithRetries(CloudPageBlob pageBlob, int timeoutInSeconds, 
                   long start, long end, string leaseId)
{
    int retry = 0;
    int maxRetries = 4; 
    for (; ; )
    {
        retry++;

        try
        {
            ClearPages(pageBlob, timeoutInSeconds, start, end, leaseId);
            break;
        }
        catch (WebException e)
        {
            // Log the webexception status, since that tells what the error may be
            // Let us re-throw the error on 3xx,4xx, 501 and 505 errors OR
            // if we exceed the retry count
            HttpWebResponse response = e.Response as HttpWebResponse;
            if (retry == maxRetries || 
                (response != null && 
                (((int)response.StatusCode >= 300 && 
                  (int)response.StatusCode < 500) || 
                  (response.StatusCode == HttpStatusCode.NotImplemented)||
                  (response.StatusCode == HttpStatusCode.HttpVersionNotSupported))))
            {
                throw;
            }
        }

        // Backoff: 3s, 9s, 27s ... 
        int retryInterval = (int)Math.Pow(3, retry);
        System.Threading.Thread.Sleep(retryInterval*1000);
    }
} 
 

When creating the page blob, the max size specified is primarily used for bounds checking the updates to the blob.  You can actually change the size of the Page Blob at any time using the REST API (via Set Blob Properties and x-ms-blob-content-length) or the Protocol interfaces (via BlobRequest.SetProperties and newBlobSize).   If you shrink the blob, then the pages past the new max size at the end of the blob will be deleted.  If you increase the size of the page blob, then empty pages are effectively added at the end of the Page Blob.

 using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;
using Microsoft.WindowsAzure.StorageClient.Protocol;
using System.Net;

// leaving out account/container creation

 CloudPageBlob pageBlob = container.GetPageBlobReference(config.Blob);
Uri requestUri = pageBlob.Uri;
if (blobStorage.Credentials.NeedsTransformUri)
{
    requestUri = new Uri(blobStorage.Credentials.TransformUri(requestUri.ToString()));
}

HttpWebRequest request = BlobRequest.SetProperties(requestUri, timeout,
               pageBlob.Properties, null, newBlobSize);
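// The timeout passed to BlobRequest is in seconds; HttpWebRequest.Timeout is in milliseconds
request.Timeout = timeout * 1000;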
blobStorage.Credentials.SignRequest(request);
using (WebResponse response = request.GetResponse())
{
    // call succeeded
}
 

Putting it All Together to Efficiently Upload VHDs to Page Blobs

Now we want to tie everything together by providing an example command line application written using the Storage Client Library that allows you to efficiently upload VHDs to Page Blobs.   It actually works for any file (nothing in the program is specific to VHDs), as long as you are OK with the end of the file being padded with zeros to a 512 byte boundary when stored into the Page Blob.

The application was written by Andy Edwards for the Windows Azure Drive MIX 2010 demo (please see prior post).   The program takes 3 parameters:

  1. The local file you want to upload
  2. The full uri for the page blob you want to store in the blob service
  3. The name of a local file that has the storage account key stored in it

An example command line for running the program looks like:

c:\> vhdupload.exe input-file https://accountname.blob.core.windows.net/container/blobname key.txt

The program reads over the local file, finding the regions of non-zero pages, and then uses WritePages to write them to the page blob.  As described above, it skips over pages that are empty (filled with zeros), so they are not uploaded.  Also, if the last buffer to be uploaded is not 512 byte aligned, we resize it so that it is aligned to 512 bytes.

Here is the code:

 // Andy Edwards, Microsoft 2010
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using Microsoft.WindowsAzure.StorageClient;
using Microsoft.WindowsAzure;

public class VhdUpload
{
    public static void Main(string [] args)
    {
        Config config = Config.Parse(args);
        try
        {
            Console.WriteLine("Uploading: " + config.Vhd.FullName + "\n" +
                              "To:        " + config.Url.AbsoluteUri);
            UploadVHDToCloud(config);
        }
        catch (Exception e)
        {
            Console.WriteLine("Error uploading vhd:\n" + e.ToString());
        }
    }
    private static bool IsAllZero(byte[] range, long rangeOffset, long size)
    {
        for (long offset = 0; offset < size; offset++)
        {
            if (range[rangeOffset + offset] != 0)
            {
                return false;
            }
        }
        return true;
    }
    private static void UploadVHDToCloud(Config config)
    {
        StorageCredentialsAccountAndKey creds = new 
               StorageCredentialsAccountAndKey(config.Account, config.Key);

        CloudBlobClient blobStorage = new CloudBlobClient(config.AccountUrl, creds);
        CloudBlobContainer container = blobStorage.GetContainerReference(config.Container);
        container.CreateIfNotExist();

        CloudPageBlob pageBlob = container.GetPageBlobReference(config.Blob);
        Console.WriteLine("Vhd size:  " + Megabytes(config.Vhd.Length));

        long blobSize = RoundUpToPageBlobSize(config.Vhd.Length);
        pageBlob.Create(blobSize);

        FileStream stream = new FileStream(config.Vhd.FullName, FileMode.Open, FileAccess.Read);
        BinaryReader reader = new BinaryReader(stream);

        long totalUploaded = 0;
        long vhdOffset = 0;
        int offsetToTransfer = -1;

        while (vhdOffset < config.Vhd.Length)
        {
            byte[] range = reader.ReadBytes(FourMegabytesAsBytes);

            int offsetInRange = 0;

            // Make sure end is page size aligned
            if ((range.Length % PageBlobPageSize) > 0)
            {
                int grow = (int)(PageBlobPageSize - (range.Length % PageBlobPageSize));
                Array.Resize(ref range, range.Length + grow);
            }

            // Upload groups of contiguous non-zero page blob pages.  
            while (offsetInRange <= range.Length)
            {
                if ((offsetInRange == range.Length) ||
                    IsAllZero(range, offsetInRange, PageBlobPageSize))
                {
                    if (offsetToTransfer != -1)
                    {
                        // Transfer up to this point
                        int sizeToTransfer = offsetInRange - offsetToTransfer;
                        MemoryStream memoryStream = new MemoryStream(range, 
                                     offsetToTransfer, sizeToTransfer, false, false);
                        pageBlob.WritePages(memoryStream, vhdOffset + offsetToTransfer);
                        Console.WriteLine("Range ~" + Megabytes(offsetToTransfer + vhdOffset) 
                                + " + " + PrintSize(sizeToTransfer));
                        totalUploaded += sizeToTransfer;
                        offsetToTransfer = -1;
                    }
                }
                else
                {
                    if (offsetToTransfer == -1)
                    {
                        offsetToTransfer = offsetInRange;
                    }
                }
                offsetInRange += PageBlobPageSize;
            }
            vhdOffset += range.Length;
        }
        Console.WriteLine("Uploaded " + Megabytes(totalUploaded) + " of " + Megabytes(blobSize));
    }

    private static int PageBlobPageSize = 512;
    private static int OneMegabyteAsBytes = 1024 * 1024;
    private static int FourMegabytesAsBytes = 4 * OneMegabyteAsBytes;
    private static string PrintSize(long bytes)
    {
        if (bytes >= 1024*1024) return (bytes / 1024 / 1024).ToString() + " MB";
        if (bytes >= 1024) return (bytes / 1024).ToString() + " kb";
        return (bytes).ToString() + " bytes";
    }
    private static string Megabytes(long bytes)
    {
        return (bytes / OneMegabyteAsBytes).ToString() + " MB";
    }
    private static long RoundUpToPageBlobSize(long size)
    {
        return (size + PageBlobPageSize - 1) & ~(PageBlobPageSize - 1);
    }
}
public class Config
{
    public Uri Url;
    public string Key;
    public FileInfo Vhd;
    public string AccountUrl
    {
        get 
        {
            return Url.GetLeftPart(UriPartial.Authority);
        }
    }
    public string Account
    {
        get
        {
            string accountUrl = AccountUrl;

            accountUrl = accountUrl.Substring(Url.GetLeftPart(UriPartial.Scheme).Length);
            accountUrl = accountUrl.Substring(0, accountUrl.IndexOf('.'));

            return accountUrl;
        }
    }
    public string Container
    {
        get
        {
            string container = Url.PathAndQuery;
            container = container.Substring(1);
            container = container.Substring(0, container.IndexOf('/'));
            return container;
        }
    }
    public string Blob
    {
        get
        {
            string blob = Url.PathAndQuery;
            blob = blob.Substring(1);
            blob = blob.Substring(blob.IndexOf('/') + 1);

            int queryOffset = blob.IndexOf('?');
            if (queryOffset != -1)
            {
                blob = blob.Substring(0, queryOffset);
            }
            return blob;
        }
    }
    public static Config Parse(string [] args)
    {
        if (args.Length != 3)
        {
            WriteConsoleAndExit("Usage: vhdupload <file> <url> <keyfile>");
        }

        Config config = new Config();
        config.Url = new Uri(args[1]);
        config.Vhd = new FileInfo(args[0]);

        if (!config.Vhd.Exists)
        {
            WriteConsoleAndExit(args[0] + " does not exist");
        }

        config.ReadKey(args[2]);

        return config;
    }
    public void ReadKey(string filename)
    {
        try
        {
            Key = File.ReadAllText(filename);
            Key = Key.TrimEnd(null);
            Key = Key.TrimStart(null);
        }
        catch (Exception e)
        {
            WriteConsoleAndExit("Error reading key file:\n" + e.ToString());
        }
    }
    private static void WriteConsoleAndExit(string s)
    {
        Console.WriteLine(s);
        System.Environment.Exit(1);
    }
} 
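The zero-page scan at the heart of the upload loop can also be pulled out on its own, which makes it easier to test without a storage account. The following is a minimal sketch of that idea; ZeroPageScan and FindNonZeroRuns are hypothetical names, and the logic follows the same approach as the inner loop of UploadVHDToCloud above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone version of the non-zero page scan used by the uploader.
public static class ZeroPageScan
{
    public const int PageSize = 512;

    // Returns { offset, length } pairs, one per run of consecutive pages
    // that contain at least one non-zero byte.
    public static List<int[]> FindNonZeroRuns(byte[] buffer)
    {
        if (buffer.Length % PageSize != 0)
        {
            throw new ArgumentException("buffer length must be a multiple of 512", "buffer");
        }

        List<int[]> runs = new List<int[]>();
        int runStart = -1;
        for (int offset = 0; offset <= buffer.Length; offset += PageSize)
        {
            bool atEnd = (offset == buffer.Length);
            if (atEnd || IsAllZero(buffer, offset, PageSize))
            {
                if (runStart != -1)
                {
                    // A zero page (or the end of the buffer) closes the current run.
                    runs.Add(new int[] { runStart, offset - runStart });
                    runStart = -1;
                }
            }
            else if (runStart == -1)
            {
                runStart = offset;
            }
        }
        return runs;
    }

    private static bool IsAllZero(byte[] buffer, int offset, int count)
    {
        for (int i = 0; i < count; i++)
        {
            if (buffer[offset + i] != 0) return false;
        }
        return true;
    }
}
```

Each returned pair maps directly to one WritePages call: wrap the run in a MemoryStream and write it at the file offset of the buffer plus the run's offset.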
 

Putting it All Together to Efficiently Download Page Blobs to VHDs

Now we want to finish tying everything together by providing a command line program to also efficiently download page blobs using the Storage Client Library.   The application was also written by Andy Edwards.  The program takes 3 parameters:

  1. The full uri for the page blob you want to download from the blob service
  2. The local file you want to store the blob to
  3. The name of a local file that has the storage account key stored in it

An example command line for running the program looks like:

c:\> vhddownload.exe https://accountname.blob.core.windows.net/container/blobname output-file key.txt

The program gets the valid page ranges with GetPageRanges, and then it reads just those ranges, and writes those ranges at the correct offset (by seeking to it) in the local file. 

Note that the reads are broken up into 4MB reads, because (a) there is a bug in the storage client library where ReadPages uses a default timeout of 90 seconds, which may not be large enough if you are downloading page ranges that are 100s of MBs or larger, and (b) breaking up reads into smaller chunks allows more efficient continuation and retries of the download if there are connectivity issues for the client.

Here is the code:

 // Andy Edwards, Microsoft 2010
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using Microsoft.WindowsAzure.StorageClient;
using Microsoft.WindowsAzure;

public class VhdDownload
{
    public static void Main(string [] args)
    {
        Config config = Config.Parse(args);
        try
        {
            Console.WriteLine("Downloading: " + config.Url.AbsoluteUri + "\n" +
                              "To:          " + config.Vhd.FullName);
            DownloadVHDFromCloud(config);
        }
        catch (Exception e)
        {
            Console.WriteLine("Error downloading vhd:\n" + e.ToString());
        }
    }
    private static void DownloadVHDFromCloud(Config config)
    {
        StorageCredentialsAccountAndKey creds = 
               new StorageCredentialsAccountAndKey(config.Account, config.Key);

        CloudBlobClient blobStorage = new CloudBlobClient(config.AccountUrl, creds);
        blobStorage.ReadAheadInBytes = 0;

        CloudBlobContainer container = blobStorage.GetContainerReference(config.Container);
        CloudPageBlob pageBlob = container.GetPageBlobReference(config.Blob);

        // Get the length of the blob
        pageBlob.FetchAttributes();
        long vhdLength = pageBlob.Properties.Length;
        long totalDownloaded = 0;
        Console.WriteLine("Vhd size:  " + Megabytes(vhdLength));
        
        // Create a new local file to write into
        FileStream fileStream = new FileStream(config.Vhd.FullName, FileMode.Create, FileAccess.Write);
        fileStream.SetLength(vhdLength);

        // Download the valid ranges of the blob, and write them to the file
        IEnumerable<PageRange> pageRanges = pageBlob.GetPageRanges();
        BlobStream blobStream = pageBlob.OpenRead();

        foreach (PageRange range in pageRanges)
        {
            // EndOffset is inclusive... so need to add 1.
            // Use long here: a single valid range can be larger than int.MaxValue bytes.
            long rangeSize = range.EndOffset + 1 - range.StartOffset;

            // Chop range into 4MB chunks, if needed
            for (long subOffset = 0; subOffset < rangeSize; subOffset += FourMegabyteAsBytes)
            {
                int subRangeSize = (int)Math.Min(rangeSize - subOffset, FourMegabyteAsBytes);
                blobStream.Seek(range.StartOffset + subOffset, SeekOrigin.Begin);
                fileStream.Seek(range.StartOffset + subOffset, SeekOrigin.Begin);

                Console.WriteLine("Range: ~" + Megabytes(range.StartOffset + subOffset) 
                                  + " + " + PrintSize(subRangeSize));
                byte[] buffer = new byte[subRangeSize];

                blobStream.Read(buffer, 0, subRangeSize);
                fileStream.Write(buffer, 0, subRangeSize);
                totalDownloaded += subRangeSize;
            }
        }
        Console.WriteLine("Downloaded " + Megabytes(totalDownloaded) + " of " + Megabytes(vhdLength));
    }
    private static int OneMegabyteAsBytes = 1024 * 1024;
    private static int FourMegabyteAsBytes = 4 * OneMegabyteAsBytes;
    private static string Megabytes(long bytes)
    {
        return (bytes / OneMegabyteAsBytes).ToString() + " MB";
    }

    private static string PrintSize(long bytes)
    {
        if (bytes >= 1024*1024) return (bytes / 1024 / 1024).ToString() + " MB";
        if (bytes >= 1024) return (bytes / 1024).ToString() + " kb";
        return (bytes).ToString() + " bytes";
    }
}
public class Config
{
    public Uri Url;
    public string Key;
    public FileInfo Vhd;
    public string AccountUrl
    {
        get 
        {
            return Url.GetLeftPart(UriPartial.Authority);
        }
    }
    public string Account
    {
        get
        {
            string accountUrl = AccountUrl;

            accountUrl = accountUrl.Substring(Url.GetLeftPart(UriPartial.Scheme).Length);
            accountUrl = accountUrl.Substring(0, accountUrl.IndexOf('.'));

            return accountUrl;
        }
    }
    public string Container
    {
        get
        {
            string container = Url.PathAndQuery;
            container = container.Substring(1);
            container = container.Substring(0, container.IndexOf('/'));
            return container;
        }
    }
    public string Blob
    {
        get
        {
            string blob = Url.PathAndQuery;
            blob = blob.Substring(1);
            blob = blob.Substring(blob.IndexOf('/') + 1);
            int queryOffset = blob.IndexOf('?');
            if (queryOffset != -1)
            {
                blob = blob.Substring(0, queryOffset);
            }
            return blob;
        }
    }
    public static Config Parse(string [] args)
    {
        if (args.Length != 3)
        {
            WriteConsoleAndExit("Usage: vhddownload <url> <file> <keyfile>");
        }
        Config config = new Config();
        config.Url = new Uri(args[0]);
        config.Vhd = new FileInfo(args[1]);
        if (config.Vhd.Exists)
        {
            try
            {
                config.Vhd.Delete();
            }
            catch (Exception e)
            {
                WriteConsoleAndExit("Failed to delete vhd file:\n" + e.ToString());
            }
        }
        config.ReadKey(args[2]);
        return config;
    }
    public void ReadKey(string filename)
    {
        try
        {
            Key = File.ReadAllText(filename);
            Key = Key.TrimEnd(null);
            Key = Key.TrimStart(null);
        }
        catch (Exception e)
        {
            WriteConsoleAndExit("Error reading key file:\n" + e.ToString());
        }
    }
    private static void WriteConsoleAndExit(string s)
    {
        Console.WriteLine(s);
        System.Environment.Exit(1);
    }

} 
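The chunking step of the download loop can be isolated in the same way. The following is a minimal sketch; RangeChunker and Split are hypothetical names. It keeps offsets and sizes in long variables so that a valid range of 2GB or more does not overflow, and returns { offset, length } pairs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that splits one valid page range into bounded reads.
public static class RangeChunker
{
    // Splits [startOffset, endOffsetInclusive] into { offset, length } chunks
    // of at most chunkSize bytes each.
    public static List<long[]> Split(long startOffset, long endOffsetInclusive, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException("chunkSize");
        }
        if (endOffsetInclusive < startOffset)
        {
            throw new ArgumentOutOfRangeException("endOffsetInclusive");
        }

        List<long[]> chunks = new List<long[]>();
        long rangeSize = endOffsetInclusive + 1 - startOffset; // EndOffset is inclusive
        for (long sub = 0; sub < rangeSize; sub += chunkSize)
        {
            long length = Math.Min(rangeSize - sub, (long)chunkSize);
            chunks.Add(new long[] { startOffset + sub, length });
        }
        return chunks;
    }
}
```

Called with PageRange.StartOffset, PageRange.EndOffset and a 4MB chunk size, each returned pair corresponds to one Seek and Read on the blob stream and one Seek and Write on the local file.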
 

Summary

The following are a few areas worth summarizing about Page Blobs:

  • When creating a Page Blob you specify the max size, but are only charged for pages with data stored in them.
  • When uploading a Page Blob, do not store empty pages.
  • When you need to zero out pages, clear them with ClearPages instead of writing zeros.
  • Reading from empty pages will return zeros.
  • When downloading a Page Blob, first use GetPageRanges, and only download the page ranges with data in them.

Brad Calder
