Character Encoding Issues Related to Copy Blob API

This blog applies to storage version 2011-08-18 or earlier of the Copy Blob API and to version 1.6 of the Windows Azure Storage Client Library.

Two separate problems are discussed in this blog:

  1. Over REST, the service expects any ‘+’ character in the x-ms-copy-source header to be percent encoded. When the ‘+’ is not percent encoded, the service interprets it as a space character ‘ ’.
  2. The Windows Azure Storage Client Library does not percent encode the x-ms-copy-source header value. As a result, source blob names that include the percent character ‘%’ are misinterpreted.

When using Copy Blob, character ‘+’ appearing as part of the x-ms-copy-source header must be URL percent encoded

When using the Copy Blob API, the x-ms-copy-source header value must be URL percent encoded. However, when the server decodes the value, it converts the ‘+’ character to a space, which may not match the encoding rule applied by the client and, in particular, by the Windows Azure Storage Client Library.

Example: Assume that an application wants to copy from a source blob with the following key information:

  AccountName = “foo”
  ContainerName = “container”
  BlobName = “audio+video.mp4”

Using the Windows Azure Storage Client Library, the following value for the x-ms-copy-source header is generated and transmitted over the wire:

x-ms-copy-source: /foo/container/audio+video.mp4

When the server receives this header, it interprets the blob name as “audio video.mp4”, which is not what the user intended. A compatible header would be:

x-ms-copy-source: /foo/container/audio%2bvideo.mp4

In that case, the server decodes the header and correctly interprets the blob name as “audio+video.mp4”.
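The difference between the two header values can be reproduced locally with a small sketch. It assumes that System.Net.WebUtility.UrlDecode, which also turns ‘+’ into a space, is a reasonable stand-in for the decoding described above; it is not the service's actual decoder, and the class and method names are hypothetical.

```csharp
using System;
using System.Net;

public static class PlusDecodingDemo
{
    // Stand-in for the header decoding described in this post (an assumption,
    // not the service's real code): WebUtility.UrlDecode converts '+' to a
    // space and "%2b" to '+'.
    public static string DecodeLikeCopySourceHeader(string headerValue)
    {
        return WebUtility.UrlDecode(headerValue);
    }

    public static void Main()
    {
        // '+' sent as-is is decoded to a space.
        Console.WriteLine(DecodeLikeCopySourceHeader("/foo/container/audio+video.mp4"));
        // prints: /foo/container/audio video.mp4

        // '+' sent as "%2b" is decoded back to '+'.
        Console.WriteLine(DecodeLikeCopySourceHeader("/foo/container/audio%2bvideo.mp4"));
        // prints: /foo/container/audio+video.mp4
    }
}
```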

NOTE: The described server behavior in this blog does not apply to the request URL but only applies to the x-ms-copy-source header that is used as part of the Copy Blob API with version 2011-08-18 or earlier.

To get correct Copy Blob behavior, please consider applying the following encoding rules for the x-ms-copy-source header:

  1. URL percent encode character ‘+’ to “%2b”.
  2. URL percent encode the space character ‘ ’ to “%20”. If you currently encode a space as ‘+’, the current server behavior does decode it as a space. However, this is not compatible with the rule for decoding request URLs, where a ‘+’ character remains a ‘+’ after decoding.
  3. In case you are using the Windows Azure Storage Client Library, please apply the workaround at the end of this post.
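For callers that build the REST request themselves, the rules above could be applied with a helper along the following lines. This is only a sketch: the class and method names are hypothetical, and it assumes that applying Uri.EscapeDataString to each ‘/’-separated segment of the blob name is an acceptable way to percent encode the header value. Note that it emits uppercase hex digits (“%2B” rather than “%2b”); hex digits in percent encoding are case-insensitive.

```csharp
using System;
using System.Linq;

public static class CopySourceHeaderEncoding
{
    // Hypothetical helper: builds an x-ms-copy-source value of the form
    // /account/container/blob, percent encoding each segment of the blob name
    // separately so that any '/' inside the blob name is preserved.
    public static string EncodeCopySource(string account, string container, string blobName)
    {
        string encodedBlobName = string.Join(
            "/",
            blobName.Split('/').Select(segment => Uri.EscapeDataString(segment)));

        return "/" + account + "/" + container + "/" + encodedBlobName;
    }

    public static void Main()
    {
        // '+' becomes %2B and a space becomes %20.
        Console.WriteLine(EncodeCopySource("foo", "container", "audio+video.mp4"));
        // prints: /foo/container/audio%2Bvideo.mp4
        Console.WriteLine(EncodeCopySource("foo", "container", "my file.txt"));
        // prints: /foo/container/my%20file.txt
    }
}
```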

Windows Azure Storage Client Library is not URL encoding the x-ms-copy-source header

As described in the previous section, the x-ms-copy-source header must be URL percent encoded. However, the Windows Azure Storage Client Library transmits the blob name un-encoded. Therefore, any blob name that contains a percent character ‘%’ followed by two hexadecimal digits will be misinterpreted on the server side.

Example: Assume that an application wants to copy from a source blob with the following key information:

  AccountName = “foo”
  ContainerName = “container”
  BlobName = “data%25.txt”

Using the Windows Azure Storage Client Library, the following un-encoded value for the x-ms-copy-source header is generated and transmitted over the wire:

x-ms-copy-source: /foo/container/data%25.txt

The server URL decodes the header value it receives, so the blob name is interpreted as “data%.txt”, which is not what the user intended. A compatible header would be:

x-ms-copy-source: /foo/container/data%2525.txt

In that case, the server decodes the header and correctly interprets the blob name as “data%25.txt”.
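A quick way to see why the extra level of encoding is needed is to check whether a blob name survives a decode. The sketch below again uses System.Net.WebUtility.UrlDecode as an assumed stand-in for the server-side decoding and Uri.EscapeDataString for the client-side encoding; the class and method names are hypothetical.

```csharp
using System;
using System.Net;

public static class PercentRoundTripDemo
{
    // Returns what a URL-decoding receiver would see for the given header segment.
    // Assumption: WebUtility.UrlDecode approximates the decoding described in this post.
    public static string DecodeOnServer(string transmitted)
    {
        return WebUtility.UrlDecode(transmitted);
    }

    public static void Main()
    {
        string blobName = "data%25.txt";

        // Sent un-encoded: "%25" is decoded to '%', so the name changes.
        Console.WriteLine(DecodeOnServer(blobName));
        // prints: data%.txt

        // Sent percent encoded ("data%2525.txt"): the original name is recovered.
        string encoded = Uri.EscapeDataString(blobName);
        Console.WriteLine(encoded);
        // prints: data%2525.txt
        Console.WriteLine(DecodeOnServer(encoded));
        // prints: data%25.txt
    }
}
```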

Note that this bug exists in Version 1.6 of the client library and will be fixed in future releases.

Windows Azure Storage Client Library Code Workaround

As described in the previous sections, the Copy Blob APIs exposed by the client library do not work properly when the characters ‘+’ or ‘%’ appear in the source blob name. The affected APIs are CloudBlob.CopyFromBlob and CloudBlob.BeginCopyFromBlob.

To get around this issue, we have provided the following extension method which creates a safe CloudBlob object that can be used as the sourceBlob with any of the copy blob APIs. Please note that the returned object should not be used to access the blob or to perform any action on it.

Note: This workaround is needed for Windows Azure Storage Library version 1.6.

using System;
using System.Web;
using Microsoft.WindowsAzure.StorageClient;

public static class CloudBlobCopyExtensions
{
    /// <summary>
    /// This method converts a CloudBlob to a version that can be safely used as a source for the CopyFromBlob or BeginCopyFromBlob APIs only.
    /// The returned object must not be used to access the blob, nor should any of its APIs be invoked.
    /// This method should only be used against storage version 2011-08-18 or earlier
    /// and with Windows Azure Storage Client Library version 1.6.
    /// </summary>
    /// <param name="originBlob">The source blob that is being copied</param>
    /// <returns>CloudBlob that can be safely used as a source for the CopyFromBlob or BeginCopyFromBlob APIs only.</returns>
    public static CloudBlob GetCloudBlobReferenceAsSourceBlobForCopy(this CloudBlob originBlob)
    {
        Uri srcUri = originBlob.Uri;

        // URL-encode the blob name twice using HttpUtility.UrlEncode
        string encodedBlobName = HttpUtility.UrlEncode(
                                    HttpUtility.UrlEncode(
                                        originBlob.Name));

        // Take everything in the original URI that precedes the escaped blob name
        string firstPart = srcUri.OriginalString.Substring(
            0, srcUri.OriginalString.Length - Uri.EscapeUriString(originBlob.Name).Length);
        string encodedUrl = firstPart + encodedBlobName;

        return new CloudBlob(
            encodedUrl,
            originBlob.SnapshotTime,
            originBlob.ServiceClient);
    }
}

Here is how the above method can be used:

// Create a blob by uploading data to it
CloudBlob someBlob = container.GetBlobReference("a+b.txt");
someBlob.UploadText("test");

CloudBlob destinationBlob = container.GetBlobReference("a+b(copy).txt");

// The object below should only be used when issuing a copy. Do not use sourceBlobForCopy to access the blob
CloudBlob sourceBlobForCopy = someBlob.GetCloudBlobReferenceAsSourceBlobForCopy();
destinationBlob.CopyFromBlob(sourceBlobForCopy);

We will update this blog once we have fixed the service. We apologize for any inconvenience that this may have caused.

Jean Ghanem

Comments

  • Anonymous
    November 01, 2013
    Is this still an issue or has it been fixed?  We are on Azure Storage Client 2.x and we still get 403 forbidden when a file has [ or ] in the name. Thanks,

  • Anonymous
    November 01, 2013
    @Rudy Can you confirm what version of .Net you are running? The 2.x clients do correctly encode the blob URI; however, .Net 4.5 has an issue where URI no longer escapes the bracket characters, which causes an authentication failure. The client-generated signature will now differ from the server's, as the server is using a .Net 4.0 URI to validate the request. We are actively working with the .Net team internally to resolve this issue. If running under .Net 4.0 is an option, you can mitigate this issue by doing so.