Azure blob storage and effective use of cache control property
Oops! I never noticed that the blob CacheControl property is so important. I looked extensively at how blob storage behaves for different values of the CacheControl property, and I thought it was worth sharing here. I am not sure how many of you already know this, but it seemed a good topic for my first blog on Technet wiki. As this is my first post here, I am sure there will be mistakes, so your suggestions, comments and feedback are very important to improve this article. Alright, so here we go!!
Azure storage pricing states that you are charged for the number of read and write operations. Ingress (data coming into Azure storage) is free; however, egress (data going out of Azure storage) is charged. Therefore we should reduce the number of calls to Azure, and the amount of data downloaded, as much as we can, and this is where the CacheControl property can play a major role.
The first thing is to download Fiddler – http://www.telerik.com/fiddler. This will help to understand the network traffic and the HTTP response codes.
Initial Steps –
To test this I first uploaded an image file of size 4MB [the image is a photo taken with my own camera]. I recommend uploading a big file of at least 4MB. This makes it easier to observe the difference in loading time when the CacheControl property is set.
You have various options for uploading the blob file; however, make sure that the option you choose lets you upload the file to blob storage with the CacheControl property set.
I used the sample code that I had created to perform blob upload in parallel and in async. You can use the same code; here is the reference link - http://sanganakauthority.blogspot.in/2014/07/upload-large-files-to-azure-block-blob.html
If you go to the bottom of that page you will find the download link for the sample code.
What is CacheControl property?
Cache-Control is a header defined by the HTTP 1.1 protocol, and the blob CacheControl property sets the value of that header for the blob. It accepts directives such as private, public, no-cache and no-store.
Setting the CacheControl header allows you to store your resources in the visitor's browser. You essentially cut down the requests to the server, which gives a faster browsing experience. You also cut down costs on platforms like Azure, because you are bringing down the server calls. So in this post we will see the various outcomes of the CacheControl property on an Azure blob.
Refer to this link for more details - http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
And one more useful link - http://msdn.microsoft.com/en-us/library/ms524721%28v=vs.90%29.aspx
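To make the directives a little more concrete, here is a minimal sketch that parses two Cache-Control values with the .NET class System.Net.Http.Headers.CacheControlHeaderValue and prints what each directive means. The class name CacheControlDemo and the method Describe are only illustrative names, not part of any Azure library, and the two header values are the ones used later in this article.

```csharp
using System;
using System.Net.Http.Headers;

public static class CacheControlDemo
{
    // Parses a Cache-Control value and summarises the directives it contains.
    // "Describe" is a hypothetical helper written for this illustration.
    public static string Describe(string headerValue)
    {
        CacheControlHeaderValue cc = CacheControlHeaderValue.Parse(headerValue);

        // MaxAge is null when the header has no max-age directive.
        string maxAge = cc.MaxAge.HasValue
            ? ((int)cc.MaxAge.Value.TotalSeconds).ToString()
            : "none";

        return "max-age=" + maxAge
             + "; must-revalidate=" + cc.MustRevalidate
             + "; no-cache=" + cc.NoCache
             + "; no-store=" + cc.NoStore
             + "; private=" + cc.Private;
    }

    public static void Main()
    {
        // Value used in Scenario 1: cache for 5 minutes, then revalidate.
        Console.WriteLine(Describe("max-age=300, must-revalidate"));

        // Value used in Scenario 3: do not cache at all.
        Console.WriteLine(Describe("private, max-age=0, no-cache, no-store"));
    }
}
```

This only inspects the header text on the client side; it does not contact Azure storage.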
Scenario 1 –
Create a public container and upload the blob file with the settings shown in the following code -
//set cache control property for the uploaded blob
//5 minutes of caching on client side
blockBlob.Properties.CacheControl = "max-age=300, must-revalidate";
//save the property on the blob in storage
blockBlob.SetProperties();
“must-revalidate” tells both client caches and proxy caches that once the content is stale (older than 300 seconds) they must revalidate it with the origin server before they can serve the content to the client again.
Ok, I have uploaded the blob to Azure storage with the cache control property defined above. To make sure that the cache control value has really been set, I opened the blob properties from Visual Studio and confirmed that it has.
The blob storage URL is “https://mystorage.blob.core.windows.net/mycontainer/blob4”. Therefore we need to set up the Fiddler options so that it captures HTTPS traffic as well. Go to Tools -> Fiddler Options -> HTTPS tab and mark the checkboxes as shown below –
Now I clicked on the Browse option as shown below and accessed the blob URL through the browser that opened.
If you look at the Fiddler output below, you will observe that the first response is 200, which means the status is OK.
Now I refresh the page, and this is where the CacheControl property plays its role. The Fiddler output shows the status as 304.
What is 304 status?
304 – this HTTP status means Not Modified.
It indicates that the resource has not been modified since the client last fetched it. This means that there is no need to retransmit the resource, as the client has a previously downloaded copy. [It means the copy is cached in the client’s browser].
The fiddler also gives similar kind of information –
And this is the screenshot of http response official site -
If the cache period of 5 minutes has elapsed and I refresh again, I get a 200 response first and 304 responses on subsequent refreshes. This means my image is not downloaded every time, so the outgoing data from my storage is reduced and I SAVE MONEY.
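The 200-then-304 pattern seen in this scenario can be sketched as a tiny model of how a server answers a conditional GET. This is a simplified illustration under my own assumptions, not the actual Azure storage implementation: the class ConditionalGetModel, the method StatusFor and the ETag values "v1" and "v2" are all hypothetical.

```csharp
using System;

public static class ConditionalGetModel
{
    // Returns the status code a server would send for a GET request.
    // currentETag: the ETag of the blob as it is stored right now.
    // ifNoneMatch: the ETag the browser sends from its cached copy
    //              (empty when the browser has no cached copy yet).
    public static int StatusFor(string currentETag, string ifNoneMatch)
    {
        if (!string.IsNullOrEmpty(ifNoneMatch) && ifNoneMatch == currentETag)
        {
            return 304; // Not Modified: headers only, the 4MB body is not sent
        }
        return 200;     // OK: the full content is sent
    }

    public static void Main()
    {
        Console.WriteLine(StatusFor("\"v1\"", ""));       // first visit, nothing cached
        Console.WriteLine(StatusFor("\"v1\"", "\"v1\"")); // refresh, blob unchanged
        Console.WriteLine(StatusFor("\"v2\"", "\"v1\"")); // refresh after the blob was replaced
    }
}
```

The point of the sketch is that a 304 response carries no image data, which is why the refreshes are so much cheaper and faster than the first download.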
Scenario 2 –
Let’s remove the previous response details from Fiddler. To do this, select the Remove option from the Fiddler toolbar.
Now I do not set the CacheControl property at all in my code. I removed the lines of code shown above that set CacheControl and uploaded the blob again. The properties now look as shown below –
This suggests that when I access the blob from my browser, it should not be cached and I should never get 304 status codes. However, when I accessed the blob link in the browser and refreshed multiple times, the Fiddler output was as shown below –
As you can see, I am still getting 304 status codes even though I did not set any CacheControl on the blob. So my blob is still getting cached in my browser. How is that happening?
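Well, this happens because a 304 response does not depend on CacheControl alone. Blob storage returns ETag and Last-Modified headers with every blob, so the browser can still keep a copy and, on refresh, ask the server whether that copy has changed; if it has not, the server answers 304. When no CacheControl is set, the browser simply falls back to its own default caching behaviour. The MSDN link above describes Private, the default value in IIS/ASP, as follows: "A cache mechanism may cache this page in a private cache and resend it only to a single client. This is the default value. Most proxy servers will not cache pages with this setting."
For comparison, no-cache means the response MUST NOT be reused without successful revalidation with the origin server.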
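A 304 is always the answer to a conditional request, that is, a GET that carries the ETag of the copy the client already holds in an If-None-Match header. The sketch below builds such a request with the standard .NET HttpRequestMessage class so you can see what the browser is sending on refresh. The class name ConditionalRequestDemo, the method BuildConditionalGet and the ETag value are assumptions made for illustration; the URL is the sample blob URL from Scenario 1.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class ConditionalRequestDemo
{
    // Builds a GET request that asks the server to send the body only if
    // the blob's current ETag differs from the one we already have.
    public static HttpRequestMessage BuildConditionalGet(string url, string etag)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);

        // ETag values must include their surrounding double quotes.
        request.Headers.IfNoneMatch.Add(EntityTagHeaderValue.Parse(etag));
        return request;
    }

    public static void Main()
    {
        // Hypothetical ETag; a real one comes from the ETag header of an earlier 200 response.
        HttpRequestMessage request = BuildConditionalGet(
            "https://mystorage.blob.core.windows.net/mycontainer/blob4",
            "\"0x8D1234567890ABC\"");

        foreach (EntityTagHeaderValue tag in request.Headers.IfNoneMatch)
        {
            Console.WriteLine("If-None-Match: " + tag.Tag);
        }
    }
}
```

To try it against a real blob, you could send the request with HttpClient.SendAsync and compare the response's StatusCode with HttpStatusCode.NotModified; the sketch stops short of that so it runs without a storage account.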
So, as shown above, NOT setting the CacheControl property is not a reliable way to prevent blob caching on the client side. Let’s see how we can actually prevent caching in scenario 3.
Scenario 3 –
In this scenario I will set up CacheControl in such a way that the output is not cached in the client browser. In this case, every time you refresh the page, a request is made to the server to download the blob.
blockBlob.Properties.CacheControl = "private, max-age=0, no-cache, no-store";
blockBlob.SetProperties();
This code specifies that no caching should occur. Now if I access the blob URL from the browser again, the Fiddler output is as shown below –
If you observe, there is no 304 response. This means every request goes to the server and downloads the image again.
Of course, disabling caching completely like this is not good practice.
Scenario 4 –
In this scenario I am going to change the contents of the blob and observe the response in Fiddler. First I uploaded the blob with CacheControl set to 300 seconds and then accessed the blob from the browser. As expected, the first response was 200 and subsequent responses were 304. After this I changed the blob content to a new image and refreshed my browser. Fiddler then showed a 201 (Created) response, presumably for the upload request itself, followed by a 200 for the new image and then 304 again on further refreshes.
Alright, here I end all the scenarios that I tried. If you create a web role, put an Image control on a web form and assign its ImageUrl property to a blob storage URL, then in Fiddler you will observe the same behaviour of 200 followed by 304 status codes. Hence whenever you create any type of blob, be it an image, Excel file, PDF, CSS or anything else, you should always set the CacheControl property as per the requirements of the application.
So as a best practice, you should always set the CacheControl property to some value in seconds with the “must-revalidate” option, so that you get the maximum performance and save money on data transfers from blob storage.
I hope this article gives you useful insight into the CacheControl property with respect to Azure Blob storage.
Important – Please share your feedback, changes and comments on the article to improve it.
Cheers…
Happy Caching!!