Improving Performance by Batching Azure Table Storage Inserts
June 7, 2012 update: The Microsoft Windows Azure team has released a new Windows Azure SDK for PHP. This release is part of an effort to keep PHP client libraries up to date with new Windows Azure features and to make PHP a first-class citizen in Windows Azure. The latest client libraries are on GitHub: https://github.com/WindowsAzure/azure-sdk-for-php. While the SDK hosted on CodePlex will continue to work for the foreseeable future, it is strongly recommended that new PHP/Windows Azure applications use the SDK hosted on GitHub.
The work done by Maarten Balliauw and other contributors in building the SDK hosted on CodePlex was critical in unifying the PHP developer experience for Windows Azure. The Windows Azure team is grateful to these contributors for their pioneering work and looks forward to their continued support (and yours!) in adding to the new SDK on GitHub.
Thanks,
The Windows Azure Team
This is a short post to share the results of a little investigation inspired by a comment on a post I wrote about using SQL Azure for handling session data. The commenter reported that SQL Azure seemed to be faster than Azure Table Storage for handling session data. My experiments show that SQL Azure and Table Storage have very similar performance for single writes (YMMV), so I can neither verify nor refute the claim. However, it got me wondering which is faster for inserting and retrieving many “rows” of data. I knew that Table Storage is supposed to be faster, but I wondered how much faster. So I wrote a two-part PHP script. The first part does the following:
- Connects to SQL Azure.
- Inserts 100 rows to an existing database.
- Retrieves the 100 rows.
Here’s the code:
$conn = sqlsrv_connect(SQLAZURE_SERVER_ID.".database.windows.net,1433",
                       array("UID" => SQLAZURE_USER."@".SQLAZURE_SERVER_ID,
                             "PWD" => SQLAZURE_PASSWORD,
                             "Database" => SQLAZURE_DB,
                             "ReturnDatesAsStrings" => true));

// Insert 100 rows, one parameterized INSERT per row.
for ($i = 0; $i < 100; $i++) {
    $id = $i;
    $data = "GolferMessage".$i;
    $params = array($id, $data);
    $stmt1 = sqlsrv_query($conn, "INSERT INTO Table_1 (id, data) VALUES (?,?)", $params);
    if ($stmt1 === false) {
        die(print_r(sqlsrv_errors(), true));
    }
}

// Retrieve all 100 rows.
$stmt2 = sqlsrv_query($conn, "SELECT id, data, timestamp FROM Table_1");
while ($row = sqlsrv_fetch_array($stmt2, SQLSRV_FETCH_ASSOC)) {
    // Fetch each row; no processing is needed for the timing test.
}
Note: The code above uses the SQL Server Driver for PHP to connect to SQL Azure.
The second part of the script does the equivalent for Table Storage:
- Connects to Azure Storage.
- Inserts 100 entities to an existing table.
- Retrieves the 100 entities.
Here’s the code:
// Start a batch so all 100 inserts are sent together when the batch is committed.
$batch = $tableStorageClient->startBatch();
for ($i = 0; $i < 100; $i++) {
    $name = $i;
    $message = "GolferMessage".$i;
    $mbEntry = new MessageBoardEntry();
    $mbEntry->golferName = $name;
    $mbEntry->golferMessage = $message;
    $tableStorageClient->insertEntity('MessageBoardEntry', $mbEntry);
}
$batch->commit();

// Retrieve all 100 entities.
$messages = $tableStorageClient->retrieveEntities("MessageBoardEntry", null, "MessageBoardEntry");
foreach ($messages as $message) {
    // Iterate over each entity; no processing is needed for the timing test.
}
Note: The code above uses the Windows Azure SDK for PHP to connect to Azure Storage.
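The MessageBoardEntry class isn't shown above (the real definition is in the attached script), but here's a minimal sketch of what it might look like, assuming the CodePlex SDK's convention of extending Microsoft_WindowsAzure_Storage_TableEntity and marking persisted properties with @azure docblock annotations. The AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY constants are placeholders for your storage credentials:

require_once 'Microsoft/WindowsAzure/Storage/Table.php';

// Sketch only -- the actual class definition is in the attached script.
class MessageBoardEntry extends Microsoft_WindowsAzure_Storage_TableEntity
{
    /**
     * @azure golferName
     */
    public $golferName;

    /**
     * @azure golferMessage
     */
    public $golferMessage;
}

// Placeholder constants for the storage account name and access key.
$tableStorageClient = new Microsoft_WindowsAzure_Storage_Table(
    'table.core.windows.net', AZURE_ACCOUNT_NAME, AZURE_ACCOUNT_KEY);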
The result of the test was that Table Storage was consistently 4 to 5 times faster than SQL Azure (again, YMMV). The key, however, was to use the $tableStorageClient->startBatch() and $batch->commit() methods with Table Storage. Without batching, the client issues a separate HTTP request for each write, which results in slower performance than SQL Azure (which keeps a single connection open for all the writes). With batching, all 100 inserts are sent together in a single batch request.
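If you want to reproduce the comparison, here's a minimal sketch of how the two runs can be timed. The $insertViaSqlAzure and $insertViaTableStorage closures are hypothetical stand-ins for the two snippets above, not part of the attached script:

// Hypothetical closures wrapping the two snippets above.
$insertViaSqlAzure = function () { /* SQL Azure snippet goes here */ };
$insertViaTableStorage = function () { /* Table Storage snippet goes here */ };

$timeIt = function ($fn) {
    $start = microtime(true);          // wall-clock time in seconds, as a float
    $fn();
    return microtime(true) - $start;   // elapsed seconds
};

printf("SQL Azure:     %.3f seconds\n", $timeIt($insertViaSqlAzure));
printf("Table Storage: %.3f seconds\n", $timeIt($insertViaTableStorage));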
Note: Many thanks to Maarten Balliauw. When I was perplexed by my initial results (I expected Table Storage to be faster, but because I didn't know about batches, it wasn't), he suggested I try batching.
The complete script (with setup/teardown of the database and table) is attached in case you want to try it for yourself.
Thanks.
-Brian
Comments
Anonymous
January 26, 2012
Ok, so now to do a fairer comparison, let's try to apply some batching to SQL Azure as well :)) You could start by wrapping those 100 INSERT operations in a BEGIN / COMMIT TRAN block, or by using the bulk copy API, for example.
Anonymous
January 27, 2012
@Silvano - That's a great point. At first glance, wrapping the 100 inserts in a single transaction provides only a very modest improvement. And SQL Azure doesn't support BULK INSERT. But maybe there are other ways to make this comparison fairer... I'll look into it. Thanks! -Brian
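For reference, here's a minimal sketch of Silvano's single-transaction suggestion, using the SQL Server Driver for PHP's sqlsrv_begin_transaction, sqlsrv_commit, and sqlsrv_rollback functions (it assumes the $conn connection from the first snippet):

// Begin one transaction for all 100 inserts.
if (sqlsrv_begin_transaction($conn) === false) {
    die(print_r(sqlsrv_errors(), true));
}

for ($i = 0; $i < 100; $i++) {
    $params = array($i, "GolferMessage".$i);
    $stmt = sqlsrv_query($conn, "INSERT INTO Table_1 (id, data) VALUES (?,?)", $params);
    if ($stmt === false) {
        sqlsrv_rollback($conn);   // undo the partial batch on error
        die(print_r(sqlsrv_errors(), true));
    }
}

sqlsrv_commit($conn);   // one commit for all 100 inserts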