SQL Server Database Backup in Azure IaaS: performance tests
In this blog post, I want to show you the results of a series of performance tests in which I tweaked several configuration options and typical SQL Server native backup parameters, then measured the impact on backup completion time and throughput.
One key scenario tested is the usage of a new feature introduced with SQL Server 2012 SP1 Cumulative Update 2 and available in later versions (including SQL Server 2014):
SQL Server 2012 SP1 CU2
http://support.microsoft.com/kb/2792921
With this update, Microsoft introduced the interesting possibility of executing SQL Server native backups directly to Azure Blob Storage. The feature can be used both on-premises and in the cloud on Windows Azure Virtual Machines, but it comes with some important limitations that must be considered:
- Backup to multiple blobs in a single backup operation is not supported; only one blob can be used;
- Due to the maximum blob size in Azure and the limitation above, the maximum backup size supported is 1 TB;
- In this implementation, you can issue backup or restore statements by using TSQL, SMO or PowerShell;
- Appending to existing backup blobs is not supported. Backups to an existing Blob can only be overwritten by using the WITH FORMAT option;
- Specifying a block size with BACKUP is not supported;
- Specifying MAXTRANSFERSIZE is not supported;
It is worth mentioning that the SQL Server Database Engine itself is not aware of Azure Blob Storage: an external EXE component called “BackupToUrl.exe”, located in “C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn”, is invoked instead, as you can see in the ERRORLOG extract below:
BackupToUrl process initiated with PID: 2680, for database name [TestBackupDB]
Database backed up. Database: TestBackupDB, creation date(time): 2013/05/31(15:58:39), pages dumped: 3951905, first LSN: 317:260174:37, last LSN: 317:260191:1, number of dump devices: 1, device information: (FILE=1, TYPE=URL: {'http://igorstorage5.blob.core.windows.net/backup/TestBackupDB-3-5.bak'}).
You can easily guess that this EXE has been compiled with the necessary APIs to deal with Azure Blobs. For more information on this new feature, see the link below:
Tutorial: Getting Started with SQL Server Backup and Restore to Windows Azure Blob Storage Service
http://msdn.microsoft.com/en-us/library/jj720558.aspx
NOTE: This functionality is not available in the SQL Server Management Studio UI; the TSQL BACKUP statement must be used explicitly, or the PowerShell module for SQL Server.
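As a quick illustration, this is a minimal TSQL sketch of a backup to URL; the credential name, storage account, container and key below are placeholders of mine, not values taken from the tests. A SQL Server credential holding the storage account name and access key is created first and then referenced by the BACKUP statement; WITH FORMAT is only needed to overwrite an existing blob, as noted in the limitations above:

-- One-time setup: a credential storing the storage account name and access key (placeholder values)
CREATE CREDENTIAL myAzureCredential
WITH IDENTITY = 'mystorageaccount',
     SECRET = '<storage account access key>';

-- Backup straight to an Azure blob; FORMAT overwrites the blob if it already exists
BACKUP DATABASE TestBackupDB
TO URL = 'https://mystorageaccount.blob.core.windows.net/backup/TestBackupDB.bak'
WITH CREDENTIAL = 'myAzureCredential',
     FORMAT,
     STATS = 5;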
When evaluating backup strategies, whether direct backup to Azure Blob storage (using URL) or backup to additional VM-attached data disks, it’s important to remember that the I/O takes different paths:
- Direct backup to Azure Blob storage (using URL): in this case, the traffic is pure network traffic sent by the Guest VM to Azure Blob storage, so it is counted against the maximum network bandwidth cap imposed on the VM;
- Additional VM-attached data disks: in this case, the traffic still targets Azure Blob storage, but the Guest NIC is not involved; the Virtual Disk Driver is used instead (see picture below), and this traffic does not count toward the VM network quota;
This is the list of parameters and configuration items I checked in my performance tests:
- Backup media type
  - Azure VHD data disk (DISK)
  - Azure Blob (URL)
- Backup media location
  - Same datacenter, remote datacenter in the same region, remote datacenter in a remote region
- Number of Azure backup media targets (VHD or Blob)
  - 1, 2, 4, 8
- Azure write cache for backup media targets
  - Enabled or Disabled
- BUFFERCOUNT parameter values
  - 13 (default), 25, 50, 100
- MAXTRANSFERSIZE parameter values
  - 1MB (default), 64KB, 2MB, 4MB
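To give an idea of how the last two parameters are passed, this is the shape of a TSQL BACKUP statement against a data disk; the path and values are illustrative only, not the exact scripts used in the tests (note that MAXTRANSFERSIZE is expressed in bytes):

BACKUP DATABASE TestBackupDB
TO DISK = 'F:\Backup\TestBackupDB.bak'
WITH INIT,
     BUFFERCOUNT = 100,          -- number of I/O buffers used by the backup
     MAXTRANSFERSIZE = 4194304,  -- 4MB per transfer, expressed in bytes
     STATS = 5;                  -- progress messages every 5 percent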
NOTE: Because backup compression performance depends heavily on the specific database content, compression has not been considered in these tests; that said, SQL Server Backup Compression is highly recommended to reduce backup time.
For more information on the BUFFERCOUNT and MAXTRANSFERSIZE parameters and on-premises optimization, see the following two articles from the SQLCAT team:
Tuning the Performance of Backup Compression in SQL Server 2008
Tuning Backup Compression Part 2
http://sqlcat.com/sqlcat/b/technicalnotes/archive/2009/02/16/tuning-backup-compression-part-2.aspx
For more information on how to optimally configure SQL Server in Azure IaaS, see the white-paper below by the Azure CAT Team:
Performance Guidance for SQL Server in Windows Azure Virtual Machines
Test Configuration & Notes:
- Azure VM size “XL”;
- Operating System: Windows Server 2008 R2 SP1 (x64);
- SQL Server 2012 Enterprise Edition (x64) with Service Pack 1 and Cumulative Update 2;
- As a sample, I used a real production database with the following characteristics and layout:
  - 4 data disks (10GB each) with cache disabled;
  - 4 data files, PRIMARY filegroup, one for each data disk;
  - Default 4KB NTFS allocation unit;
  - 30GB database data size;
  - SIMPLE Recovery Model;
  - No OLTP activity on the database during backup;
  - No data or backup compression;
- For each test, a minimum of 25 runs has been executed;
- Between tests, a 60-minute idle interval has been observed, followed by a 30-minute ramp-up period to warm up the target backup disks;
- Analysis of the results has been conducted at the 95th percentile;
- Each test type has been repeated during normal business hours, in the evening and during the weekend to ensure stability of the results;
- In SQL Server 2012, given the database layout used for these tests, the default value for BUFFERCOUNT is 13 and for MAXTRANSFERSIZE is 1MB; to get more details on backup internals, it’s possible to use the following trace flags and obtain output in the SQL Server ERRORLOG similar to the one below:
Trace Flag 3004: outputs what operations Backup and Restore are performing;
Trace Flag 3014: outputs additional information about Backup and File operations;
Trace Flag 3213: outputs the Backup Buffer configuration information;
Trace Flag 3605: required to force the output into the error log;
2013-06-01 07:35:39.04 spid51 Backup: DBReaderCount = 4
2013-06-01 07:35:38.53 spid51 Memory limit: 895MB
2013-06-01 07:35:38.53 spid51 BufferCount: 13
2013-06-01 07:35:38.53 spid51 Sets Of Buffers: 1
2013-06-01 07:35:38.53 spid51 MaxTransferSize: 1024 KB
2013-06-01 07:35:38.53 spid51 Min MaxTransferSize: 64 KB
2013-06-01 07:35:38.53 spid51 Total buffer space: 13 MB
2013-06-01 07:35:38.53 spid51 Tabular data device count: 4
2013-06-01 07:35:38.53 spid51 Fulltext data device count: 0
2013-06-01 07:35:38.53 spid51 Filestream device count: 0
2013-06-01 07:35:38.53 spid51 TXF device count: 0
2013-06-01 07:35:38.53 spid51 Filesystem i/o alignment: 512
2013-06-01 07:35:38.53 spid51 Media Buffer count: 13
2013-06-01 07:35:38.53 spid51 Media Buffer size: 1024KB
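For completeness, this is a sketch of how the trace flags above can be enabled before running a backup (the -1 argument turns them on globally; drop it to scope them to the current session):

-- Enable the backup diagnostic trace flags and send their output to the ERRORLOG
DBCC TRACEON (3004, 3014, 3213, 3605, -1);

-- Run the backup; the buffer configuration details shown above will appear in the ERRORLOG
BACKUP DATABASE TestBackupDB TO DISK = 'F:\Backup\TestBackupDB.bak' WITH INIT;

-- Turn the trace flags off once done
DBCC TRACEOFF (3004, 3014, 3213, 3605, -1);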
Test results:
| Media Type | # Backup Disks | Write Cache | BUFFERCOUNT | MAXTRANSFERSIZE (KB) | Backup Storage Datacenter Location | Backup Throughput (MB/sec) |
|---|---|---|---|---|---|---|
| DISK | 1 | OFF | 13 | 1024 | LOCAL | 90 - 95 |
| DISK | 1 | ON | 13 | 1024 | LOCAL | 127 - 138 |
| DISK | 1 | OFF | 25 | 1024 | LOCAL | 100 - 110 |
| DISK | 1 | OFF | 50 | 1024 | LOCAL | 100 - 121 |
| DISK | 1 | OFF | 100 | 1024 | LOCAL | 102 - 132 |
| DISK | 1 | OFF | 13 | 64 | LOCAL | 18 - 26 |
| DISK | 1 | OFF | 13 | 2048 | LOCAL | 45 - 105 |
| DISK | 1 | OFF | 13 | 4096 | LOCAL | 65 - 128 |
| DISK | 1 | OFF | 100 | 4096 | LOCAL | 135 - 146 |
| DISK | 1 | ON | 100 | 4096 | LOCAL | 97 - 109 |
| URL | 1 | OFF | 13 | 1024 | LOCAL | 110 - 112 |
| URL | 1 | OFF | 13 | 1024 | REMOTE DC | 102 - 109 |
| URL | 1 | OFF | 13 | 1024 | REMOTE RG | 101 - 108 |
| DISK | 2 | OFF | 13 | 1024 | LOCAL | 90 - 128 |
| DISK | 2 | OFF | 100 | 4096 | LOCAL | 143 - 153 |
| DISK | 4 | OFF | 13 | 1024 | LOCAL | 97 - 166 |
| DISK | 4 | OFF | 13 | 4096 | LOCAL | 160 - 167 |
| DISK | 8 | OFF | 13 | 1024 | LOCAL | 114 - 173 |
| DISK | 8 | OFF | 100 | 4096 | LOCAL | 164 - 170 |
Test results considerations:
- Higher variations in performance have been observed during the weekend, so don’t assume this is the best time window for running backups;
- As is the nature of a true multi-tenant platform, results may vary based on the workload on the Azure infrastructure, so it’s recommended to run your own tests over a wide spectrum of days and hours;
- No effect on backup performance was observed from Virtual Machine memory or SQL Server buffer pool status;
- Changing VM size didn’t impact backup performance results;
- Azure disk warm-up time was observed to be between 10 and 20 minutes;
- Azure disks return to a “cold” state after about 60 minutes of idle time;
- Adopting a “warm-up” strategy for target backup disks may greatly enhance performance;
- Enabling write cache on the target backup disks returned noticeable performance benefits in some cases, but at the price of inconsistent results and very high variation: it’s therefore recommended not to use it, especially if your database size is larger than 10-20GB;
- SQL Server backup throughput is essentially capped by read performance on the source data files; write throughput is much higher (possibly an effect of how Azure storage handles append-only writes);
- The best performance obtained with the best combination of backup parameters (without considering backup compression) is 40-45MB/sec reading from the data files, and 100-110MB/sec writing to the target backup file on different Azure data disks;
- The traditional TSQL BACKUP statement optimization parameters BUFFERCOUNT and MAXTRANSFERSIZE can provide noticeable benefits, up to the maximum throughputs mentioned above;
- Using SQL backup striping over multiple target disks can provide significantly better performance; the recommended ratio is no more than 1 target disk for every 2 source data files (see the striping sketch after this list);
- Since the number of additional disks a VM can mount is limited, using multiple disks for backup purposes must be carefully assessed; if possible, direct backup to an Azure Blob storage URL should be used, provided the performance targets are suitable for the specific scenario and requirements;
- Direct backup against an Azure URL is consistently 15% more efficient than backup against Azure data disks;
- This can be easily explained by the different paths the IO will take when using attached VM disks versus direct URL backup managed by SQL Server itself;
- Network performance for backups to remote datacenters is excellent: backup to a remote datacenter in the same region (West-Europe <-> North-Europe) and to a remote region (North-Europe <-> East-US) generated almost the same results (110MB/sec, 109MB/sec and 108MB/sec);
- NOTE: I was very surprised by these results, which is why I executed these tests multiple times for a total of more than 50 runs;
- Be aware that backing up to a remote datacenter will raise costs due to outgoing traffic from the source Azure datacenter where the VM resides;
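As referenced in the striping consideration above, this is a sketch of a striped backup over two target disks for the 4-data-file sample database (1 target disk for every 2 source data files); drive letters and file names are illustrative only, not the exact scripts used in the tests:

-- Stripe the backup across two files, each placed on a different Azure data disk
BACKUP DATABASE TestBackupDB
TO DISK = 'F:\Backup\TestBackupDB_stripe1.bak',
   DISK = 'G:\Backup\TestBackupDB_stripe2.bak'
WITH INIT,
     BUFFERCOUNT = 100,
     MAXTRANSFERSIZE = 4194304,
     STATS = 5;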