Large SQL Server database backup on an Azure VM and archiving

Applicable VersionsSQL Server 2014 and lower versions.

Background

Azure Virtual Machines (VM) allows the capability to install SQL Server and lift/shift an application or data warehouse to the cloud. The largest disk size Azure currently allows is 1tb (or more accurately 1023GB). Larger VMs such as A9 and D14 allows 16 and 32 of these disks to be attached respectively. For a large SQL Server database on an Azure VM, backing up the data can have some challenges. Some challenges for a backup can be scheduled VM reboot, unplanned reboot or VM migration because of faulty OS which are sometimes unavoidable but not very common occurrences and thus out of scope for this blog. We will focus on two particular challenges for backing up large SQL Server database on an Azure VM – size and speed.

SQL Server allows the option to backup to disk and URL. Currently, URL backup to Azure Storage is limited to 1 file only and also that file cannot exceed more than 1TB. This is a good option for backing up small databases.

Backup to disk would certainly have to be smaller than 1TB per disk unless the disks are configured as a storage pool. In a case where backup will not fit on the disks (16 disks on A9 and 32 disks on D14) we would recommend looking at SQL Shep’s blog. That blog describes how to create DFS file storage in Azure which is not a small undertaking. Thus, planning for backup space has to be one of the factors when moving large SQL Server database in Azure. If the data files fit on 16 disks but the backup will not you may want to think about D14 VMs which allows attaching up to 32x1TB disks.

Another size related challenge comes from the requirement of archiving data to Azure Blob Storage. Currently, each blob cannot exceed 200GB in Azure Storage. Another option of archiving backup is to use Azure File Service. At the time of the writing of this blog, Azure File Service is still under preview. Azure File Service allow you to mount a 5TB drive through SMB protocol as a network drive on an Azure VM. Azure File Service also has the limit where 1 file cannot be greater than 1TB. But multiple Azure Files can be connected to one VM and used to copy/archive backups. These file share(s) can also be mounted on many VMs and can be accessed using a REST APIs.

When backing up SQL Server on-premises we usually have large storage pool or SAN connected servers. Backing up using one backup file on-premises is not uncommon and works fine. In Azure VMs we can also use a storage pool to create a backup with one backup file. But since space on VHDs are limited we need to plan for archiving the data onto Azure Blob Storages. Moving one large backup file to Azure Storage can become impossible because of the size restriction. Azure file Service allow storing of larger files but having one large backup file can prove challenging when copying.

Our recommendation would be to backup to multiple files for even a storage pool rather than one large file. In this case let’s assume there are ten drives mapped, H through Q. The backup command would look something like the following:

BACKUP DATABASE tpch_10tb
TO DISK = 'H:backuptpch_10tb_01.bak',
DISK = 'I:backuptpch_10tb_02.bak',
DISK = 'J:backuptpch_10tb_03.bak',
DISK = 'K:backuptpch_10tb_04.bak',
DISK = 'L:backuptpch_10tb_05.bak',
DISK = 'M:backuptpch_10tb_06.bak',
DISK = 'N:backuptpch_10tb_07.bak',
DISK = 'O:backuptpch_10tb_08.bak',
DISK = 'P:backuptpch_10tb_09.bak',
DISK = 'Q:backuptpch_10tb_10.bak'
WITH COMPRESSION,
STATS = 5;
GO

SQL Server allows compression when creating a backup. In certain customer cases we have seem a 10TB database compressed into 2TB or smaller in backup files. When Transparent Data Encryption (TDE) is used in the database the backup of that database does not compress. As such an important recommendation is to compress the backup to keep the size of the backup smaller unless TDE is used.

The reasons behind having many compressed backup files vs. one large file are the various size limits discussed earlier and the practicality of data movement in the cloud. Let's assume the storage pool or drives are rather full after the backup, we need to move the backup somewhere else to make space for the next backup. In this case Azure Blob Storage is a good candidate to copy the backup and free up the disks. When a storage account is created on azure, the user has option to choose if the disk will be locally redundant (LRS) or geo redundant storage (GRS). We will leave it up to your business decision as to what type of redundancy to choose.

As 200GB is the current limit of a single blob, you have to calculate to keep the files under that limit. If the database has multiple files or filegroups, a quick way to check would be to backup a file or filegroup to see the size and estimate the backup size from there. SQL Server does a good job at keeping the backup files the same size when multiple files are used.

Let’s assume our backup is divided into 10 files that doesn’t exceed 200GB limit for each file. We can now write 10 copy command to parallelize the copy operation to archive the backup to Azure Storage. We will also need to make sure all files are copied correctly or else the backup is useless when it comes time to restore the data for whatever reason.

We have run the following backup scenarios and captured timing:

Database

Size

Database Compression

Timing (hh:mm:ss)

TPCH_10GB

10GB

Uncompressed

00:02:18

TPCH_1TB

1TB

Compressed

02:29:00

TPCH_10TB

10TB

Compressed

21:16:00

In our case, we have observed that when all the tables in a database is compressed it takes a lot less time to backup that database. In all the above tests we used a MAXTRANSFERSIZE of 4MB. Changing BUFFERCOUNT option didn’t make much difference. The backup command looks like the following for 10TB database.

BACKUP DATABASE tpch_10tb
TO DISK = 'H:backuptpch_10tb_01.bak',
DISK = 'I:backuptpch_10tb_02.bak',
DISK = 'J:backuptpch_10tb_03.bak',
DISK = 'K:backuptpch_10tb_04.bak',
DISK = 'L:backuptpch_10tb_05.bak',
DISK = 'M:backuptpch_10tb_06.bak',
DISK = 'N:backuptpch_10tb_07.bak',
DISK = 'O:backuptpch_10tb_08.bak',
DISK = 'P:backuptpch_10tb_09.bak',
DISK = 'Q:backuptpch_10tb_10.bak'
WITH COMPRESSION,
MAXTRANSFERSIZE= 4194304,
STATS = 5;
GO

Archiving

Azcopy does a good job at copying data in Azure. The following Azcopy command will copy the backup files from the VM to a storage of your choice.

This first command will copy all the files one after the other:

AzCopy /Source:X:tpchbackup /Dest:https://tpch1tbbackup.blob.core.windows.net/backup/ /Destkey:<destination storage key> /Pattern:*.bak

This command will copy tpch_1tb_01.bak file:

AzCopy /Source:X:tpchbackup /Dest:https://tpch1tbbackup.blob.core.windows.net/backup/ /Destkey:<destination storage key> /Pattern:*_01.bak

In the latter case, we can execute multiple of these commands to finish copying faster. We would recommend adding a date component to the directory structure to make the files unique so the older backup are not overwritten and also many versions of the backup can reside side by side but also can be uniquely identified. In that case the copy command will look as following:

AzCopy /Source:X:tpchbackup /Dest:https://tpch1tbbackup.blob.core.windows.net/backup/20150101 /Destkey:<destination storage key> /Pattern:*.bak

Azure File Service is another option to copy the backups for archival purposes. Use the following commands to mount a network drive to archive the backup:

1. Create a storage account in Windows Azure PowerShell

New-AzureStorageAccount -StorageAccountName "tpch1tbbackup" -Location "West US"

2. Create a storage context

$storageAccountName = "tpch1tbbackup"

$storageKey = Get-AzureStorageKey $storageAccountName | %{$_.Primary}

$context = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageKey

3. Create a share

New-AzureStorageShare -Name backup -Context $context

4. Attach the share to your Azure VM

net use x: \tpch1tbbackup.file.core.windows.netbackup /u:$storageAccountName $storageKey

5. Xcopy backup to x: to offload the backup on the mounted drive

xcopy D:backup*.* X:tpchbackup*.*

This begs the question why not just use the UNC path of the mounted drive to do the backup. In which the answer is, it's not possible at this time. Notice that the file is mounted using a storage account name and storage key. If you try to backup to the UNC path of the drive directly SQL Server will throw the following error:

Operating system error 1326(The user name or password is incorrect.).

If you are using multiple Azure File shares for archiving backup you should also have some kind of accounting (metadata) that tells you where each backup files of a backup set reside.

Lastly, you should follow the same procedure for backup and restore that you follow for your on-premises database. As an example, if your business procedure is to check the newly created backup using ‘restore verifyonly’, follow that before you move your data for archiving.

Comments

  • Anonymous
    February 27, 2015
    Boy, do all these artificial Azure limitations s*ck. I'm sure the architecture of blob storage doesn't stop working for files larger than 1TB. That limit is artificial. So are pretty much all the limits listed in this blog post. These limits are simply integers in some config file. You're wasting value by restricting valid and useful use cases. Charge us for it but at least enable us to do interesting things (such as backing up more than 1TB in an easy way). These restrictions also destroy the simplicity value proposition of the cloud. We now have to back up to 10 drives, then copy to blob using 10 parallel copy commands. Takes a lot of brittle scripting to do that. Instead I should just be able to back up to a blob URL. This should be simple stuff.