

Virtualization Strategy: Backup and Restore choices

 

Discussion
High-Level Overview
Vendor-agnostic



This topic builds on the assumptions made in the following articles. You should understand the terms and concepts discussed in those articles before proceeding further.

  • [[articles:Choice of Virtualization Strategy: General|Choice of Virtualization Strategy: General]];
  • [[articles:Backup and Restore: Classification of Scenarios|Backup and Restore: Classification of Scenarios]];
  • [[articles:Backup and Restore: Special Considerations|Backup and Restore: Special Considerations]].


Guest-Level Backup

The most obvious option is to back up virtual machines (VMs) as if they were physical servers. This means installing backup software (or agents) in the Guest Operating Systems (OSes) and maintaining a separate backup job (or even several backup jobs) per VM. The trade-offs of such an approach are discussed in detail in [[articles:Choice of Virtualization Strategy: General|Choice of Virtualization Strategy: General]].

Restoring a VM that was completely lost involves manually creating a new VM with blank virtual hard disk(s). You then follow the full disaster recovery procedure, including booting the VM into a restore environment and performing a “Bare-Metal Recovery (BMR)”. For details on the different recovery cases of virtualization-unaware backup and restore, please see [[articles:Backup and Restore: Classification of Scenarios|Backup and Restore: Classification of Scenarios]].


Host-Level Backup

Host-level backup does not necessarily mean backing up the host as a whole. It can be as granular as your backup and restore application allows. In the common case, however, a host-level backup means backing up a single virtual machine at a time. This is discussed in more detail in Host-level Backup and Restore Scenarios below.


Host-Level Backup Modes

A virtualization-aware host-level backup and restore application usually provides both of the following backup modes.


Offline Host-Level Backup

This means backing up a VM that is turned off. This is a rather trivial task: the VM files are backed up just like any ordinary files from the file system of the Host OS or management partition.

Pros

  • does not require any support from the backup and restore application other than for the file system used by the hypervisor;
  • does not require any special support from the hypervisor;
  • does not require any special support from the Guest OS and applications;
  • can provide near bullet-proof reliability from a data consistency and integrity perspective;
  • the restored VM is always in a “Clean Shutdown” state and can be brought into production immediately.

Cons

  • requires some mechanism to gracefully shut down the VM for a scheduled backup and then turn it back on;
  • requires significant downtime for the entire backup window (unless hardware snapshots or shadow copies are available and supported).
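The offline cycle described above can be sketched as follows. This is a minimal illustration, not a product implementation: it assumes a hypothetical `Hypervisor` interface with `shutdown`, `start` and `vm_files` methods (real products expose their own management APIs) and assumes `shutdown` returns only once the VM is fully off. The `finally` clause restarts the VM even if copying fails, so downtime stays limited to the backup window.

```python
import shutil
from pathlib import Path
from typing import List, Protocol


class Hypervisor(Protocol):
    """Hypothetical management interface; not a real vendor API."""

    def shutdown(self, vm: str) -> None: ...  # assumed to return once the VM is off

    def start(self, vm: str) -> None: ...

    def vm_files(self, vm: str) -> List[Path]: ...  # configuration + virtual hard disks


def offline_backup(hv: Hypervisor, vm: str, target: Path) -> List[Path]:
    """Copy the files of a turned-off VM as ordinary files, then restart it."""
    hv.shutdown(vm)  # graceful shutdown: the copy will be in a "Clean Shutdown" state
    try:
        target.mkdir(parents=True, exist_ok=True)
        copied = []
        for src in hv.vm_files(vm):
            dst = target / src.name
            shutil.copy2(src, dst)  # plain file copy: no hypervisor or guest support needed
            copied.append(dst)
        return copied
    finally:
        hv.start(vm)  # downtime ends here, whether or not the copy succeeded
```

Scheduling (when `offline_backup` runs) is deliberately left out; it is the "mechanism to gracefully turn off the VM on schedule" listed under Cons.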


Online Host-Level Backup

This means backing up a running VM. It is a much more complex procedure that can be performed using two different approaches. Please note that, depending on the specific hypervisor and/or VM features (e.g. an attached “Pass-Through” virtual hard disk), this type of backup may not be possible.

1. Pausing the VM.

Some hypervisors can pause a VM, which briefly stops its execution transparently to the Guest OS and applications. Different hypervisors may use different terms (such as “Pause” or “Saved State”) for this mode. A paused VM does not use its files, so they can be backed up easily, just like in the previous scenario.

Pros

  • requires very limited support from the hypervisor (just the ability to pause the VM);
  • does not require any special support from the Guest OS and applications;
  • does not require the VM to be turned off for backup.

Cons

  • pausing is, by design, as transparent to the Guest OS and applications as possible. Whatever runs inside the VM is unaware of being paused or resumed, so no special pre-backup or post-restore actions can be performed. This means that this backup mode is never application-aware and cannot ensure data integrity and consistency;
  • most applications were designed assuming they would run on physical servers, and in the physical world there is no counterpart to pausing. (Hibernation or Sleep states look somewhat similar, but they are not quite the same because they are performed by the OS itself rather than by the hardware.) So even if an application were explicitly told it is being paused or resumed, it could do nothing useful with this knowledge. This can lead to serious problems, especially for applications that rely heavily on correct time data. For most such applications, pausing (whether or not it is done for backup) is strictly prohibited and unsupported by both application and hypervisor vendors;
  • the restored VM appears in a paused or saved state which, depending on the hypervisor implementation, may have to be discarded before bringing the VM into production. This results in a dirty shutdown of the Guest OS and applications and can cause further data loss.

Due to the limitations mentioned above, this approach should be treated as a last resort and used only when none of the other options discussed in this article are available.

2. Using Snapshots or Shadow Copies.

This is the most advanced scenario.

Pros

  • can ensure data consistency and integrity (see the special requirements below);
  • requires no downtime.

Cons

  • requires special application support (see below);
  • the restored VM appears in a paused or saved state which, depending on the hypervisor implementation, may have to be discarded before bringing the VM into production. This results in a dirty shutdown of the Guest OS and applications. However, the in-guest snapshot or shadow copy support used in this scenario guarantees that no data is lost.

To take advantage of this approach, you need special support from the following parties:

  • Guest applications. For Windows software, that means providing a [[VSS Writer]];
  • Guest OS. This includes making the Guest OS virtualization-aware by installing specialized services or tools from the hypervisor vendor. For Windows guests, these services or tools should include a [[VSS Requestor]], and the Windows version itself should support the [[Volume Shadow Service (VSS)|Volume Shadow Copy Service (VSS)]];
  • Hypervisor;
  • Backup application. For [[Hyper-V]], that means the backup application should implement a [[VSS Requestor]].
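Because every one of these parties has to cooperate, it can help to treat the list as a checklist. The helper below is hypothetical (its names and the dictionary format are assumptions for illustration, not part of any product): it reports which parties still lack support, and an empty result means an application-consistent online backup is at least possible in principle.

```python
from typing import Dict, List

# The four parties listed above, in stack order (top to bottom).
REQUIRED_PARTIES = ("guest applications", "guest OS", "hypervisor", "backup application")


def missing_support(support: Dict[str, bool]) -> List[str]:
    """Return the parties that do not (or are not known to) support snapshot-based backup."""
    # A party absent from the dictionary is treated as unsupported.
    return [party for party in REQUIRED_PARTIES if not support.get(party, False)]


# Example: a Windows guest whose line-of-business application ships no VSS Writer.
gaps = missing_support({
    "guest applications": False,
    "guest OS": True,
    "hypervisor": True,
    "backup application": True,
})
print(gaps)  # ['guest applications']
```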

In more detail, the backup data passes through three stages on its way down the stack.

  1. From the running Guest application to the Guest OS. At this point you need to take special measures to ensure data consistency and integrity for running applications and to trigger application-specific pre-backup or post-restore actions, as discussed in [[articles:Backup and Restore: Special Considerations|Backup and Restore: Special Considerations]]. Namely:
    • for Windows-based VSS-capable Guest OSes, this is done by calling application-specific [[VSS Writer|VSS Writers]]. Note that no snapshot is actually taken at this step by in-guest mechanisms; the VSS infrastructure is used solely for the above-mentioned purposes;
    • for non-Windows Guest OSes (or Windows versions that are not VSS-capable), you need to use some other method; otherwise data integrity cannot be guaranteed.
  2. From the hypervisor to the snapshot. Since the VM is running at the time of backup, its files are in use by the hypervisor. So at this step the data has to be captured using some snapshot mechanism (possibly different from the one used at the previous step). From the Guest OS perspective, this always looks like a hardware snapshot. This means that the in-guest VSS Provider is not involved in actually taking and transferring this snapshot, although, as described in the previous step, the in-guest VSS Writers are called before this snapshot is made (or immediately after a restore operation).
    • In the case of Hyper-V, the snapshot of VM files is created using the [[Volume Shadow Service (VSS)|Volume Shadow Copy Service (VSS)]] infrastructure of the Parent Partition, employing the [[Hyper-V VSS Writer]]. Note that this is a separate VSS operation that is not directly related to what took place in the Guest OS at the previous step (though the two operations are stages of the same chain).
    • In the case of a third-party hypervisor, the snapshot is created using its internal mechanism.
    • Both Hyper-V and third-party hypervisors may use their inbox snapshot mechanisms (like the [[VSS Software Provider]]) or orchestrate hardware snapshot capabilities provided by storage vendors. In the case of Hyper-V, that requires installing a vendor-provided software component known as a [[VSS Hardware Provider]] into the parent partition.
  3. From snapshot to backup application.
    • In the case of Hyper-V, the backup application (providing a [[VSS Requestor]]) runs in the Parent Partition, which is the same place where the snapshot is created by the [[VSS Provider]] (either hardware or software). So the backup application can access the snapshot as soon as it is created, and no further special actions are needed at this step.
    • In the case of a third-party hypervisor, the backup application may run in the host OS, the management partition or another dedicated environment (either a physical server or a specialized VM). In the latter scenario, the snapshot of VM files created by the hypervisor (or hardware snapshot mechanism) needs to be mounted (or otherwise transferred) to that backup and restore environment.

During a restore procedure, data passes through the same stack in the reverse direction.
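The ordering of the three backup stages can be sketched as below. This is a simplified model, not the VSS API: `GuestWriter`, `SnapshotProvider` and the `read_snapshot` callback are hypothetical stand-ins for the in-guest writers, the hypervisor or storage snapshot mechanism, and the backup application. In this model the writers are released as soon as the snapshot exists, so applications are held only while it is being created, and the backup application reads the snapshot afterwards.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class GuestWriter:
    """Stage 1 stand-in: an application hook inside the guest (hypothetical)."""
    name: str
    log: List[str]

    def freeze(self) -> None:
        self.log.append(f"freeze {self.name}")  # e.g. flush buffers, hold new writes

    def thaw(self) -> None:
        self.log.append(f"thaw {self.name}")  # resume normal operation


@dataclass
class SnapshotProvider:
    """Stage 2 stand-in: hypervisor or storage snapshot mechanism (hypothetical)."""
    log: List[str]

    def create(self, vm: str) -> str:
        self.log.append(f"snapshot {vm}")
        return f"{vm}@snapshot"  # identifier of the read-only point-in-time copy

    def delete(self, snap: str) -> None:
        self.log.append(f"delete {snap}")


def online_backup(vm: str, writers: List[GuestWriter], provider: SnapshotProvider,
                  read_snapshot: Callable[[str], T]) -> T:
    frozen: List[GuestWriter] = []
    try:
        for writer in writers:  # stage 1: quiesce applications inside the guest
            writer.freeze()
            frozen.append(writer)
        snap = provider.create(vm)  # stage 2: snapshot VM files while writers are frozen
    finally:
        for writer in reversed(frozen):  # always release writers, even on failure
            writer.thaw()
    try:
        return read_snapshot(snap)  # stage 3: backup application reads the snapshot
    finally:
        provider.delete(snap)  # snapshot is no longer needed once data is copied
```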


Host-level Backup and Restore Scenarios

Depending on your backup and restore application, one or several of the following scenarios may be possible. A good backup and restore application can combine several of these restore scenarios using only one backed-up copy of the data.


Virtual Machine Configuration

VM configuration is usually stored in the host OS (or management partition) separately from the virtual machine data (which is usually located in virtual hard disk files). The Guest OS has no knowledge of the VM configuration, so backing it up is one of the key advantages of host-level backup and restore over the guest-level approach. You can think of two different scenarios in which you can take advantage of this type of backup and restore.


Separate Virtual Hard Disk(s)

If your virtual machine has several virtual hard disks attached, you can choose to back up and restore some of them independently of the others. This can be helpful in the following scenarios:

  • one or several of the virtual hard disks are protected using a guest-level backup and restore application (e.g. when host-level backup and restore is not an option due to limitations such as “Pass-through” disks);

  • backing up and restoring only the data volume, if backup media capacity is tightly limited and reinstalling the Guest OS does not seem like a big problem to you.


Whole Virtual Machine

This combines the two previous options (preferably within a single operation rather than separate ones) and may be helpful in the following scenarios:

  • Disaster Recovery. If the whole VM is lost (possibly even together with its virtualization host), you don't need to re-create the VM manually (as opposed to guest-level backup and restore). You can recover the VM itself with exactly the same properties and configuration settings it had at the time of backup;

  • creating an exact copy of a VM when required for testing or development. This scenario is sometimes referred to as “Alternate Location Recovery (ALR)”.
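One way to see how a single backed-up copy can serve all of the scenarios above is to model the backup set explicitly. In the sketch below, `HostLevelBackup` and `select_for_restore` are hypothetical names and disk contents are reduced to byte strings for brevity: restoring the whole VM, only some disks, or only the configuration are simply different selections from the same backup.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class HostLevelBackup:
    """One host-level backup of a VM (hypothetical model)."""
    vm_name: str
    config: Dict[str, str]   # VM settings stored on the host; invisible to the Guest OS
    disks: Dict[str, bytes]  # virtual hard disk name -> contents


def select_for_restore(backup: HostLevelBackup,
                       disks: Optional[Iterable[str]] = None,
                       include_config: bool = True) -> dict:
    """disks=None selects every disk; disks=[] with include_config=True is configuration only."""
    names = list(backup.disks) if disks is None else list(disks)
    unknown = [n for n in names if n not in backup.disks]
    if unknown:
        raise KeyError(f"not in backup: {unknown}")
    return {
        "config": dict(backup.config) if include_config else None,
        "disks": {n: backup.disks[n] for n in names},
    }


backup = HostLevelBackup(
    vm_name="app01",
    config={"memory_mb": "4096", "cpus": "2"},
    disks={"system.vhdx": b"os", "data.vhdx": b"db"},
)
whole_vm = select_for_restore(backup)  # disaster recovery or ALR copy
data_only = select_for_restore(backup, disks=["data.vhdx"], include_config=False)  # one disk
```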


Individual Files from inside Virtual Machine

This type of backup is usually performed in the following steps.

  1. The virtual hard disk file (or its snapshot, in the case of online backup) is mounted in the backup and restore environment where the backup and restore application resides. Depending on the specific software, this could be the management partition of the virtualization host, a dedicated physical server or even another virtual machine.
  2. The backup and restore application gains access to the data inside the virtual hard disk and can back up whatever it needs to. At this point the backup and restore application (or the Operating System where it is installed) needs to support the file system used inside the virtual machine.

Snapshots are read-only by definition, so they cannot be used to “inject” restored data into a running VM. Therefore only the following restore scenarios may be supported for individual files:

  • Offline restore, when data is copied to a virtual hard disk belonging to a virtual machine that is turned off.

  • Alternate Location Recovery (ALR), when data is copied to a non-original destination where the backup and restore application has write access (either directly or through its agents).
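The rule above can be expressed as a small decision function. It is a hypothetical helper for illustration only: it assumes the backup application knows whether the target is the original VM and whether that VM is running, and it returns `None` for the one case read-only snapshots cannot serve, namely writing files back into the original VM while it is running.

```python
from enum import Enum
from typing import Optional


class Target(Enum):
    ORIGINAL_VM = "original VM"                # the VM the files were backed up from
    ALTERNATE_LOCATION = "alternate location"  # any other writable destination


def file_restore_mode(target: Target, vm_running: bool) -> Optional[str]:
    """Return the supported restore scenario for individual files, or None."""
    if target is Target.ALTERNATE_LOCATION:
        return "Alternate Location Recovery (ALR)"
    if not vm_running:
        return "offline restore"  # write into the virtual hard disk of a turned-off VM
    # Original VM is running: data cannot be injected through a read-only snapshot.
    return None
```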


Creating Snapshot or Checkpoint

Most hypervisors provide features like “Snapshot” or “Checkpoint”. Some people think these can be used as a kind of backup. Please note that this is not the same as using snapshots or shadow copies for a proper online host-level backup.

Most vendors do not promote “Snapshot” or “Checkpoint” features as a backup replacement or alternative. There are several reasons for this.

  • In most hypervisors, snapshots are tightly bound to the VM itself, so they cannot be separately transferred, copied or restored. This makes treating snapshots as backups pointless: if you still have your VM, you have no need to restore it, and if the VM itself is lost, you cannot use its snapshots.
  • Snapshots are, by design, as transparent to the Guest OS and applications as possible. Whatever runs inside the VM is unaware of a snapshot being taken or reverted, so no special pre-backup or post-restore actions can be performed. This means that snapshots are never application-aware and cannot ensure data integrity and consistency.
  • Most applications were designed assuming they would run on physical servers, and in the physical world there is no counterpart to snapshots. So even if an application were explicitly told about a snapshot being taken or reverted, it could do nothing useful with this knowledge. This can lead to serious problems, especially for applications that maintain separate copies of their data, since these copies become out of sync after reverting to a snapshot. For most such applications (e.g. [[Active Directory Domain Services (AD DS)]]), using snapshots (whether or not they are treated as a backup replacement) is strictly prohibited and unsupported by both application and hypervisor vendors.

Due to the limitations mentioned above, you should never use “Snapshot” or “Checkpoint” features instead of regular backups performed using the other supported mechanisms discussed in this article. That said, “Snapshot” or “Checkpoint” is a great feature for development and test environments, especially with server roles that do not require application awareness.
