Exchange 2007 - Continuous Replication and Exchange Backups

In Microsoft Exchange Server 2007, continuous replication, also known as log shipping, is the process of automating the replication of closed transaction log files from a production storage group (called the "active" storage group) to a copy of that storage group (called the "passive" storage group) that is located on a second set of disks (Local Continuous Replication, or LCR) or on another server altogether (Cluster Continuous Replication, or CCR). Once copied to the second location, the log files are then replayed into the passive copy of the database, thereby keeping the storage groups in sync with a slight time lag.

In simple terms, log shipping follows these steps at the storage group level, with each storage group containing a maximum of one database:

  • Seed the passive copy database directory with a current copy of the active database.
  • When there is a new log file in the active copy log directory, copy it to the passive copy log directory.
  • Replay the log file from the passive copy log directory into the passive database.
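The three steps above can be sketched as a small in-memory simulation. This is only an illustration, not how the Replication service is implemented: the `StorageGroupCopy` class and the `commit`, `seed`, `ship_logs`, and `replay_logs` functions are hypothetical names, and a "log file" is modeled as a numbered list of records.

```python
from dataclasses import dataclass, field


@dataclass
class StorageGroupCopy:
    """Toy stand-in for one copy of a storage group (hypothetical model)."""
    database: list = field(default_factory=list)  # records applied so far
    logs: dict = field(default_factory=dict)      # generation -> closed log records
    applied_generation: int = 0                   # highest generation in the database


def commit(active, records):
    # Active side only: close a new log generation and apply it to the database.
    generation = active.applied_generation + 1
    active.logs[generation] = list(records)
    active.database.extend(records)
    active.applied_generation = generation
    return generation


def seed(active, passive):
    # Step 1: give the passive a current copy of the active database.
    passive.database = list(active.database)
    passive.applied_generation = active.applied_generation


def ship_logs(active, passive):
    # Step 2: copy closed logs that the passive has neither replayed nor received.
    copied = []
    for generation in sorted(active.logs):
        if generation > passive.applied_generation and generation not in passive.logs:
            passive.logs[generation] = list(active.logs[generation])
            copied.append(generation)
    return copied


def replay_logs(passive):
    # Step 3: replay copied logs in strict generation order.
    for generation in sorted(passive.logs):
        if generation == passive.applied_generation + 1:
            passive.database.extend(passive.logs[generation])
            passive.applied_generation = generation


active, passive = StorageGroupCopy(), StorageGroupCopy()
commit(active, ["mail 1"])
seed(active, passive)
commit(active, ["mail 2"])
commit(active, ["mail 3"])
ship_logs(active, passive)
replay_logs(passive)
print(passive.database == active.database)  # → True
```

Between `ship_logs` and `replay_logs` the passive is behind the active, which is the slight time lag mentioned earlier.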

One of the benefits of using continuous replication is the ability to offload Volume Shadow Copy Service (VSS)-based backups from the active storage groups to the passive storage groups. Exchange-aware VSS backups are supported for both the active and passive storage groups and databases. The passive copy backup solution we provide is VSS-only, and it is implemented by the Exchange Replica VSS Writer that is part of the Replication service. Streaming backups are supported only from the active storage groups; you cannot use the streaming backup APIs to back up the database on the passive side. You also need a third-party backup application that supports Exchange VSS, because NT Backup is not Exchange VSS-aware.

When you're making a VSS backup from the passive copy, what happens to the transaction logs? A common task during Exchange-aware backups is the truncation of transaction log files after the backup has completed successfully. The replication feature in Exchange 2007 guarantees that logs that have not been replicated are not deleted.

The challenge with taking a backup on the passive is that backups modify the header of the database. For example, they add information about the time of the last backup of the database. The VSS backup is made possible by the Exchange Replica VSS Writer in the Replication service, and the Replication service has a copy of the data, but it can only get its data and modifications from the store. It can't independently modify its copy of the database; that would produce divergence. Therefore, it can't modify the header of its database copy.

The solution is to have the Replication service coordinate its backups with the store. As soon as you start a backup on the passive, the Replication service contacts the store on the active and tells it that a backup is about to start. This is done to prevent the same storage group on both the active and the passive from being backed up simultaneously. Once the backup is finished, the Replication service contacts the store and lets it know that the backup completed.

The database header modifications resulting from the backup are then made by the store on the active. This action generates a log record, which through continuous replication is copied to the passive. When it is replayed, the database header on the passive is then updated.

This is a little more complex than traditional backups, and it has some interesting side effects. For example, if you back up the passive and then look at the database on the passive immediately after the backup has finished, it will not reflect the backup. The database on the active node will, however, reflect it.

So if you are backing up databases in a continuous replication environment, looking at the database on the active is the most accurate way to determine when the last backup was taken.

Another side effect is that, if the store is not running, you can't back up the passive. The store must be running so that backups can be coordinated and so that the database header can be updated.
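To make the coordination and its side effects concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not the actual interface between the Replication service and the store: the `ActiveStore` and `PassiveCopy` classes, their methods, and the `"set_last_backup"` record are all hypothetical, and the VSS snapshot itself is left out.

```python
class ActiveStore:
    """Toy model of the store on the active (all names are hypothetical)."""

    def __init__(self):
        self.running = True
        self.backup_in_progress = False
        self.header = {"last_backup": None}
        self.log_stream = []  # log records waiting to be shipped to the passive

    def begin_backup(self):
        if not self.running:
            raise RuntimeError("store is not running; backup cannot be coordinated")
        if self.backup_in_progress:
            raise RuntimeError("this storage group is already being backed up")
        self.backup_in_progress = True

    def end_backup(self, timestamp):
        # The store, not the Replication service, modifies the header ...
        self.header["last_backup"] = timestamp
        # ... and logs the change so that it reaches the passive by replication.
        self.log_stream.append(("set_last_backup", timestamp))
        self.backup_in_progress = False


class PassiveCopy:
    """Toy model of the passive copy and the Replication service around it."""

    def __init__(self, store):
        self.store = store
        self.header = {"last_backup": None}

    def backup(self, timestamp):
        self.store.begin_backup()         # tell the store a backup is starting
        # (the VSS snapshot of the passive copy would be taken here)
        self.store.end_backup(timestamp)  # tell the store the backup completed

    def replay(self):
        # Header changes arrive only as replayed log records.
        while self.store.log_stream:
            kind, value = self.store.log_stream.pop(0)
            if kind == "set_last_backup":
                self.header["last_backup"] = value


store = ActiveStore()
passive = PassiveCopy(store)
passive.backup("2007-06-01 02:00")
print(store.header["last_backup"])    # → 2007-06-01 02:00
print(passive.header["last_backup"])  # → None
passive.replay()
print(passive.header["last_backup"])  # → 2007-06-01 02:00
```

In this model, setting `store.running = False` makes `passive.backup(...)` raise an error, which mirrors the requirement that the store be running before the passive can be backed up.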

With log files being copied around and required by the Replication service, getting rid of them becomes a little more complicated. Right now the conventional way to get rid of log files is to run a backup. The backup runs, and on successful completion it deletes the logs you don't need any more.

The challenge now is that the definition of "need" is different, because it takes into account the state of replication. If a log file has not yet been copied to and replayed on the passive, then you still need it (even though the store might not need it). So now a log file should not be deleted until (1) it isn't needed for crash recovery, (2) it has been replayed on the passive, and (3) it has been backed up.

To coordinate all of this, whenever the Replication service finishes a replay, it contacts the store and says that it has replayed storage group X up to generation number Y. At that point, the store knows that log files up to that generation number are no longer needed by the Replication service. It can then analyze the state of the last backup and the state of crash recovery and work out which log files are no longer needed on the active.
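The decision the store makes on the active can be sketched as taking the lowest of three watermarks. This is a simplification for illustration only: the function name `deletable_logs` and its parameters are hypothetical, and each parameter is assumed to be the highest log generation that the corresponding consumer (crash recovery, replay on the passive, backup) no longer needs.

```python
def deletable_logs(generations, recovery_done, replayed, backed_up):
    """Return the log generations that satisfy all three conditions.

    recovery_done: highest generation no longer needed for crash recovery
    replayed:      highest generation replayed on the passive
    backed_up:     highest generation covered by the last backup
    """
    # A log can go only when every consumer is finished with it,
    # so the slowest consumer sets the limit.
    safe_up_to = min(recovery_done, replayed, backed_up)
    return [g for g in sorted(generations) if g <= safe_up_to]


# Logs 1-10 exist; replay on the passive has only reached generation 5.
print(deletable_logs(range(1, 11), recovery_done=8, replayed=5, backed_up=7))
# → [1, 2, 3, 4, 5]
```

If replay on the passive stalls, `replayed` stops advancing and no further logs become deletable on the active, even when backups keep completing.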

Fortunately, on the passive, things are a lot simpler. The passive can analyze its own log files and determine which ones are needed for recovery, and which ones are needed for backup.