다음을 통해 공유


All about RollbackSnapshotTempDB...

 

I’ve been recently involved in several cases where Databases named RollbackSnapshotTempDB +<someGUID> were generating confusion. The purpose of this post is to clarify their origin and use and to enable SQL Server admins to know what to do if they need to deal with them. The following topics will be discussed (click for a quick jump) :

 

· RollbackSnapshotTempDB : why such a name ?

· RollbackSnapshotTempDB under the hood.

· But I see permanent RollbackSnapshotTempDB on my SQL Server instance ?

· Why do we have leftover RollbackSnapshotTempDB ?

· I have leftover RollbackSnapshotTempDB, What should I do ?

· Should my backup application really request ‘autorecovered snapshots’ ?

 

RollbackSnapshotTempDB : why such a name ?

 

The name itself could be slightly misleading in SQL Server context J. It’s not related to any sort of ongoing server transaction rollback, or snapshot (in SQL Server’s “database snapshot” acceptation), or to tempDB system database.

This database is a temporary construct created by SQL Server during backups initiated by VSS framework. So not your usual ‘backup database’ TSQL statement, but instead a backup initiated at system level by NtBackup, Server Backup, Microsoft DPM, or any 3rd party tool relying on VSS.

Furthermore, not every VSS backup will trigger the creation of such a database, only a very specific (and supposedly not mainstream) option will : it’s the request of “autorecovered snapshots”.

· Here’s the link of the option in the VSS API : https://msdn.microsoft.com/en-us/library/windows/desktop/aa385012(v=vs.85).aspx

· And the SQL Writer implementation’s specific can be found here : https://technet.microsoft.com/en-us/library/cc966520.aspx#ECAA (look for ‘Auto-Recovered Snapshots’ entry)

 

So the name must be understood in the VSS context : snapshot relates here to VSS snapshot. TempDB just means temporary database. Rollback is the VSS feature it enables (‘application rollback’, cf. VSS API link, where a read-only, point-in-time state of the structure being backed up is later on made available). The GUID is a uniquifier which derives from the VSS context in which the temporary DB is created.

 

RollbackSnapshotTempDB under the hood.

 

The ‘Guide for SQL Server Backup Application Vendors’ previously linked actually gives us the high level view on the logic taking place which is specific to autorecovered snapshots :

    • Attach the snapshot database to the original SQL Server instance (i.e., the instance to which the original database is attached).
    • Recover the database (this happens as part of the “attach” operation).
    • Shrink log files.
      • (This implies switching the DB to simple recovery in the first place)
    • Detach the database.

This would translate to the following patterns in SQL Server ERRORLOGs :

10:39:19.23 spid301 I/O is frozen on database ABC. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

10:39:19.64 spid301 I/O was resumed on database ABC. No user action is required.

10:39:23.38 Backup Database backed up. Database: ABC, creation date(time): 2009/09/17(20:41:36), pages dumped: 243107, first LSN: 96553:62855:81, last LSN: 96553:62490:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{7B28E39D-7E44-411F-AF80-373C204B95DF}1'}). This is an informational message only. No user action is required.

10:39:23.66 spid301 Starting up database 'RollbackSnapshotTempDB{AF504647-90AC-4354-8BC0-FB6A42FAC1B7}'.

10:39:24.07 spid301 Setting database option RECOVERY to SIMPLE for database RollbackSnapshotTempDB{AF504647-90AC-4354-8BC0-FB6A42FAC1B7}.

10:39:24.30 spid301 Setting database option SINGLE_USER to ON for database RollbackSnapshotTempDB{AF504647-90AC-4354-8BC0-FB6A42FAC1B7}.

 

 

The important thing to understand is that the RollbackSnapshotTempDB is created by attaching Database files which are located in the ongoing active VSS snapshot of the drive(s) hosting the original database files.

So the RollbackSnapshotTempDB database is really a duplicate of the DB being backed up. It will have the same physical file names (mdf ndf and ldf) but the path to those files is not SQL Server data folder, but the snapshot path.

 

Here’s the attach statement :

create database [RollbackSnapshotTempDB{C9934090-EE6D-47B3-8346-FCBF9DFB2FF3}] on (filename = '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy17\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\ABC.mdf'), (filename = '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy17\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\ABC_log.ldf') for attach with presume_abort

 

Notice the ABC_log.ldf but the path starting with \\?\GLOBALROOT . TheRollbackSnapshotTempDB is therefore completely unrelated to the initial (ABC) database as far as SQL Server is concerned (waay down below in OS layers, the RollbackSnapshotTempDB physical files are ‘snapshots’ version of ABC’s relying on copyonwrite mechanism, but again, for SQL Server they’re two distinct DBs). Note that Windows’ File Explorer won’t let you explore the \\?\globalroot path : failure to access it from explorer doesn’t necessarily mean there’s a problem.

 

But why bother with all this overhead?

-Well the purpose is to obtain a database that can be mounted from read-only media so that we can fulfill VSS requirements for the rather specific ‘application rollback’ feature.

This is not possible if the database must be recovered when attached, as this implies write activity against the DB files. Furthermore a snapshot of a busy DB like we just did in steps described above is most likely to yield a DB not in a ‘cleanly shutdown’ state, and therefore candidate to a recovery upon attach.

So before we ‘deliver’ the DB files to the VSS framework, there is extra logic to let the database undergo its recovery and checkpoint, but the only way this can happen is to let SQL Server attach the Database files from the snapshot we just created ! (yes, that might seem a bit counterintuitive J). Note that this requires that the snapshot we’re working with is enabled for Read+Write activity, the VSS layer automatically takes care of this.

As an optional bonus, the logic also shrinks the snapshot DB’s LDF once the recovery has finished (space savings), thanks to the switch to simple recovery model. You will also see an explicit CHECKPOINT passing by to make really sure there’s nothing left for recovery. The single_user mode is there to prepare for the detach operation, to prevent any rogue SPID from blocking the detach, and the DB is finally detached and the VSS backup workflow will continue from there. As you can see in log extract, the whole attach/Detach operation is rather fast (less than one second, but this will vary based on DB recovery duration), so most users/DBA shouldn’t even realize the RollbackSnapshotTempDB<GUID> DB has appeared and disappeared (and yes, this Database would be treated like any other user DB and would appear in Management Studio DB list for the duration of its lifetime, even though it would require a very accurate or lucky refresh operation to catch it).

 

At the end of the day, the difference with the output of a snaphshot backup without autorecovery is that the autorecovered snapshot has been recovered and therefore is cleanly shutdown. This enables the backup application to attach the final files in readonly mode within SQL Server, giving access to a readonly 'snapshot' of the DB.

 

Note that if your backup requestor selects more than one single database for the VSS backup operation, and the option ‘autorecovered snapshots’ is activated, each temporary database will use the same GUID since the VSS context is the same. This will lead to repeated entries for the exact same DB name, but each iteration will actually handle a different user database, so this following pattern is perfectly normal and expected :

 

spid55 I/O is frozen on database ABC. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

spid56 I/O is frozen on database DEF. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.

spid56 I/O was resumed on database DEF. No user action is required.

spid55 I/O was resumed on database ABC. No user action is required.

Backup Database backed up. Database: ABC, creation date(time): 2012/09/20(13:54:10), pages dumped: 10553, first LSN: 48:6825:1, last LSN: 48:6828:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{67780D33-20D5-48B0-8CA4-73F78345CEA9}1'}). This is an informational message only. No user action is required.

Backup Database backed up. Database: DEF, creation date(time): 2012/10/24(13:54:21), pages dumped: 465, first LSN: 37:59:1, last LSN: 37:62:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{67780D33-20D5-48B0-8CA4-73F78345CEA9}2'}). This is an informational message only. No user action is required.

spid55 Starting up database 'RollbackSnapshotTempDB{80C5D84B-6CB0-4D5C-B34C-94060F798344}'. <= Database ABC

spid55 Setting database option RECOVERY to SIMPLE for database 'RollbackSnapshotTempDB{80C5D84B-6CB0-4D5C-B34C-94060F798344}'

spid55 Setting database option SINGLE_USER to ON for database 'RollbackSnapshotTempDB{80C5D84B-6CB0-4D5C-B34C-94060F798344}'.

spid55 Starting up database 'RollbackSnapshotTempDB{80C5D84B-6CB0-4D5C-B34C-94060F798344}'. <= Database DEF

spid55 Setting database option RECOVERY to SIMPLE for database 'RollbackSnapshotTempDB{80C5D84B-6CB0-4D5C-B34C-94060F798344}'.

spid55 Setting database option SINGLE_USER to ON for database 'RollbackSnapshotTempDB{80C5D84B-6CB0-4D5C-B34C-94060F798344}'.

 

 

But I see permanent RollbackSnapshotTempDB on my SQL Server instance ?

 

In an ideal world, DBA’s should not even see any RollbackSnapshotTempDB database, unless they happen to query sysdatabases at the very second a RollbackSnapshotTempDB is activated, or if they audit database attach/detach events. As described, RollbackSnapshotTempDB are transient construct and their lifetime should not exceed a few seconds.

 

However, of late I’ve been working on more than a few occurrences where leftover RollbackSnapshotTempDBdatabases were present on SQL Server instances, and were surviving service restarts and not going away at all.

 

And well, this is bound to grab attention, even if the DBA doesn’t wonder about that new DB in his DBlist, because the leftover database will very likely generate (benign) errors at various levels :

 

  • Right after the “leftover DB” issue takes place, the snapshot still exists, but may not be read/write anymore. Any attempt from SQL to write to that DB will generate the following:

2013-08-29 13:15:21.66 spid52 Error: 823, Severity: 24, State: 2.

2013-08-29 13:15:21.66 spid52 The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0x000000000da000 in file '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\PUB.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

2013-08-29 13:15:23.47 spid60 Error: 823, Severity: 24, State: 2.

2013-08-29 13:15:23.47 spid60 The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0x000000000da000 in file '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\PUB.mdf'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

  •  This often brings confusion as the user recognizes the physical filename (PUB.mdf') without considering the GLOBALROOT path and may think there is a problem with the original user DB (here PUB).

  • After a SQL Service stops, SQL Server releases all handles against its files, including those within the snapshot, which usually leads to the snapshot’s de-allocation. Therefore upon service restart we see slightly different errors :

2013-08-29 13:18:41.62 spid37s Error: 17204, Severity: 16, State: 1.

2013-08-29 13:18:41.62 spid37s FCB::Open failed: Could not open file \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\PUB.mdf for file number 1. OS error: 3(The system cannot find the path specified.).

2013-08-29 13:18:41.62 spid37s Error: 5120, Severity: 16, State: 101.

2013-08-29 13:18:41.62 spid37s Unable to open the physical file "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\PUB.mdf". Operating system error 3: "3(The system cannot find the path specified.)".

2013-08-29 13:18:41.62 spid37s Error: 17207, Severity: 16, State: 1.

2013-08-29 13:18:41.62 spid37s FileMgr::StartLogFiles: Operating system error 2(The system cannot find the file specified.) occurred while creating or opening file '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\PUB_log.ldf'. Diagnose and correct the operating system error, and retry the operation.

2013-08-29 13:18:41.62 spid37s File activation failure. The physical file name "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy2\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\PUB_log.ldf" may be incorrect.

  • These errors will repeat at each service start and whenever DB iterations tasks are executed, leading to many entries in errorlogs and event logs
  • It may also prevent some GUI operations in Management studio, with errors like
    •  Invalid object name '#unify_temptbl63513453714256.1'. (Microsoft SQL Server, Error: 208)

 

 

Why do we have leftover RollbackSnapshotTempDB ?

 

There isn’t a single scenario leading to this situation. Basically ‘something’ interrupted the logical flow described above and prevented SQL Writer from detaching the RollbackSnapshotTempDB as a final step.

One must understand that the ‘autorecovered snapshot’ logic belongs to the SQL Writer service, which is more or less ‘stateless’ (a given ongoing SQLWriter backup operation has no knowledge of past or simultaneous VSS backups). On the SQL Server side, the RollbackSnapshotTempDB is just a regular DB (with a slightly exotic filepath, granted). So if anything breaks SQL Writer’s logic and generates a leftover RollbackSnapshotTempDB, SQL Writer won’t do any clean-up upon restart (stateless logic), and SQL Server won’t decide to drop that ‘regular’ DB on its own (I’m sure you are all relieved to know that SQL Server doesn’t drop databases randomly J).

I suspect that any severe event (shutdown, crash, power outage) taking place precisely while a RollbackSnapshotTempDB is active will lead to leftover DB. But that should be a rare enough situation.

I’ve also identified one specific scenario where the replication status of a DB interferes with the autorecovered snapshot. Namely, a published Database which is backed up with the autorecovery option will fail the VSS backup and generate a leftover DB in most cases. But read the last topic of this blog to understand why this should not be a common scenario.

 

Also note that server-wide DDL triggers on operations like create DB (for attach) may interfere with the SQLWriter logic.

 

I have leftover RollbackSnapshotTempDB, What should I do ?

 

Based on what we described so far, you now know that a left-over RollbackSnapshotTempDB is a regular DB pointing to an obsolete snapshot path. It is similar to a DB for which you have deleted the physical files while SQL Server was offline : it’s an empty shell.

By all means delete (drop) any and all leftover RollbackSnapshotTempDBs present on your SQL Server instances (skip readers please make sure to review the whole post to know what ‘leftover’ means !). They will only cause trouble (cf. list of errors above), and there’s nothing to learn from their presence for troubleshooting purposes : SQL Server errorlog has recorded the physical file names along with the activation errors, so that you will know which original user DB was associated to the leftover temporary DB based on those physical names, and this is pretty much all the useful information you could extract a posteriori.

 

If the database that was the source of the RollbackSnapshotTempDB was a replication publisher, you’ll need to put the RollbackSnapshotTempDB database offline first (‘alter database xxx set offline’) to avoid the following message :

Msg 3724, Level 16, State 3, Line 1

Cannot drop the database 'RollbackSnapshotTempDB{E122ECA8-9BFC-451A-81A1-2158B44A8570}' because it is being used for replication

 

The cause of the leftover DB may be a one-off, in which case you’re now good to go. But if you realize they come back on a regular basis, it will be useful to record how they happen to become orphaned.

A SQL profiler of the SQL Server instance recording SQLWriter activity against the SQL Server Instance while the ‘orphaning’ backup is taking place usually is the best way to start (VSS traces won’t help a lot here), along with the SQL errorlogs and OS Events logs covering the same period. So that means that a proactive tracing needs to be activated. I won’t elaborate more here, this is ad hoc troubleshooting which may be a valid reason to involve Microsoft Support Services.

 

Should my backup application really request ‘autorecovered snapshots’ ?

 

As you can see, there’s quite a bit of extra logic involved when this option is enabled. Furthermore, it is apparent that the resulting VSS backup content will not be a ‘strict’ copy of the original database(s) : the backup will contain a database which has been switched to simple recovery model, which transaction log has been shrinked, and that will even remember its ‘RollbackSnapshotTempDB’ logical name upon restore. Noticeable differences indeed. But should that come as a surprise ?

 

Let us go back to our most detailed reference document, https://technet.microsoft.com/en-us/library/cc966520.aspx.

 

Extract :

“With SQL Server 2005 and the VSS framework running on Windows 2003 SP1, it is possible to auto-recover the snapshots as part of the snapshot creation process. As part of the Writer Metadata Document, the SQL writer will specify the component flag “VSS_CF_APP_ROLLBACK_RECOVERY” to indicate that recovery needs to be performed for the database on snapshot before the database can be accessed When specifying the snapshot set, the requestor can indicate that the snapshot should be an app-rollback snapshot (i.e., all database files in a snapshot are meant to be in a consistent state for application usage) or a backup snapshot (a snapshot used for backing up data to be restored later in case of a system failure). The requestor should set VSS_VOLSNAP_ATTR_ROLLBACK_RECOVERY to indicate that this component is being backed up for a non-backup purpose.”

It is very clear that app-rollback snapshot (the ones for which ‘autorecovered snapshot’ option has been set) are not meant for backup purposes, but only for application-rollback usage.

Therefore if your backup utility’s purpose is to generate system-wide backups to protect from disaster scenarios, by capturing a complete system image as it was at the time the backup was taken, it is unlikely that the activation of ‘autorecovered snapshot’ makes sense as far as SQL Server is concerned : the backup won’t be able to help recreate SQL Server databases in their exact original state. The logical data they contain will be the same and available in the same way, though.

 

One example of valid use of this option is Microsoft DPM 2012 new architecture, where the use of autorecovered snapshot against SharePoint Content Databases enables a very optimized method of retrieving single document items (“Item level Recovery”) from DPM backups (see https://technet.microsoft.com/en-us/library/hh758215.aspx). Actually, this is the only valid use of autorecovery I’m aware of J.

 

Hope this helps !

 

-- Guillaume Fourrat
-- SQL Server Escalation Engineer