แชร์ผ่าน


When doing a database restore, what actions is the engine taking after it gets past the 100%?

This was a question my admired Greg Low posted on the SQL Server MCMs discussion group a few days back, while he was restoring a 2TB database and noticed that the percent_complete column from sys.dm_exec_requests for the session from which the RESTORE statement was running, reached 100% and thirty minutes later, the restore hadn’t finished executing and was still indicating it was 100% completed.

To investigate what was occurring and be able to explain it to others, I  created a very simple repro to diagnose what was not included in what the restore task estimates to be the 100% of work it has to complete and how the progress gets reported. The test scenario consisted of the following steps:

1) From one session in SSMS (or any other SQL client tool for that matter), I execute the following script and leave the transaction open for now…

create database Greg

go

use Greg

go

create table t1 (c1 char(8000))

go

set nocount on

go

insert into t1 values (REPLICATE('X', 8000))

go 1000

go

begin tran

insert into t1 values (REPLICATE('X', 8000))

go 1000

delete t1

go

set nocount off

go

--use master

--go

--rollback

 

2) From a second session, I execute the BACKUP DATABASE statement like this:

 

backup database Greg to disk = 'c:\bak\Greg.bak' with format

 

3) Once the backup completes, I rollback the transaction which I left open in the first session and then restore the database on top of itself, from the same session from which I dumped the database:

use master

go

restore database Greg from disk = 'c:\bak\Greg.bak' with replace

 

Now, having trace flags 3605 & 3004 globally enabled before I execute the RESTORE statement, this is what I see in the ERRORLOG during the execution of the command. Following information, I copied from my ERROLOG and annotated (and highlighted) those points which are critical to understand it:

2014-06-03 22:03:34.270 spid51 RestoreDatabase: Database Greg

2014-06-03 22:03:34.270 spid51 FGState(1): Normal

2014-06-03 22:03:34.270 spid51 Opening backup set

2014-06-03 22:03:34.280 spid51 Input archive of type: D. Restore Syntax: RESTORE DATABASE

2014-06-03 22:03:34.280 spid51 Effective restore command: RestoreFull

2014-06-03 22:03:34.280 spid51 SetTargetRestoreAge: 0

2014-06-03 22:03:34.290 spid51 Restore: Configuration section loaded

2014-06-03 22:03:34.290 spid51 Restore: Backup set is open

2014-06-03 22:03:34.290 spid51 Restore: Planning begins

2014-06-03 22:03:34.290 spid51 RestoreContainer(1): New state is Restoring. With HDR is Restoring

2014-06-03 22:03:34.290 spid51 RestoreContainer(2): New state is Restoring. With HDR is Restoring

2014-06-03 22:03:34.310 spid51 RestoreVolumeList: Avail: 11908MB. Req: 0MB. Name=C:\

2014-06-03 22:03:34.310 spid51 Halting FullText crawls on database Greg

2014-06-03 22:03:34.310 spid51 Dismounting FullText catalogs

2014-06-03 22:03:34.310 spid51 X-locking database: Greg

2014-06-03 22:03:34.310 spid51 Got X-lock

2014-06-03 22:03:34.310 spid51 Restore: Planning complete

2014-06-03 22:03:34.310 spid51 Restore: BeginRestore (offline) on Greg

2014-06-03 22:03:34.590 spid51 Restore: PreparingContainers

2014-06-03 22:03:34.590 spid51 RestoreContainer: Deleting SAFL: FID=1

2014-06-03 22:03:34.590 spid51 RestoreContainer: Deleting SAFL: FID=2

2014-06-03 22:03:34.590 spid51 RestoreContainer(1): New state is Restoring. With HDR is Restoring

2014-06-03 22:03:34.590 spid51 RestoreContainer(2): New state is Restoring. With HDR is Restoring

2014-06-03 22:03:34.600 spid51 BeginRestoreOnFCB FID=1

2014-06-03 22:03:34.610 spid51 BeginRestoreOnFCB FID=2

2014-06-03 22:03:34.620 spid51 Restore: Containers are ready

2014-06-03 22:03:34.620 spid51 Zeroing C:\Program Files (x86)\Microsoft SQL Server\MSSQL11.DENALIX86\MSSQL\DATA\Greg_log.ldf from page 1 to 6048 (0x2000 to 0x2f40000)

2014-06-03 22:03:34.660 spid51 Restore: Restoring backup set

2014-06-03 22:03:34.660 spid51 Restore: Transferring data to Greg <-- Just at this point is where the amount of work to complete is estimated. So, up to this point, any query on sys.dm_exec_requests for session_id = 51, would show percent_complete as 0 (zero percent.) From this point, the copy data machinery reads data from the pipeline and processes that data. That data initially comes from the MTF_MSDA descriptor block which encapsulates the data file stream in the backupset.)

2014-06-03 22:03:34.660 spid51 FileHandleCache: Initial CacheSize: 12

2014-06-03 22:03:34.880 spid51 Zeroing completed on C:\Program Files (x86)\Microsoft SQL Server\MSSQL11.DENALIX86\MSSQL\DATA\Greg_log.ldf

2014-06-03 22:03:34.880 spid51 Restore: Waiting for log zero on Greg

2014-06-03 22:03:34.890 spid51 Restore: LogZero complete <-- In my example, at this point 42.40605% is completed. Having consumed the data stream and zero initialized the log file, it processes the data read from the MTF_MSTL descriptor block which encapsulates the transaction log stream in the backupset.

2014-06-03 22:03:35.290 spid51 SetTargetRestoreAge: 0 <-- At this point, it is already 100% completed. So it took 1,020ms to get to a hundred percent completion from, roughly, the beginning of the RESTORE statement execution.

2014-06-03 22:03:35.300 spid51 FileHandleCache: 0 files opened. CacheSize: 12

2014-06-03 22:03:35.300 spid51 Restore: Data transfer complete on Greg

2014-06-03 22:03:35.320 spid51 Restore: Backup set restored

2014-06-03 22:03:35.340 spid51 Starting up database 'Greg'.

2014-06-03 22:03:35.370 spid51 The database 'Greg' is marked RESTORING and is in a state that does not allow recovery to be run.

2014-06-03 22:03:35.390 spid51 Restore-Redo begins on database Greg

2014-06-03 22:03:35.390 spid51 SelectedForMediaRedo: FID=1 LN=Greg

2014-06-03 22:03:35.390 spid51 SelectedForMediaRedo: FID=2 LN=Greg_log

2014-06-03 22:03:35.590 spid51 RunOfflineRedo logIter.Init(): FirstLsn(PruId: 0): 0x44:0x232:0x1

2014-06-03 22:03:35.590 spid51 RunOfflineRedo logIter.Init(): LastLsn(PruId: 0): 0x8c:0x285:0x1

2014-06-03 22:03:35.740 spid51 OfflineRollforward: StopLsn/LastLsn(PruId: 0): 0x8c:0x285:0x1

2014-06-03 22:03:35.750 spid51 Rollforward complete on database Greg

2014-06-03 22:03:35.770 spid51 Restore: Done with fixups

2014-06-03 22:03:35.780 spid51 Restore: Transitioning database to ONLINE

2014-06-03 22:03:35.780 spid51 RestoreContainer(1): New state is Normal. With HDR is ReadyToRollforward

2014-06-03 22:03:35.780 spid51 RestoreContainer(2): New state is Normal. With HDR is ReadyToRollforward

2014-06-03 22:03:35.790 spid51 Restore: Restarting database for ONLINE

2014-06-03 22:03:35.880 spid51 Starting up database 'Greg'.

2014-06-03 22:03:36.280 spid51 FixupLogTail(progress) zeroing C:\Program Files (x86)\Microsoft SQL Server\MSSQL11.DENALIX86\MSSQL\DATA\Greg_log.ldf from 0x1990a00 to 0x1992000.

2014-06-03 22:03:36.280 spid51 Zeroing C:\Program Files (x86)\Microsoft SQL Server\MSSQL11.DENALIX86\MSSQL\DATA\Greg_log.ldf from page 3273 to 3304 (0x1992000 to 0x19d0000)

2014-06-03 22:03:36.280 spid51 Zeroing completed on C:\Program Files (x86)\Microsoft SQL Server\MSSQL11.DENALIX86\MSSQL\DATA\Greg_log.ldf

2014-06-03 22:03:36.630 spid51 Recovery is writing a checkpoint in database 'Greg' (19). This is an informational message only. No user action is required.

2014-06-03 22:03:36.790 spid51 PostRestoreContainerFixups: fixups complete

2014-06-03 22:03:36.800 spid51 PostRestoreReplicationFixup for Greg starts

2014-06-03 22:03:38.570 spid51 PostRestoreReplicationFixup for Greg complete

2014-06-03 22:03:38.570 spid51 Restore: Database is restarted

2014-06-03 22:03:38.570 Backup Restore is complete on database 'Greg'. The database is now available.

2014-06-03 22:03:38.580 spid51 Resuming any halted fulltext crawls

2014-06-03 22:03:38.590 spid51 Restore: Writing history records

2014-06-03 22:03:38.590 Backup Database was restored: Database: Greg, creation date(time): 2014/05/30(15:24:35), first LSN: 68:562:1, last LSN: 140:645:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'c:\bak\Greg.bak'}). Informational message. No user action required.

2014-06-03 22:03:38.590 spid51 Writing backup history records

2014-06-03 22:03:38.930 spid51 Restore: Done with MSDB maintenance

2014-06-03 22:03:38.930 spid51 RestoreDatabase: Finished <-- At this point, 3,640ms have elapsed since the completed percent was reporting 100%. And it only took 1,020ms to get from 0% to 100%. That is, in this super simple example over 75% of the total time elapsed, the progress tracking was reporting it had completed already.

 

Mileage would vary on every case, but now you can find for yourself from the output reported to ERRORLOG by RESTORE statements, when those two trace flags are enabled, those two milestones where 1) the total amount of work to complete is estimated (Restore: Transferring data to <Your_Database>), and 2) the point at which the percentage of completed work has already reached 100% (SetTargetRestoreAge: #)

With that information, you can typically figure it out by yourself, where exactly is it that the remaining time to bring the database fully restored is being spent.

You may happen to be as curious as David Barbarin is, and consider the previous response partially complete unless you are explained what activities are included in the PostRestoreContainerFixups and  PostRestoreReplicationFixup operations… :-)

If that’s the case, I can tell you that PostRestoreContainerFixups resolves the post-restore configuration of sysaltfiles/sysdbfiles, while PostRestoreReplicationFixup invokes the sp (dbo.sp_restoredbreplication) which fixes up replication status as appropriate post-restore, and the sp (dbo.sp_cdc_restoredb) which cleanses up the CDC related metadata.

 

You may also wonder which, from all the phases logged there, do not occur when the database is restored using the WITH NO_RECOVERY clause. And the following screenshot tries to address just that case.

image

Hope it’s clear and helps you understand that, at least in the case of RESTOREs, the value expressed through the percent_complete column is just based on a rough estimation from a very specific phase of the restore process which is the copy data phase, during which data file and transaction log streams are read from the backupset and processed (ie, laid down over the storage specified.)

Comments

  • Anonymous
    July 01, 2014
    I have meet a few of problems like this, a table have a billion rows without patition, one index on this table casue so many dirty pages in buffer pool by inserting, whatever I do backup log or restore database or the other operation like 'ALTER DATABASE', it take a long time.