共用方式為


Tracking database recovery progress using information from DMV

Technorati Tags: Engine,SQL Server 2008

You must be very familiar with the database recovery related messages printed to the SQL Server Error log. These come in very handy when troubleshooting issues that are related to long recovery. These messages provide information about the stage of the recovery process and approximate time for completion.

2010-12-29 12:02:10.43 spid25s Starting up database 'testdb'.
2010-12-29 12:02:31.23 spid25s Recovery of database 'testdb' (11) is 0% complete (approximately 1725 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2010-12-29 12:03:08.94 spid25s Recovery of database 'testdb' (11) is 1% complete (approximately 1887 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2010-12-29 12:03:08.95 spid25s Recovery of database 'testdb' (11) is 1% complete (approximately 1887 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s Recovery of database 'testdb' (11) is 43% complete (approximately 192 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s 580056 transactions rolled forward in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s Recovery of database 'testdb' (11) is 43% complete (approximately 192 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2010-12-29 12:06:32.05 spid25s Recovery of database 'testdb' (11) is 99% complete (approximately 2 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2010-12-29 12:06:35.09 spid25s 1 transactions rolled back in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:06:35.09 spid25s Recovery is writing a checkpoint in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:06:35.44 spid25s Recovery completed for database testdb (database ID 11) in 244 second(s) (analysis 37849 ms, redo 109038 ms, undo 97146 ms.) This is an informational message only. No user action is required.

Starting with SQL Server 2008, you do not need to repeatedly open/refresh the SQL Server error log or execute the stored procedure sp_readerrorlog to get up to date information about the progress of the database recovery. Most of the information is readily available in the dynamic management views [DMV]. The 2 DMV’s that offer insights into the progress of database recovery are: sys.dm_exec_requests and sys.dm_tran_database_transactions. The information presented in these DMV’s vary depending upon the situation: recovery of databases during server startup, recovery of database after a attach operation, recovery of database after a restore operation.

Here is a view of sys.dm_exec_requests showing recovery related information: [I repeatedly queried this DMV and loaded it into a temporary table for later analysis]

RollBack status information from requests dmv

The key information here is the command type [DB STARTUP] and the session_id that indicates this is a system task performing the startup recovery. The percent_complete shows the same value that the error log messages indicate about the progress within the stage of recovery. You can use the information from other columns [wait information] to understand if the recovery is taking much longer due to an I/O issue or other problems.

Since the redo and undo portions can actually happen on different tasks in fast recovery scenarios in Enterprise Edition, you could actually see 2 different session_id’s for the same database recovery process.

In the case of attaching and restoring the database, this information will be reflected under the same session_id as the user session which executes those commands.

The DMV sys.dm_tran_database_transactions exposes information that can be useful to track the progress of the undo stage of the recovery. Here is a snapshot that corresponds to the above mentioned progress information from sys.dm_exec_requests.

RollBack status information from transactions dmv

The key information here is how the database_transaction_log_bytes_reserved keeps coming down as undo progresses. Also in the case where there are several transactions to undo, you will be able to see their progress using the database_transaction_next_undo_lsn.

Next time you encounter a really long recovery problem, try using some of these information points to understand if the recovery is progressing very slow or there is a lot of work to be done or if it completely stuck.

Suresh B. Kandoth

SQL Server Escalation Services

Other references:

https://blogs.msdn.com/b/psssql/archive/2008/01/23/how-it-works-what-is-restore-backup-doing.aspx

https://blogs.msdn.com/b/psssql/archive/2009/05/21/how-a-log-file-structure-can-affect-database-recovery-time.aspx

Here are some sample scripts I used to gather and report out the information I showed earlier:

-- create the tables to store the information you collect about recovery

USE <db_name>

GO

DROP TABLE [dbo].[tbl_recovery_tracking]

GO

DROP TABLE [dbo].[tbl_dm_tran_database_transactions]

GO

CREATE TABLE [dbo].[tbl_recovery_tracking](

      [runtime] [datetime] NOT NULL,

      [command] [nvarchar](256) NOT NULL,

      [session_id] [smallint] NOT NULL,

      [database_id] [smallint] NOT NULL,

      [total_elapsed_time] [int] NOT NULL,

      [percent_complete] [real] NOT NULL,

      [estimated_completion_time] [bigint] NOT NULL,

      [wait_resource] [nvarchar](256) NOT NULL,

      [wait_time] [int] NOT NULL,

      [wait_type] [nvarchar](60) NULL,

      [blocking_session_id] [smallint] NULL,

      [reads] [bigint] NOT NULL,

      [writes] [bigint] NOT NULL,

      [cpu_time] [int] NOT NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[tbl_dm_tran_database_transactions](

      [runtime] [datetime] NOT NULL,

      [transaction_id] [bigint] NOT NULL,

      [database_id] [int] NOT NULL,

      [database_transaction_log_record_count] [bigint] NOT NULL,

      [database_transaction_log_bytes_used] [bigint] NOT NULL,

      [database_transaction_log_bytes_reserved] [bigint] NOT NULL,

      [database_transaction_next_undo_lsn] [numeric](25, 0) NULL

) ON [PRIMARY]

GO

 

-- collect the information in a loop

WHILE 1 = 1

BEGIN

      INSERT INTO [dbo].[tbl_recovery_tracking]

      SELECT GETDATE() as runtime, command,

      session_id, database_id, total_elapsed_time,

      percent_complete, estimated_completion_time,

      wait_resource, wait_time, wait_type, blocking_session_id,

      reads, writes, cpu_time

      FROM sys.dm_exec_requests

      WHERE command = 'DB STARTUP' -- may need to change this if troubleshooting recovery as part of attach database or restore

 

      INSERT INTO tbl_dm_tran_database_transactions

      SELECT GETDATE() as runtime,

      transaction_id, database_id,

      database_transaction_log_record_count, database_transaction_log_bytes_used,

      database_transaction_log_bytes_reserved, database_transaction_next_undo_lsn

      FROM sys.dm_tran_database_transactions

     

      WAITFOR DELAY '00:00:01' -- change this capture interval

END

GO

-- after you collect information for some time, you can analyze the information to understand the progress of recovery

SELECT runtime, command,

      session_id, database_id, total_elapsed_time,

      percent_complete, estimated_completion_time,

      wait_resource, wait_time, wait_type, blocking_session_id,

      reads, writes, cpu_time

FROM [dbo].[tbl_recovery_tracking]

WHERE session_id = 25 -- change this

ORDER BY runtime

GO

SELECT

runtime, transaction_id, database_id,

database_transaction_log_record_count,

database_transaction_log_bytes_used, database_transaction_log_bytes_reserved,

database_transaction_next_undo_lsn

FROM tbl_dm_tran_database_transactions

WHERE database_id = 11 and transaction_id = 1452239 -- change this

ORDER BY runtime

GO

Comments

  • Anonymous
    December 29, 2010
    Nice post Suresh, With the use of these DMV's, troubleshooting slow recovery/startup issues will become a whole lot easier. Thanks! VarunD

  • Anonymous
    January 04, 2011
    Hi, Nice article. You can discover a lot more about improving SQL performance via DMVs in this forthcoming book www.manning.com/stirk. It contains more than 100 scripts to identify problems, and offers a wide range of solutions. Chapter 1 can be downloaded for free and includes scripts for: A simple monitor Finding your slowest queries Find your missing indexes Identifying what SQL is running now Quickly find a cached plan Thanks Ian  

  • Anonymous
    January 11, 2011
    hi, could you please explain me lamda expresion properly and use of lamda exprasion.

  • Anonymous
    June 06, 2012
    Suresh: Thanks Much.   Hope your candle never dims.

  • Daniel
  • Anonymous
    February 19, 2016
    Great information. Thanks!Had to shutdown during phase 3 recovery 86% complete and then startup again.Then I too had the situation where the log was not showing me anything (seemed to complete at phase 2 with 99% and checkdb completed, no phase 3 show this time). But i couldn't change the database properties. Then sp_who2 shows DB STARTUP with progressing I/O.Looking at the information you present here i was able to see more. Still not too much more though but probably about as much as i am going to get.
  • Anonymous
    October 12, 2017
    In real terms, I used this DMV but failed miserably.