Udostępnij za pośrednictwem


Backing up a corrupted SQL Server database

I had a question about how to do a backup and skip a corrupted block of data. First, DO NOT DO IT unless you absolutely have to, such as when you are taking a backup prior to trying to fix the corruption (which means you should be on the phone with Microsoft PSS). If you do skip corrupted data you have to consider the backup to be very suspect.

 

Do not ever ignore any indication of data inconsistency in the database. If you have corrupted data it is almost certainly a problem caused by something below the SQL Server level. If it happened once, chances are it will happen again... and again.... and again until the source of the problem is fixed. This means the instant you have any indication of a corrupt SQL Server database you should immediately ask for low-level hardware diagnostics and a thorough review of all logs (event viewer, SQL, hardware, etc.). Double check that if write caching is enabled on the hardware that it is battery backed and the battery is healthy. Double check that all firmware is up to date. Run a DBCC CHECKDB WITH ALL_ERRORMSGS and pay very close attention to the output. Find the source of your corruption and fix it.

 

There is a parameter CONTINUE_AFTER_ERROR for BACKUP and RESTORE, but it is a last ditch command that should only be used as a last resort. One example would be if it's the only way to get a backup before you attempt to repair the corruption. It does not always work, it depends on what the error is. If you actually have to restore a database backup taken with this option, then you MUST fix the corruption before allowing users, applications, or other production processes back into the database. From BOL:

"We strongly recommend that you reserve using the CONTINUE_AFTER_ERROR option until you have exhausted all alternatives."

"At the end of a restore sequence that continues despite errors, you may be able to repair the database with DBCC CHECKDB. For CHECKDB to run most consistently after using RESTORE CONTINUE_AFTER_ERROR, we recommend that you use the WITH TABLOCK option in your DBCC CHECKDB command."

"Use NO_TRUNCATE or CONTINUE_AFTER_ERROR only if you are backing up the tail of a damaged database."

 

Some suggestions:

· For every 2005/2008 database, SET PAGE_VERIFY=CHECKSUM (in 2005 this cannot be turned on for TempDB, but it can be turned on for TempDB in 2008). For SQL Server 2000 set TORN_PAGE_DETECTION=ON. When upgrading from 2000 to newer versions set TORN_PAGE_DETECTION=OFF and SET PAGE_VERIFY=CHECKSUM.

· For databases with CHECKSUM enabled, use the WITH CHECKSUM command on all backups.

· Implement a "standards" or "best practices" document to handle corruption on each version of SQL Server.

· Review your disaster recovery plans and upcoming testing. Testing of a full recovery of various scenarios should be done periodically. Some people think once a year is enough, others say monthly or quarterly is often enough. Having backups is not good enough, we have to know that they can be restored. There are also scenarios where backups are not the best way to recover from a problem.

 

Some great info from the person who wrote CHECKDB:

https://sqlskills.com/blogs/paul/post/Example-20002005-corrupt-databases-and-some-more-info-on-backup-restore-page-checksums-and-IO-errors.aspx

https://sqlskills.com/BLOGS/PAUL/category/Corruption.aspx