Backups fail due to consistency check failure…
Last week I had the opportunity to work with a customer who was experiencing issues backing up their Exchange 2010 databases. The issue, however, is also relevant to Exchange 2007 and Exchange 2003 installations that leverage VSS-based backups with consistency checking enabled.
After reviewing the logs it was apparent that the VSS process was functioning appropriately: all relevant events regarding the snapshot process were present. In this case the backup job was configured to run a consistency check, and the relevant consistency check events were noted. In almost all backup jobs the following error was present in the logs:
Log Name: Application
Source: Storage Group Consistency Check
Event ID: 403
Task Category: Termination
Level: Error
Keywords: Classic
Description:
Instance: The physical consistency check successfully validated 0 out of xxxxxxxx pages of database 'DATABASE'. Because some database pages were either not validated or failed validation, the consistency check has been considered unsuccessful.
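These failures are easy to locate with PowerShell. The following is a minimal sketch, assuming Get-WinEvent is available (Windows Server 2008 or later) and that the provider name matches the Source field shown above:

# Pull all consistency check termination events from the Application log.
Get-WinEvent -FilterHashtable @{
    LogName      = 'Application'
    ProviderName = 'Storage Group Consistency Check'
    Id           = 403
} | Select-Object TimeCreated, Message | Format-List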
In general this event indicates that the consistency check encountered an error when scanning the pages of an Exchange database. In most cases this means there is page-level corruption in the database, such that the validation checks performed by the consistency check fail and the backup is terminated. This is by design.
In theory, corruption of this type should not have been present in this environment. The customer was utilizing a Database Availability Group, which includes protections to self-heal databases from this type of corruption. Replication was healthy and there was no indication that any page corrections had been performed.
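Replication health can be confirmed quickly from the Exchange Management Shell. A sketch along these lines (the server name MBX01 is a placeholder):

# Verify that database copies are healthy and queues are not backed up.
Get-MailboxDatabaseCopyStatus -Server MBX01 |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize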
If you look at the event in greater detail, you will see that it provides the number of pages that were successfully scanned before the issue occurred. When reviewing the application logs, it was noted that failures on the same database occurred after scanning a different number of pages each time; for example, one failure occurred after scanning 28,000 pages and another after 42,456 pages. If a specific page were corrupt, the check would be expected to fail at the same point on each run, so the varying counts suggested that something external was interrupting the check.
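To compare failure points across attempts, the page counts can be extracted from the 403 events themselves. A rough sketch, assuming PowerShell 3.0 or later and that the message text follows the format quoted above:

# Extract the validated/total page counts from each 403 event.
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Id = 403 } |
    ForEach-Object {
        if ($_.Message -match 'validated (\d+) out of (\d+) pages') {
            [PSCustomObject]@{
                TimeCreated = $_.TimeCreated
                Validated   = [int64]$Matches[1]
                TotalPages  = [int64]$Matches[2]
            }
        }
    }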
At this point the system log was reviewed, and the following error was noted:
Time: 1/9/2012 12:40:56 PM
ID: 36
Level: Error
Source: volsnap
Machine: server.company.com
Message: The shadow copies of volume F: were aborted because the shadow copy storage could not grow due to a user imposed limit.
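The same query technique works for the System log; a sketch assuming the provider name matches the volsnap Source field above:

# Pull volsnap errors (Level 2 = Error) from the System log.
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'volsnap'
    Level        = 2
} | Select-Object TimeCreated, Id, Message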
This error implies that, while the snapshot existed, the space allotted for storing its differential changes was exhausted and could not be grown. Reviewing the output of vssadmin list shadowstorage showed that the shadow storage space assigned to the volume hosting the database was capped at 321 megabytes.
vssadmin list shadowstorage
Shadow Copy Storage association
For volume: (F:)\\?\Volume{0ecc7a68-be78-4c40-baf6-4d0d3b0b6693}\
Shadow Copy Storage volume: (H:)\\?\Volume{ed074b1d-b500-465b-a720-d2f733f49761}\
Used Shadow Copy Storage space: 0 B (0%)
Allocated Shadow Copy Storage space: 0 B (0%)
Maximum Shadow Copy Storage space: 321 MB (0%)
This is an extremely small shadow copy storage space; by default, the allotted space is generally 10% of the volume size. To correct this issue, we can use the vssadmin command to resize the shadow storage association.
vssadmin Resize ShadowStorage /For=F: /On=H: /MaxSize=20%
Successfully resized the shadow copy storage association
In our case, the inability to continue storing differential changes in the shadow storage space caused the shadow copy to be removed. This subsequently caused the consistency check to fail, resulting in a failure of the backup job. Once the shadow copy storage was allocated an appropriate size, and differential changes could be successfully stored for the entire duration of the backup operation, the backups proceeded successfully.
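To confirm that the new limit is sufficient, the shadow storage consumption can be sampled while the next backup job runs. A rough sketch using vssadmin from an elevated PowerShell session (the interval and sample count are arbitrary):

# Sample shadow copy storage usage for volume F: every 60 seconds.
1..30 | ForEach-Object {
    Get-Date -Format 'HH:mm:ss'
    vssadmin list shadowstorage /For=F: |
        Select-String 'Used Shadow Copy Storage space'
    Start-Sleep -Seconds 60
}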
Comments
Anonymous
January 01, 2003
Tim, firstly thanks a lot for your explanation of this issue. I have reproduced this issue in my environment, but I encounter the volsnap 33 event from the system log as below: "The oldest shadow copy of volume D: was deleted to keep disk space usage for shadow copies of volume D: below the user defined limit." Btw, I am using Exchange 2013 + NetBackup 7.6 to reproduce this issue.
Anonymous
January 01, 2003
But I do not quite understand what you said: "In our case the inability to continue to store differential changes in the shadow storage space caused the shadow copy to be removed. This subsequently caused consistency check to fail, resulting in a failure of the backup job." I am not familiar with the Exchange writers + VSS process during the backup operation. Could you please show me more about the entire transaction flow of the Exchange writers + VSS activities during a backup operation, such as when NetBackup or TSM requests the shadow copy to be created? Anyway, thanks a lot for your help in advance!
Anonymous
January 01, 2003
@Ed Wilkinson
Send me the case number via the blog contact. The type of corruption here should be fixed by the DAG, assuming it is a database page reporting it. If it's a log file, just run the backup without consistency check and then immediately follow with another full backup with consistency check.
Either way, do not ignore the fact that these errors point to hardware-level corruption.
timmcmic
Anonymous
January 01, 2003
@Anonymous: That would imply that there was an existing snapshot on the volume, or perhaps a previously orphaned snapshot that could be removed. In this case there exists only one snapshot (the snapshot that is in progress) and it cannot be expanded due to reaching the user-defined limit. TIMMCMIC
Anonymous
April 08, 2012
Very useful information, thanks Tim.
Anonymous
November 07, 2013
Great work!
Anonymous
November 08, 2013
@Karthik: I'm glad you found it useful. TIMMCMIC
Anonymous
November 08, 2013
@Mani: Thanks. TIMMCMIC
Anonymous
November 21, 2014
Tim, we have a similar issue with 403 event IDs while running full backups with DB checks on the passive copies in Exchange 2010. Our Maximum Shadow Copy Storage space is currently set to UNBOUNDED. We were told by our backup vendor to reseed the passive DBs on the server that is failing. We tried one last night and the issue was resolved on that DB. Is the only fix to reseed seven 1 TB DBs? Any ideas would be helpful. PS: we do have a ticket open with MS Support.