How and when are the pages pushed down to the sparse files of a Database Snapshot?
Somebody recently asked me the following: “I’ve been told that when a hashed page from the BPool changes it is not immediately written to the data file on disk. Only the operations that caused that modification to the page are guaranteed to be written to the TLog before the transaction commits. The data page will only be written down to disk because of a checkpoint. But what happens if the database has one or more DB Snapshots associated with it? Is the main database’s affected page synchronously written to disk? Are the copies of that page that had to be replicated to the sparse files of the different snapshots written synchronously to disk as part of the main database’s affected page modification?”
Well, the way it works is that when the contents of the main database’s targeted page is about to be changed (say by an UPDATE stmt), it checks whether that database has any associated DB Snapshots to which this page hasn’t been copied over yet. If that is the case, a copy of the page must made it to the sparse file of every snapshot requiring the page, BEFORE the statement that modified main database’s page can continue. If a given page needs to be replicated to more than one snapshot, the storage engine tries to parallelize them. Certain conditions may prevent the different writes to occur in parallel and in that case, all write should complete one after another.
If any of these COW operations fail, the DB Snapshot it was targeting is marked SUSPECT and not maintained any longer.
When the error occurs, an informational message is sent over to the client which issued the statement encountering the error. Something like this:
The operating system returned error 2(The system cannot find the file specified.) to SQL Server during a write at offset 0x00000000010000 in file 'd:\SecondaryDataFile.ss'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Plus, the following sequence of messages are dumped down to ERRORLOG:
2010-05-09 19:35:20.580 spid52 The operating system returned error 2(The system cannot find the file specified.) to SQL Server during a write at offset 0x00000000010000 in file 'd:\SecondaryDataFile.ss'. Additional messages in the SQL Server error log and system event log may provide mo
2010-05-09 19:40:12.560 spid52 Error: 17053, Severity: 16, State: 1.
2010-05-09 19:40:12.560 spid52 d:\SecondaryDataFile.ss: Operating system error 2(The system cannot find the file specified.) encountered.
2010-05-09 22:07:31.300 spid52 Error: 3420, Severity: 21, State: 1.
2010-05-09 22:07:31.300 spid52 Database snapshot 'TestDB_Snapshot' has failed an IO operation and is marked suspect. It must be dropped and recreated.
The parts of the message marked in green may vary depending on the specific error reported by the OS which caused WriteFile to fail.
Thanks,
Nacho