Internals of Database Snapshot – Part 2: Snapshot and the Buffer Pool
In my earlier post, we have seen what Database Snapshots are, and how the Database Snapshots are maintained by the SQL Server Database Engine. In this post we will see how the data and index pages, belonging to the Snapshot, are managed in the Buffer Pool.
Before we start drilling down further, let me take a while to explain how we will pull up details about pages in the Buffer Pool. We have a DMV and a few Catalog Views that we can use:
- sys.dm_os_buffer_descriptors: This DMV returns information about all the data pages that are currently in the SQL Server buffer pool. When a data page is read from disk, the page is copied into the SQL Server buffer pool and cached for reuse. Each cached data page has one buffer descriptor. Buffer descriptors uniquely identify each data page that is currently cached in an instance of SQL Server. sys.dm_os_buffer_descriptors returns cached pages for all user and system databases. This includes pages that are associated with the Resource database.
- sys.allocation_units: Contains a row for each allocation unit in the database. sys.dm_os_buffer_descriptors can be joined with sys.allocation_units based on the allocation_unit_id column values.
- sys.partitions: Contains a row for each partition of all the tables and most types of indexes in the database. Special index types like Fulltext, Spatial, and XML are not included in this view. All tables and indexes in SQL Server 2008 contain at least one partition, whether or not they are explicitly partitioned. The container_id column of sys.allocation_units can be joined with the hobt_id column of sys.partitions when the allocation type in sys.allocation_units is either 1 (In-row data) or 3 (Row-overflow data). When allocation type in sys.allocation_units is 3 (Large object (LOB) data (text, ntext, image, xml, large value types, and CLR user-defined types)), container_id column of sys.allocation_units can be joined with the partition_id column of sys.partitions.
Joining these, we can construct the following query that can give us the Count of Pages in the Buffer Pool on a Per Object and Page Type basis:
/*
This code has to be run under the context of the Database as
in Line 2. For example, if you want to get the details of the
database "MyDB", change Line 2 to DB_ID('MyDB'), and execute
this under the context of MyDB.
*/
DECLARE @DBID INT
SELECT @DBID = DB_ID('Database_Name')
SELECT
M.[object_name] [Object Name]
, M.[page_type] [Page Type]
, COUNT(M.[page_type]) AS [Page_Count]
FROM
(
SELECT
OSBD.[database_id]
, OSBD.[allocation_unit_id]
, OSBD.[page_type]
, AU.[container_id]
, P.[object_id]
, S.[name] + '.' + O.[name] AS [object_name]
FROM sys.dm_os_buffer_descriptors OSBD
JOIN sys.allocation_units AU
ON OSBD.allocation_unit_id = AU.allocation_unit_id
JOIN sys.partitions P
ON AU.[container_id] = P.[hobt_id]
JOIN sys.objects O
ON P.[object_id] = O.[object_id]
JOIN sys.schemas S
ON O.[schema_id] = S.[schema_id]
WHERE OSBD.database_id = @DBID
AND AU.[type] IN (1, 3)
UNION ALL
SELECT
OSBD.[database_id]
, OSBD.[allocation_unit_id]
, OSBD.[page_type]
, AU.[container_id]
, P.[object_id]
, S.[name] + '.' + O.[name]
FROM sys.dm_os_buffer_descriptors OSBD
JOIN sys.allocation_units AU
ON OSBD.allocation_unit_id = AU.allocation_unit_id
JOIN sys.partitions P
ON AU.[container_id] = P.[partition_id]
JOIN sys.objects O
ON P.[object_id] = O.[object_id]
JOIN sys.schemas S
ON O.[schema_id] = S.[schema_id]
WHERE OSBD.database_id = @DBID
AND AU.[type] IN (2)
) AS M
GROUP BY M.[object_name], M.[page_type]
ORDER BY 1
Let us now see what happens when we go ahead and create a Snapshot of a database. Before we create the Snapshot, we will clear off the Buffer Pool using the DBCC DROPCLEANBUFFERS command.
Note: After running the DBCC DROPCLEANBUFFERS command, make sure you execute the query mentioned above to ensure that the Buffer Pool is completely empty of User Objects. Few pages of system tables may continue to exist in the Buffer Pool, and this is fine.
After we create the Database Snapshot, if we execute the query above, on both the Source Database and the Snapshot, we will see that the Buffer Pool contains NO user data or index pages. All pages for these databases will belong to the system tables, which indicates that some of the metadata is transformed to the Snapshot.
Note: In case you have defined Full Text Indexes on your tables, this will not hold good – the query against the Source Database will show some pages belonging to some of the User Tables.
We will now query ONE row from one of the User Tables in the Snapshot. The table I am using is a heap with no indexes defined on it for simplicity, to avoid taking into consideration the Index Pages. You can try querying tables with Indexes (and even use complex queries so that multiple tables and indexes are touched) to observe the behavior.
The same query (with the database name in Line 2 changed), when run against the Snapshot database, shows that the data pages traversed by the query are indeed cached in the Buffer Pool under the context of the Snapshot Database.
Remember: Databases do not share pages in the Buffer Pool.
Now, let us see what happens when we modify a row in the Source Database. We will again use the Table with No Indexes for simplicity, and before we execute the update statement, we will clean up the Buffer Pool using the DBCC DROPCLEANBUFFERS command.
We will do another experiment this time. We will modify a page in the Source Database and read the same page from the Snapshot. Later, we will check the contents in the Buffer Pool for this one object for both databases. For this purpose, we will use a slightly different query, as pasted below:
/*
This code has to be run under the context of the Database as
in Line 2. For example, if you want to get the details of the
database "MyDB", change Line 2 to DB_ID('MyDB'), and execute
this under the context of MyDB.
*/
DECLARE @DBID INT
SELECT @DBID = DB_ID('Database_Name')
SELECT
BP.[Database Name]
, BP.[Object Name]
, BP.[File ID]
, BP.[Page ID]
, BP.[Page Type]
, BP.[Number of Rows]
, BP.[Is Modified]
FROM
(
SELECT
DB_NAME(OSBD.[database_id]) [Database Name]
, S.[name] + '.' + O.[name] [Object Name]
, OSBD.[file_id] [File ID]
, OSBD.[page_id] [Page ID]
, OSBD.[page_type] [Page Type]
, OSBD.[row_count] [Number of Rows]
, OSBD.[is_modified] [Is Modified]
FROM sys.dm_os_buffer_descriptors OSBD
JOIN sys.allocation_units AU
ON OSBD.allocation_unit_id = AU.allocation_unit_id
JOIN sys.partitions P
ON AU.[container_id] = P.[hobt_id]
JOIN sys.objects O
ON P.[object_id] = O.[object_id]
JOIN sys.schemas S
ON O.[schema_id] = S.[schema_id]
WHERE OSBD.database_id = @DBID
AND AU.[type] IN (1, 3)
UNION ALL
SELECT
DB_NAME(OSBD.[database_id]) [Database Name]
, S.[name] + '.' + O.[name] [Object Name]
, OSBD.[file_id] [File ID]
, OSBD.[page_id] [Page ID]
, OSBD.[page_type] [Page Type]
, OSBD.[row_count] [Number of Rows]
, OSBD.[is_modified] [Is Modified]
FROM sys.dm_os_buffer_descriptors OSBD
JOIN sys.allocation_units AU
ON OSBD.allocation_unit_id = AU.allocation_unit_id
JOIN sys.partitions P
ON AU.[container_id] = P.[partition_id]
JOIN sys.objects O
ON P.[object_id] = O.[object_id]
JOIN sys.schemas S
ON O.[schema_id] = S.[schema_id]
WHERE OSBD.database_id = @DBID
AND AU.[type] IN (2)
) AS BP
WHERE BP.[Object Name] = 'SchemaName.TableName'
ORDER BY BP.[File ID], BP.[Page ID]
When we analyse the contents of the Buffer Pool, we will see that the modified page has been marked as Modified in the Source Database, and marked as Clean in the Snapshot (screenshots below). This conclusively proves that pages in the Buffer Pool are not shared and that different states of the same page might be present in the Buffer Pool depending on when the Snapshot was created and when the page was modified.
Note: The page marked as Modified will not be removed from the Buffer Pool until Checkpoint runs on the database and flushes the dirty page to the disk. Once the dirty page is flushed, the page is marked as Clean (the Is_Modified bit is cleared), and then this page can be removed from the Buffer Pool by using the DBCC DROPCLEANBUFFERS command.
Summary: Databases do not share pages (memory) in the Buffer Pool. Each database (Source Database and the Snapshot Database) caches its own pages in the Buffer Pool depending on the data pages being read. Hence, if you are planning to create multiple snapshots of the same database, apart from considering the impact due to increased IO (as described in my earlier post), you will also need to consider the increased Buffer Pool usage, and impact of it on performance.
In the next post, we will see what happens internally when we create Snapshots on Mirror databases.
Disclaimer: All information provided here is my personal opinion and is neither verified nor approved by Microsoft before it is published. All information, and code samples, if any, is provided "AS IS" with no warranties and confers no rights.