Changes in SQL Server 2016 Checkpoint Behavior

Reviewed by: Denzil Ribeiro, Mike Weiner, Arvind Shyamsundar, Sanjay Mishra, Murshed Zaman, Peter Byrne, Purvi Shah

SQL Server 2016 introduces changes to the default behavior of checkpoint. In a recent customer engagement, we found the behavior change to result in higher disk (write) queues on SQL Server 2016 vs. the same workload on SQL Server 2012. In this blog we’ll describe the changes, options are available to control these and what impact they might have on workloads that are upgrading to SQL Server 2016. In this specific case changing the database to use the new default behavior of checkpoint proved to be very beneficial.

Checkpoints in SQL Server are the process by which the database engine writes modified data pages to data files. Starting with SQL Server 2012 more options have been provided to better control how checkpoint behaves, specifically indirect checkpoint. The default checkpoint behavior in SQL Server prior to 2016 is to run automatic checkpoints when the log records reach the number of records the database engine estimates it can process within the “recovery interval” (server configuration option). When an automatic checkpoint occurs the database engine flushes the modified data pages in a burst fashion to disk. Indirect checkpoint provides the ability to set a target recovery time for a database (in seconds). When enabled, indirect checkpoint results in constant background writes of modified data pages vs. periodic flushes of modified pages. The use of indirect checkpoint can result in “smoothing” out the writes and lessoning the impact short periodic bursts of flushes have on other I/O operations.

In addition to configuring indirect checkpoint SQL Server also exposes the ability to utilize a startup parameter (-k) followed by a decimal value which will configure the checkpoint speed in MB per second. This is also documented in the checkpoint link above. Keep in mind this is an instance level setting and will impact all databases which are not configured to use indirect checkpoint.

For further internals around checkpoint reference: “How It Works: Bob Dorr’s SQL Server I/O Presentation”. For the purposes of this blog we’ll focus on what has changed and what this means for workloads that are upgrading to SQL Server 2016.

Key Changes to Checkpoint Behavior in SQL 2016

The following are the primary changes which will impact behavior of checkpoint in SQL Server 2016.

  1. Indirect checkpoint is the default behavior for new databases created in SQL Server 2016. Databases which were upgraded in place or restored from a previous version of SQL Server will use the previous automatic checkpoint behavior unless explicitly altered to use indirect checkpoint.
  2. When performing a checkpoint SQL Server considers the response time of the I/O’s and adjusts the amount of outstanding I/O in response to response times exceeding a certain threshold. In versions prior to SQL Server 2016 this threshold was 20ms. In SQL Server 2016 the threshold is now 50ms. This means that SQL Server 2016 will wait longer before backing off the amount of outstanding I/O it is issuing.
  3. The SQL Server engine will consolidate modified pages into a single physical transfer if the data pages are contiguous at the physical level. In prior versions, the max size for a transfer was 256KB. Starting with SQL Server 2016 the max size of a physical transfer has been increased to 1MB potentially making the physical transfers more efficient. Keep in mind these are based on continuity of the pages and hence workload dependent.

To determine the current checkpoint behavior of a database query the sys.databases catalog view.

SELECT name, target_recovery_time_in_seconds FROM sys.databases WHERE name = 'TestDB'

A non-zero value for target_recovery_time_in_seconds means that indirect checkpoint is enabled. If the setting has a zero value it indicates that automatic checkpoint is enabled.

RecoveryInterval

This setting is controlled through an ALTER DATABASE command.

Example of Differences in Checkpoint Behavior by Version

Below are some examples of the differences in behavior across versions of SQL Server, and with/without indirect checkpoint enabled. Notice the differences in disk latency (Avg. Disk sec/Write) in each of the examples. Each of the examples below is from an update heavy transactional workload. For each a 30-minute comparable sample has been captured and displayed.

Figure1

Figure 1 – Checkpoint Pattern on SQL Server 2012

 

Figure2

Figure 2 – Checkpoint Pattern on SQL 2014

 

Notice that there is little difference in behavior from SQL Server 2012 to SQL Server 2014.

Figure3

Figure 3 – Checkpoint Pattern on SQL Server 2016 (Using Automatic Checkpoint – Maintains 2012 Behavior on Upgrade)

 

After moving to SQL Server 2016 notice that the latency and amount of I/O being issued (Checkpoint pages/sec) during the checkpoints increases. This is due to the change in how SQL determines when to back off the outstanding I/O being issued.

Figure4

Figure 4 – Checkpoint Pattern on SQL 2016 (After Changing to Indirect Checkpoint)

 

After changing the configuration of the database to utilize indirect checkpoint the SQL engine issues a constant stream of I/O flushes the modified buffers. This is represented as Background writer pages/sec on the graph above. This change has the effect of smoothing the checkpoint spikes and results in providing a more consistent response time on the disk.

Table1

Table 1 – Checkpoint and I/O Performance Metrics for Different SQL Versions and Checkpoint Configurations

In the above observe the following:

  • Automatic checkpoint in SQL Server 2012 can Issue less outstanding I/O than SQL Server 2016. For this particular hardware configuration, the result is higher disk latency on SQL Server 2016 (and more queued I/O’s) than on SQL Server 2012.
  • Indirect checkpoint in SQL Server 2016 has the effect of “smoothing” out the I/O requests for checkpoint operations and significantly reducing disk latency. So while this results in a more constant stream of I/O to the disks the impact of the checkpoint on the disk as well as any other queries running is lessoned.
  • The counters which measure the amount of work being performed by checkpoint are different and depend on the type of checkpoint enabled. The different counters can be used to quickly expose which type of checkpoint and how much work the operations are doing on any given system.
    • Automatic checkpoints are exposed as “Checkpoint Pages/sec
    • Indirect checkpoints are exposed as “Background Writer pages/sec

Summary

There are subtle differences in checkpoint behavior when migrating applications from previous versions of SQL Server to SQL Server 2016 and also differences in configurations options you have available to control these. When migrating applications from to SQL Server 2016 make sure to understand the difference in behavior of databases newly created on SQL Server 2016 vs. those created on previous versions and the configurations options you have available to control these. Indirect checkpoint is the new default and you should consider changing the configuration of existing databases to use indirect checkpoint. Indirect checkpoint can be a very effective approach at minimizing the impact of the more aggressive automatic checkpoint in SQL Server 2016 for systems with I/O configurations that cannot handle the additional load.

Comments

  • Anonymous
    August 03, 2016
    Very interesting, and finally we have a thorough explanation of how the checkpoint changes affect performance. Thank you!
  • Anonymous
    August 04, 2016
    Excellent article Mike.Question: Are there any impacts or known issues elsewhere when using indirect checkpoint ? (for example, functioning of AlwaysON AGs ? Any known issues with indirect checkpoint and Win2012 clusters ?,)Th
  • Anonymous
    August 13, 2016
    Great info !!
  • Anonymous
    August 18, 2016
    Great article!
  • Anonymous
    September 30, 2016
    This was our problem after upgrade, thx!
  • Anonymous
    March 08, 2017
    yes, we are also seeing this problem. Need to try more config options. We also have -k3. Not sure if that makes difference.
  • Anonymous
    June 12, 2018
    I have a question which one of following key changes don't seem applied in sql 2016Key Changes to Checkpoint Behavior in SQL 2016..3. The SQL Server engine will consolidate modified pages into a single physical transfer if the data pages are contiguous at the physical level. In prior versions, the max size for a transfer was 256KB. Starting with SQL Server 2016 the max size of a physical transfer has been increased to 1MB potentially making the physical transfers more efficient. Keep in mind these are based on continuity of the pages and hence workload dependent.But when indirect checkpoint is enabled sql only write a single page, which is critical in Cloud platform. So they consume all iops for that cloud instance.is this symptom caused by design or a bug?