SQL Server 2016/2017: Availability group secondary replica redo model and performance

When availability group was initially released with SQL Server 2012, the transaction log redo was handled by a single redo thread for each database in an AG secondary replica. This redo model is also called as serial redo. In SQL Server 2016, the redo model was enhanced with multiple parallel redo worker threads per database to share the redo workload. In addition, each database has a new helper worker thread for handling the dirty page disk flush IO. This new redo model is called parallel redo. With the new parallel redo model that is the default setting since SQL Server 2016, workloads with highly concurrent small transactions are expected to achieve better redo performance. When the transaction redo operation is CPU intensive, such as when data encryption and/or data compression are enabled, parallel redo has even higher redo throughput (Redone Bytes/sec) compared to serial redo. Moreover, indirect checkpoint allows parallel redo to offload more disk IO (and IO waits for slow disk) to its helper worker thread and frees main redo thread to enumerate more received log records in secondary replica. It further speeds up the redo performance. However parallel redo, which enables multi-threading model, has an associated cost.

  1. Although the main redo thread stops executing individual transaction log redo operation, it is responsible for enumerating and dispatching each transaction log to its parallel redo worker threads. The cost for dispatching these logs can be substantially high in scenarios where the log redo operation is not CPU intensive, such as redo of a DML transaction on a narrow table (with small row sizes) row.
  2. System transactions for page splits caused by new record inserts can introduce the PARALLEL_REDO_TRAN_TURN waits across parallel redo worker threads when a secondary replica is configured as readable replica. Depending on the frequency of insert operations, this can significantly slow down parallel redo performance.
  3. When read-only queries are running on a readable secondary replica, query threads attempt to apply pending log redo operations and need to collaborate with redo worker threads with DIRTY_PAGE_TABLE_LOCK waits, which can be frequently generated and slow down both redo and query performance if there are concurrent redo workloads. The performance issue associated with DIRTY_PAGE_TABLE_LOCK wait is addressed in the cumulative update release for SQL Server 2016 SP and SQL Server 2017 mentioned in this article.

Based on a current performance study, below types of transaction workload or SQL configuration will generally perform better with the default parallel redo model

  1. High concurrent small size transactions.
  2. Expensive log redo operation (such as data encryption or data compression) with indirect checkpoint
  3. Non-readable secondary replica, or only occasional read-only queries to a readable secondary replica when there is busy transaction log redo traffic

For the following scenarios, switching to serial redo is expected to achieve better redo throughput:

  • Long running transaction(s) with predominant inserts and limited concurrency – a typical example is online index rebuild of a clustered index of a large table
    • Symptoms in parallel redo model:
      • The frequency of PARALLEL_REDO_TRAN_TURN wait generally is proportional to the volume of insert operations. More inserts can trigger more page splits, which maps to more PARALLEL_REDO_TRAN_TURN waits.
      • Workload concurrency can be monitored by perf counter (Object – SQLServer:General Statistics, Counter – User Connections) or DMV "sys.dm_exec_connection" & "sys.dm_exec_sessions" in primary replica
  • Frequent and/or long-running read-only queries need to be executed on a database of a secondary replica which has concurrent transaction log redo workloads. Query and redo do not have to run against the same set of tables in the database.
    • Symptoms in parallel redo model:
      • Frequent DIRTY_PAGE_TABLE_LOCK waits
      • Monitor perf counter (Object – SQLServer:Database Replica, Instance – [DBName], Counter-Redone Bytes/sec) to compare the redo throughput when query is running and not
  • A small set of data records whose data pages are scanned by high concurrent read-only queries on a secondary replica while frequent changes are made to the same set of data records on primary replica. Query and redo threads may dramatically slow down each other in this situation.
    • Symptoms in parallel redo model:
      • High volume of DPT_ENTRY_LOCK waits
      • For the worst case, a number of error message "Timeout occurred while waiting for latch" are logged in SQL Error Log followed by a "Latch timeout" dump

To switch to serial redo model, TF 3459 needs to be enabled. But after the SQL Server instance is running in serial redo model, the only way to change it back to parallel redo is to restart the SQL Server service. Multiple factors impact redo performance, some of them are only applicable to the new redo model - parallel redo, such as PARALLEL_REDO_TRAN_TURN and DIRTY_PAGE_TABLE_LOCK waits. For different transaction workloads and hosted machine configurations, it is not always certain which subset of factors has stronger impact to redo performance. If the redo performance of your workload does not align with the explanation in this document, please share your workload and host machine details to Microsoft Customer Service and Support (CSS) . It will help us build a more complete view and evaluate future improvements with redo performance.

New wait types

A few new thread wait types were added with the new Parallel Redo model in SQL Server 2016. This information can be queried from sys.dm_os_wait_stats. Some of these wait types indicate performance impact, while the others can be considered benign.

Wait type with performance impact:

Type

Description

Comment

PARALLEL_REDO_FLOW_CONTROL

Occurs when the main redo thread cannot dispatch more transaction log records when log cache array for dispatched transaction log is full.

Indicates that one or more parallel redo worker threads cannot keep up with main redo thread transaction log dispatching speed or are blocked by some resources such as other type of waits. When this wait occurs frequently, parallel redo worker threads does not function efficiently.

PARALLEL_REDO_TRAN_TURN

Occurs when a parallel redo worker thread needs to wait for a dependent transaction log be redone before starting to redo the current transaction log. The dependent transaction log may be assigned to a different parallel redo worker, which means a cross-thread wait.

Only happens in a readable secondary replica when new insert triggers page-split system transaction, or record update in a heap table generates a forwarded record (More Information on forwarded records).

DIRTY_PAGE_TABLE_LOCK

Occurs when there is a wait on a lock that control access to dirty page table. Dirty page table contains information of modified data pages that need to be flushed to disk. With parallel redo model, dispatched transaction logs that are pending for redo are associated with dirty page entries.

When there are read-only queries in a readable secondary replica. Queries try to process pending transaction logs during data page scan. It means query threads need to collaborate with main redo thread and parallel redo work threads for this dirty page table lock access and can slow down both query and redo performance when they are running concurrently. This wait will not be generated anymore after the performance fix for concurrent read-only query and log redo is released.

DPT_ENTRY_LOCK

Occurs when there is a wait on a lock that control access to a dirty page entry which has redo to catch up from user query thread.

Only occurs when parallel redo worker thread and a user query thread concurrently process redo operations for the same dirty page entry.

Wait types without performance impact:

Type

Description

Comment

PARALLEL_REDO_WORKER_WAIT_WORK

Occurs when any parallel redo worker threads or redo helper thread has nothing to do.

It is an idle wait and expect to occur regularly. It indicates either main redo worker thread cannot dispatch transaction logs as fast as the total redo speed of all parallel redo workers, or more commonly, the corresponding database in AG primary replica is idle and does not generate many transaction logs for secondary replica to redo.

PARALLEL_REDO_DRAIN_WORKER

Occurs when main redo thread needs to drain out all outstanding redo operations, such as when redoing file size change or checkpoint, etc.

Expect to occur regularly

PARALLEL_REDO_LOG_CACHE

Occurs when there is a wait on a lock that controls update to log cache array from parallel redo worker threads.

Infrequent waits. Occurs when parallel redo worker threads help signal redo steps after a flow control happens, most time main redo thread does it without lock wait.

PARALLEL_REDO_TRAN_LIST

Occurs when there is a wait on a lock that controls access to a special cache. The cache contains information about transactions that are fully redone and ready for release.

Infrequent waits. Occurs when main redo worker needs to access this transaction list when draining outstanding redo operations. Most time the helper worker access the transaction list without lock wait.

PARALLEL_REDO_WORKER_SYNC

Occurs when waiting for all parallel redo worker threads to stop.

When parallel redo threads are not successfully created or are stopped (such as enabling TF 3459)

Parallel redo thread usage and redo model control

Parallel redo thread usage is well covered in "Thread usage by Availability Groups" here.

A SQL Server instance uses up to 100 threads for parallel redo for secondary replicas. Each database uses up to one-half of the total number of CPU cores, but not more than 16 threads per database. If the total number of required threads for a single instance exceeds 100, SQL Server uses a single redo thread for every remaining database.

When the host server has 32 or more CPU cores, each database will occupy 16 parallel redo worker threads and one helper worker thread. It means that all databases starting with the 7th database (ordered by database id ascending) that has joined availability group it will be in single thread redo or serial redo irrespective which database has actual redo workload. If a SQL Server Instance has a number of databases, and it is desired for a particular database to run under parallel redo model, the database creation order needs to be considered. Same idea can be applied to force a database always runs in serial redo model as well. Again, in SQL Server instance level, the way to switch between parallel redo and serial redo is the TF 3459. All databases in the same SQL Server instance will be switched together. Also, to switch from serial redo to parallel redo by disable TF 3459, a SQL Server service restart is required (not required anymore since the cumulative update release for SQL Server 2016 SP and SQL Server 2017 mentioned in this article).

Dong Cao
Principal Software Engineer
Microsoft Azure Data Group