Let’s talk about trace flags

One of the most confusing aspects of SQL Server configuration is often trace flags. There are lots of trace flags out there, and while many of them are documented, when to use them and when not to is not always clear. The primary reason for this post is to introduce a new article from Microsoft Support that provides guidance on trace flags and other configurations which may help high-end SQL Servers perform better:

Recommended updates and configuration options for SQL Server 2017 and 2016 with high-performance workloads https://support.microsoft.com/kb/4465518

The article was published this past fall and is an update of an older article with which you may already be familiar:

Recommended updates and configuration options for SQL Server 2012 and SQL Server 2014 with high-performance workloads https://support.microsoft.com/kb/2964518

Microsoft Support has provided these articles to help guide customers toward configuration options that both improve performance, but also help to avoid known issues that present themselves when the server is running on big iron (large memory and many CPUs) and/or is under heavy load.

You may be asking yourself, if these trace flags provide a benefit, why aren't they just the default behavior? In a lot of cases, we're asking ourselves the same thing 😊 If you've read both the articles above, you may have noticed that the list of recommended trace flags for SQL Server 2016 and 2017 is much smaller than the list for 2012 and 2014. This is because starting with SQL Server 2016, many of the trace flags that were introduced to solve problems in earlier versions just became the default behavior.

So why do we even have trace flags to begin with?

Some trace flags are used to enable enhanced debugging features such as additional logging, memory dumps etc. and are used only when you are working with Microsoft Support to provide additional data for troubleshooting. These trace flags are not ones you want to leave turned on in a production system as they may have a negative impact on your workload. An example of one of these flags would be TF 2551 which is used to trigger a filtered memory dump whenever there is an exception or assertion in the SQL Server process. These trace flags are only used for a short period of time and typically only at the recommendation of Microsoft Support, so they will likely always be around.

Other trace flags are used to alter the behavior of the server in other ways, such as to turn a feature ON or OFF or change the way the database engine manages resources. One example of this type of trace flag is 7752 which was introduced as a knob for something that is default behavior in Azure SQL DB. In a SQL Server (on-prem or IaaS) database that is undergoing recovery and has Query Store (QDS) enabled, user queries will be blocked until all the data required for QDS to start is loaded. This ensures QDS doesn't miss any queries that are executed in that database. In some cases, this can take a long time to complete and you'll see sessions with a wait type of QDS_LOADDB until the QDS becomes available. Turning on TF 7752 makes this process asynchronous so that user queries can proceed while the QDS starts. It's not something we want to make the default behavior in general because it means that some query executions won't be captured by QDS, but it's a tradeoff you might be willing to make in order to reduce the time it takes for a database to become available upon restart or failover. This is the sort of trace flag that we are trying to incorporate into the product in some other way, such as to provide a database-scoped configuration. Moving forward, we hope not to introduce any new trace flags like this, and over time the number of flags of this type should approach zero.

A subset of the above category is what many folks call the "benchmark specials" – trace flags that alter or disable database engine behavior that is normally desirable, but perhaps not absolutely required if you want lightening fast performance. This would be things like trace flag 661 that disables ghost record cleanup, or one of several flags that disable ring buffers used for monitoring and diagnostic purposes. For the most part, these flags are not ones that you want to run in production because they turn off features that are required for the maintenance and manageability of your server. If you really need lightening fast performance though, and you're willing to make the trade-off of manageability for speed, these are available and documented here. Most of these flags don't make sense to add to the product, but there are a few that have become useful in wider scenarios and are recommended for certain workloads. One flag like this is trace flag 834. This flag turns on large-page memory allocations for the buffer pool, which can be helpful for data warehouse-type workloads. There are some caveats to turning this flag on however, so be sure to review the documentation before considering it. This is one that is also documented in the two KB articles referenced at the top of this blog.

The last reason for trace flags is to introduce new behavior or change existing behavior in an in-market release of SQL Server (i.e. a Cumulative Update). We know our customers find that installing updates can be disruptive, and there are many risk-averse customers who are reluctant to apply any non-security updates at all. For the past several versions of SQL Server we have worked hard to make updates as seamless as possible, and this includes keeping any potential behavior changes to a minimum. As Pedro likes to say, "backward compatibility is the unsung hero feature of SQL Server" 😊 To that end, any change we make to an in-market release of SQL Server that has the potential to impact performance or change the behavior of your application, be it an improvement or a bug fix, will be gated by a trace flag (or compatibility level – that's another story). The most common trace flag like this is 4199. This is a very important flag, and one that's going to be around for a while so it's worth talking about in some detail.

Fixes to the query optimizer fall under the category of behavior changes that have the potential to impact performance. In many cases, the impact should be positive, but sometimes there are edge cases where they can have a negative impact, so it's not something we want to throw into a cumulative update by default. For this reason, all the fixes that were made to the optimizer between SQL Server 7.0 and SQL Server 2014 were gated by a trace flag. Up until SQL Server 2005 SP3, each one would have to be enabled individually, but in this release, they were rolled up under TF 4199. If you're running one of these versions of SQL Server and you don't have this trace flag turned on, you're missing out on 20 some-odd years of query optimizer fixes that may benefit your workload.

Starting with SQL Server 2016, all the fixes from SQL Server 7.0 up to SQL Server 2016 RTM were rolled into the engine by default (when running with 130 compatibility level), but the changes that were released AFTER SQL Server 2016 RTM are still gated by TF 4199. This was also the case with SQL Server 2017 RTM and will continue to be the practice moving forward, all the optimizer fixes will be rolled into the engine at RTM and this becomes the baseline optimizer behavior for that compatibility level (even if you're running on a later version of the database engine). When you have 4199 enabled, all the optimizer fixes available in the current build that are applicable to the current database compatibility level will be enabled. Incidentally, this behavior can also be controlled at the database level using the QUERY_OPTIMIZER_HOTFIXES database scoped configuration option and at the query level using the USE HINT 'ENABLE_QUERY_OPTIMIZER_HOTFIXES' query hint, although the trace flag is needed if you want to set this at the server level.

As you modernize your databases, you should find that the configuration experience for SQL Server gets simpler and more intuitive. While there will likely always be some trace flags, we expect their use to be limited in scope, and any trace flags that are recommended by Microsoft for production use will be thoroughly documented. In the meantime, be sure to reference the articles above any time you build and configure a new SQL Server. We will try to keep these updated whenever new support trends emerge that indicate a trace flag or other configuration setting should be widely recommended for Tier-1 workloads.

Comments

  • Anonymous
    January 15, 2019
    Thank you for taking the time to compile this great message.
  • Anonymous
    January 16, 2019
    Its the lack of MS documentation of Trace Flags thats been the issiue
    • Anonymous
      January 17, 2019
      Ack on that, which is why we're doing our best to document everything we can. You will find most trace flags here: https://aka.ms/traceflags, and this should include any trace flag that is widely recommended.