TCP Chimney Offload – Possible Performance and Concurrency Impacts on SQL Server Workloads
TCP Chimney is enabled by default once you apply Windows Server 2003 SP2. It is an operating system feature, part of the Scalable Networking Pack, that offloads TCP/IP packet processing from the CPU to the network adapter and provides related load-balancing options such as Receive Side Scaling. (For a full description of this feature see https://support.microsoft.com/kb/912222.)
TCP Chimney has been known to cause issues on SQL Server systems such as general network errors and working set trimming. The following articles document these known issues:
https://support.microsoft.com/kb/942861
https://support.microsoft.com/kb/918483
We’ve also identified situations where TCP Chimney has impacted transaction throughput by introducing delays between the time a statement completes in the SQL engine and the time the begin event of the next statement is received. This impact can be significant, especially in application workloads that must execute a series of statements within a certain time boundary.
For example, suppose your application has a key transaction that consists of multiple statements. Each individual statement is optimized and has a very short duration on the engine side, and the time between batches is short as well, so the overall duration of the transaction is short. A Profiler trace of this transaction typically shows very little time between the SQL:BatchCompleted event of one batch and the SQL:BatchStarting event of the next.
With TCP Chimney enabled, however, the exact same series of statements shows a marked delay between one batch completing and the next batch starting. In this example there was approximately a 500 ms delay between the completion of each batch and the start of the next.
In this scenario, with the 500 ms delay in between statements, you would see the SPID spend most of its time awaiting command in sys.sysprocesses with a waittype of 0x0000; in other words, the engine has finished its work and is simply waiting on the client for the next batch.
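You can confirm this from the engine side with a quick check of the session between batches. A minimal sketch (SPID 57 is a placeholder; substitute the session you are investigating):

-- Check what a session is doing between batches (SPID 57 is a placeholder).
SELECT spid, status, cmd, waittype, lastwaittype, waittime
FROM sys.sysprocesses
WHERE spid = 57;

-- During the delay the row shows cmd = 'AWAITING COMMAND' and waittype = 0x0000,
-- meaning the engine finished the previous batch and is idle waiting on the client.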
This type of delay can affect application throughput as well as concurrency. For example, if the statements above are all encompassed in an implicit transaction, the added delay significantly increases the overall duration of that transaction; locks are then held longer than normal and you may see unexpected blocking. If you run the same implicit transaction on two systems, one with TCP Chimney enabled and the other with it disabled, and compare the sum of the individual statement durations against the total duration of the entire transaction, you may find that the overall transaction duration is significantly higher when TCP Chimney is enabled. With TCP Chimney enabled, the delta between the transaction duration and the sum of the statement durations shows that the majority of the time is spent awaiting the next batch/command.
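To make the concurrency effect concrete, here is a minimal sketch of such an implicit transaction (the table and column names are hypothetical). Any client-side delay between the batches extends the time the locks taken by the earlier statements are held:

-- With implicit transactions on, each statement joins the open transaction
-- until an explicit COMMIT, so delays between batches keep locks held longer.
SET IMPLICIT_TRANSACTIONS ON;

UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 1;   -- opens the transaction, takes locks
-- ...a ~500 ms network delay here keeps those locks held...
UPDATE dbo.OrderLines SET Shipped = 1 WHERE OrderID = 1;
COMMIT;                                                       -- locks are released only here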
Here is an example comparison of the same workload with TCP Chimney enabled and disabled. Note the significant increase in transaction duration and the large delta (transaction duration minus the summed duration of all statements within the transaction) when TCP Chimney is enabled. All durations are in milliseconds:
Implicit Transaction Summary TCP Chimney Enabled
spid TransactionID TranStart TranEnd TranDuration sum_batch_duration batch_count delta
------- -------------- ------------- ------------ ------------- -------------------- -------------- --------
57 916972 09:40:24.450 09:41:17.623 53173 601 516 52572
57 896243 09:39:31.620 09:40:01.840 30220 322 301 29898
57 877227 09:39:12.120 09:39:15.293 3173 306 161 2867
57 876313 09:38:58.590 09:38:58.603 13 0 1 13
57 895388 09:39:18.510 09:39:18.527 16 16 4 0
57 915675 09:40:02.653 09:40:02.670 16 16 4 0
Implicit Transaction Summary TCP Chimney Disabled
spid TransactionID TranStart TranEnd TranDuration sum_batch_duration batch_count delta
------- -------------- ------------ ------------ ------------- -------------------- -------------- --------
54 127910 11:13:47.287 11:13:52.490 5203 4060 516 1143
54 107344 11:13:23.380 11:13:24.427 1046 382 301 664
51 87187 11:12:50.067 11:12:50.550 483 0 1 483
54 88182 11:13:03.987 11:13:07.237 3250 2878 161 372
51 106432 11:13:10.487 11:13:10.487 0 0 1 0
54 126550 11:13:25.490 11:13:26.007 516 516 4 0
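For reference, a summary like the tables above can be built from the trace file with a query along the following lines. This is only a sketch, not necessarily the exact query used here: it assumes the trace captured the SQL:BatchCompleted event (EventClass 12) with the TransactionID column, the file path is a placeholder, and Duration is reported in microseconds (SQL Server 2005 and later):

SELECT
    SPID                                       AS spid,
    TransactionID,
    MIN(StartTime)                             AS TranStart,
    MAX(EndTime)                               AS TranEnd,
    DATEDIFF(ms, MIN(StartTime), MAX(EndTime)) AS TranDuration,
    SUM(Duration) / 1000                       AS sum_batch_duration,  -- microseconds to ms
    COUNT(*)                                   AS batch_count,
    DATEDIFF(ms, MIN(StartTime), MAX(EndTime)) - SUM(Duration) / 1000 AS delta
FROM sys.fn_trace_gettable(N'C:\traces\workload.trc', DEFAULT)  -- placeholder path
WHERE EventClass = 12                          -- SQL:BatchCompleted
  AND TransactionID IS NOT NULL
GROUP BY SPID, TransactionID
ORDER BY delta DESC;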
If you observe a similar pattern and suspect TCP Chimney, disabling TCP Chimney can provide immediate relief. Another option is to follow up with your network adapter vendor to see whether an updated driver addresses the problem and allows continued use of TCP Chimney. For additional information see https://support.microsoft.com/default.aspx?scid=kb;EN-US;948496.
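On Windows Server 2003 SP2 the feature can be turned off from a command prompt, as described in the KB articles above:

netsh int ip set chimney DISABLED

KB 948496 also describes the registry values that control the Scalable Networking Pack features (EnableTCPChimney, EnableRSS, and EnableTCPA under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters); setting a value to 0 disables the corresponding feature.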
TCP Chimney is off by default in Windows Server 2008 - see https://support.microsoft.com/kb/951037.
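On Windows Server 2008 you can check the current state, and explicitly disable the feature if needed, with the TCP-level netsh commands:

netsh int tcp show global
netsh int tcp set global chimney=disabled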
Sarah Henwood | Microsoft SQL Server Escalation Services
Comments
Anonymous
October 03, 2008
I appreciate the information on the impact of SNP on SQL Server, and the desire to disable SNP to avoid the impact as a short-term work-around. I would assume that a feature which offloads network processing workload from the system processor(s) to the network adapter(s) would result in greater capacity and better performance of the main system workload. Evidently, the specific implementation has flaws that block this goal. Are there resolutions in process to correct these defects and allow the desired gains – and in what time frame? None were mentioned in the article, but the question begs to be asked. Since the feature leverages capabilities within the OS and within specific network adapter(s), it is possible that the defect is in the network adapter (a hardware vendor responsibility), the OS (a Microsoft responsibility), or both. I am interested in hearing more information on the root cause(s) and possible resolution(s) for this issue. Thanks, Scott R.
Anonymous
October 04, 2008
Scalable Networking Pack (a detailed description of the common issues): this is enabled by default after installing SP2 on Server 2003. Turning it on offloads part of the network packet processing from the CPU to the network adapter, but it can cause S...
Anonymous
October 25, 2008
By now most of us are using Windows 2003 SP2 unless there are strict application requirements and your...
Anonymous
March 30, 2011
This is an excellent article that addresses an issue we have been trying to troubleshoot for some time. I would like to learn more about the actual steps used to obtain the data and the SQL Queries used to extract the data. My guess is the fn_trace_gettable was used to pull the information from the Trace file but I would greatly appreciate getting the actual steps and Queries used to provide these results as they would be instrumental in proving how Offload reduces SQL performance. Best Regards, SQLRookie