How do you measure CPU pressure?
It is important to understand whether CPU pressure is affecting SQL Server performance. This is true even in the case where SQL Server is the only application running on a particular box. The System object Perfmon counter Processor Queue length is not necessarily an effective way of measuring CPU pressure in SQL Server. To see why this is the case, we first must take a brief (and simplified) look at the SQL Server Execution Model.
SQL Server uses a User Mode Scheduler (UMS) to control the execution of SQL Server user requests (SPIDs or session_ids). The UMS does not replace the Windows scheduler but rather, manages the execution of SQL Server requests (without returning control to Windows). So when SQL Server gets its time slice from the Windows scheduler, the SQL Server UMS manages what user requests are run during this time. In a 4-proc scenario, there will be 4 User Mode Schedulers, one for each CPU. Each UMS uses a number of constructs (queues, lists and worker threads) to govern execution. At any given time, each UMS will have at most a single running user, a runnable queue of requests that are waiting for CPU, a waiter list (for resources such as IO, locks, memory), and a work queue (user requests that are waiting for worker threads).
The runnable queue can be likened to a grocery analogy where there are multiple check out lines. The register clerk is the CPU. There is just one customer checking out e.g. “running” at any given register. The time spent in the checkout line represents CPU pressure. The longer the line, the longer the waits, hence more CPU pressure.
OK, back to SQL Server. Assume the following for a single UMS: SPID 51 is currently running. The Runnable Queue consists of SPIDs 60, 55, 87 & 79. The Waiter list includes SPIDS 55, 84 & 72. The Work queue is empty. Now, assume the running SPID 51 needs physical IO. Two things will happen. SPID 51 is moved to the wait list until the IO is completed and the next session_id in the runnable queue, SPID 60, begins to run. When the IO is complete, SPID 51 is moved to the bottom of the runnable queue which now consists of SPIDS 55, 87, 79 & 51.
Given this scenario, the total amount of time waiting consists of resource and signal waits. The time waiting for a resource is shown as Resource Waits. The time waiting in the runnable queue for CPU is called Signal Waits. In SQL Server 2005, waits are shown in the Dynamic Management View (DMV) sys.dm_os_wait_stats. The query to measure cpu pressure is as follows:
---- Total waits are wait_time_ms
Select signal_wait_time_ms=sum(signal_wait_time_ms)
,'%signal (cpu) waits' = cast(100.0 * sum(signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))
,resource_wait_time_ms=sum(wait_time_ms - signal_wait_time_ms)
,'%resource waits'= cast(100.0 * sum(wait_time_ms - signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))
From sys.dm_os_wait_stats
You can initialize or clear out SQL Server 2005 waitstats with the statement dbcc sqlperf ([sys.dm_os_wait_stats],clear) with no_infomsgs. In SQL Server 2000, waits can be seen by looking at DBCC SQLPERF (WAITSTATS).
If you have SQL–dedicated box it is conceivable that you could have high signal waits indicating CPU pressure and still have a low processor queue length. In such cases, a faster CPU could reduce signal waits but just focusing on the low processor queue length as a measure of CPU contention may lead to the erroneous conclusion that CPU is fine.
In conclusion, if Signal Waits are a significant percentage of total waits, you have CPU pressure which may be alleviated by faster or more CPUs. Alternately, CPU pressure can be reduced by eliminating unnecessary sorts (indexes can avoid sorts in order & group by’s) and joins, and compilations (and re-compilations). If Signal Waits are not significant, a faster CPU will not appreciably improve performance.
Tom Davidson
Comments
Anonymous
September 07, 2005
The comment has been removedAnonymous
September 07, 2005
Hello Darryl,
It is difficult to say what CPU utilization is the right number. For example if you are running an active/active cluster then you definitely want to run less than 45% CPU so when you fail over you have no problem handling the workload. A better thing to do is to use CPU pressure as a guide as Tom Davidson states above. Also you should look at what I call good CPU vs bad CPU. CPU is either user time(good) or privledge time(also known to some as kernal time). User time is the CPU is doing actual work like sorting, etc. Privledge time is CPU wait time. You could be waiting on IO completion, context switching or some other activity outside of user time.
So, look at user time and look at CPU pressure as a better guide. Also , don't run at 100% :)Anonymous
September 08, 2005
Mark,
Thanks for the update and the additional information. I will definitely use this for additional analysis. However, my original comment was misleading, my bad. What I was actually talking about was CPU Pressure, not CPU utilization. Tom writes in his post “if Signal Waits are a significant percentage of total waits” then you may be experiencing CPU Pressure. This is great information, but I am not sure what “significant” means with respect to Signal Waits and Total waits. For example, monitoring my SQL Server with DBCC SQLPERF (WAITSTATS), I see that 31% of my total waits are Signal Waits. How do I tell if that is significant for my system?
Thanks again guys!
DarrylAnonymous
September 12, 2005
The comment has been removedAnonymous
September 15, 2005
The comment has been removedAnonymous
November 18, 2005
For SQL 2000, would the above script be:
[code]
create table #tempwaits
(type varchar(40),
requests int,
waittime numeric(19,3),
signalwaittime numeric(19,3))
dbcc sqlperf (waitstats, clear)
go
waitfor delay '000:01:00'
insert into #tempwaits
exec ('dbcc sqlperf (waitstats)')
select sum(signalwaittime) as "Signal Wait Time",
100.0 * (sum(signalwaittime)/sum(waittime)) as "%cpu waits",
sum(waittime - signalwaittime) as "Resource Wait Time",
100.0 * (sum(waittime - signalwaittime)/sum(waittime)) as "% Resource Waits"
from #tempwaits
[/code]Anonymous
November 25, 2005
On a related note, I've heard much conflicting information about Intel CPUs with their Hyperthreading architecture versus SQL Server. Whether to leave it on or disable it, etc. Can you address this in a future posting?Anonymous
September 12, 2006
If you are having trouble performance tuning SQL Server 2000, there is one whitepaper you need. ...Anonymous
December 30, 2006
SQL Server waits enable you to identify where SQL Server is spending a lot of time doing nothing butAnonymous
February 06, 2007
And what about the deviation caused by wait types such as LAZYWRITER_SLEEP. Clearly, this wait should not been included in the calculation (as you can read in BOL describing dm_os_wait_stats view). My question is, how many other wait types should be excluded from this calculation? Perhaps also waits related to backup devices?Anonymous
February 09, 2008
PingBack from http://sqlserverpedia.com/blog/?p=229Anonymous
May 14, 2008
You might encounter a situation, under a very heavy workload, where processing for certain transactionsAnonymous
January 21, 2009
PingBack from http://www.keyongtech.com/2163609-context-switching-through-the-roofAnonymous
May 26, 2009
PingBack from http://backyardshed.info/story.php?title=microsoft-sql-server-development-customer-advisory-team-how-do-you