Debugging slow response times in SQL Server 2008
Introduction
The other day I came across a nice article written by Mario Broodbakker called "SQL Server Wait Events: Taking the Guesswork out of Performance Profiling" (see references) on how to troubleshoot slow response times in SQL Server 2000 and 2005 using the YAPP Method. According to the author, one of the biggest challenges to apply this method in SQL Server is that fact that the wait statistics are collected and exposed at the server level (server wide):
... "This is simply not enough, unless you have the luxury of testing your application and SQL statements on an isolated server" ..." But, of course, it's often hard or impossible to isolate and replicate production problems on an isolated test server." ...
I decided to write this because in SQL Server 2008 the wait statistics are available at virtually any level, and (at the time of writing) I couldn't find any articles on the new sqlos.wait_info and sqlos.wait_info_external extended events.
YAPP Method
The YAPP Method: "Yet Another Performance Profiling Method" is based on a paper written by Anjo Kolk (see references). The basic idea of the method can be expressed with the following formula:
(1) Response time = service time + wait time
This method for performance troubleshooting has been around for some time and it is a fairly standard practice when debugging slow response times.
You can get the response time and the service time of a query by looking at sys.dm_exec_sessions, columns total_elapsed_time and cpu_time. Alternatively, you can use SQL trace or the more lightweight extended events in SQL Server 2008. Knowing response time and service time and applying formula (1) , gives you wait time. So, we know how much we waited. However, in order to get the individual wait types we need the new extended events.
Per session wait statistics in SQL Server 2008
In SQL Server 2008 you can get wait statistics for a given session or statement by leveraging the new extended events. Optionally, you can limit the collection to a small set of wait types or collect only waits exceeding certain duration. If your event session is written carefully, you can use this technique on production systems with very little or no performance impact. In fact, SQL Server 2008 ships with a built-in event session, which is collecting wait statistics, and is running by default out of the box.
Here is how to create an event session to collect wait types for session 51. The events will go to the specified trace file:
create event session session_waits on server
add event sqlos.wait_info
(WHERE sqlserver.session_id=51 and duration>0)
, add event sqlos.wait_info_external
(WHERE sqlserver.session_id=51 and duration>0)
add target package0.asynchronous_file_target
(SET filename=N'c:\wait_stats.xel', metadatafile=N'c:\wait_stats.xem');
Here is how to start the event session, run your workload, and then stop the event session:
alter event session session_waits on server state = start;
exec sp_workload;
alter event session session_waits on server state = stop;
Here is how to read the trace file:
select * from from sys.fn_xe_file_target_read_file
('c:\wait_stats*.xel', 'c:\wait_stats*.xem', null, null)
The result is a list of wait events, representing individual waits, having wait type and duration column. In order to make better sense out of this we need to group the events by wait type and calculate the total duration per wait type. The following code can do this:
create view dbo.read_xe_file as
select object_name as event, CONVERT(xml, event_data) as data
from sys.fn_xe_file_target_read_file
('c:\wait_stats*.xel', 'c:\wait_stats*.xem', null, null)
go
create view dbo.xe_file_table as
select
event
, data.value('(/event/data/text)[1]','nvarchar(50)') as 'wait_type'
, data.value('(/event/data/value)[3]','int') as 'duration'
, data.value('(/event/data/value)[6]','int') as 'signal_duration'
from dbo.read_xe_file
go
select
wait_type
, sum(duration) as 'total_duration'
, sum(signal_duration) as 'total_signal_duration'
from dbo.xe_file_table
group by wait_type
order by sum(duration) desc
go
Example
Hopefully this example will demonstrate the power of sqlos.wait_info and sqlos.wait_info_external events. I did some measurements by executing the following workload from two connections:
create proc #sp_workload
as
begin
SET nocount ON
CREATE TABLE #t1000k (c2 char(1020));
BEGIN TRANSACTION
DECLARE @rows INT = 50000;
DECLARE @row INT = 0;
DECLARE @count INT = 0;
WHILE @row < @rows
BEGIN
INSERT INTO #t1000k (c2) VALUES (REPLICATE('a',1000));
SELECT @row = @row + 1
IF @count > 10
BEGIN
COMMIT
BEGIN TRANSACTION
SELECT @count=0
END
SELECT @count=@count+1
END
COMMIT
END
go
While running the workload I had IOMeter (see references) so that I can add some disk pressure.
Results for session 51: Apparently, most of the time was spent waiting for space in the log buffer to store log records - wait type LOGBUFFER. For complete list of all wait types with their descriptions search for sys.dm_os_wait_stats in MSDN or Books Online (see references).
Response time | Service time | Wait time |
32619 | 1828 | 30791 |
wait_type | total_duration | total_signal_duration |
LOGBUFFER | 9662 | 66 |
PREEMPTIVE_OS_FILEOPS | 6442 | 276 |
PREEMPTIVE_OS_FLUSHFILEBUFFERS | 3833 | 150 |
PREEMPTIVE_OS_WRITEFILEGATHER | 3015 | 66 |
SOS_SCHEDULER_YIELD | 2481 | 2480 |
WRITE_COMPLETION | 1971 | 0 |
IO_COMPLETION | 1183 | 0 |
PAGEIOLATCH_UP | 551 | 0 |
WRITELOG | 417 | 0 |
PAGELATCH_UP | 30 | 27 |
PAGELATCH_SH | 10 | 10 |
PREEMPTIVE_XE_CALLBACKEXECUTE | 4 | 200 |
Results for session 52: The result for this session is very similar. We can tell that most of the time was spent waiting. After accumulating the durations from all waits we get 30726, which is pretty close to the total wait time calculated using sys.dm_exec_sessions (31484).
Response time | Service time | Wait time |
33890 | 2406 | 31484 |
wait_type | total_duration | total_signal_duration |
LATCH_EX | 16555 | 32 |
LOGBUFFER | 10796 | 28 |
SOS_SCHEDULER_YIELD | 2739 | 2729 |
WRITELOG | 605 | 0 |
PAGELATCH_UP | 26 | 25 |
PAGELATCH_SH | 5 | 5 |
For the sake of completeness I also collected server-wide wait statistics, which are also available in previous version of SQL Server. The server-wide statistics clearly shows that we have IO bottleneck however you don't know exactly which session or query was affected and by how much:
wait_type | wait_time_ms | signal_wait_time_ms |
LOGBUFFER | 21666 | 127 |
WRITELOG | 17118 | 4 |
LATCH_EX | 16561 | 34 |
PAGEIOLATCH_UP | 7891 | 0 |
SOS_SCHEDULER_YIELD | 7058 | 7038 |
PREEMPTIVE_OS_FILEOPS | 6443 | 0 |
WRITE_COMPLETION | 5896 | 4 |
SLEEP_BPOOL_FLUSH | 3916 | 278 |
PREEMPTIVE_OS_FLUSHFILEBUFFERS | 3833 | 0 |
PREEMPTIVE_OS_WRITEFILEGATHER | 3016 | 0 |
IO_COMPLETION | 1680 | 0 |
PREEMPTIVE_OS_REPORTEVENT | 141 | 0 |
PAGELATCH_UP | 139 | 128 |
PREEMPTIVE_OS_PIPEOPS | 128 | 0 |
PREEMPTIVE_XE_TARGETINIT | 85 | 0 |
XE_BUFFERMGR_ALLPROCESSED_EVENT | 70 | 0 |
PAGELATCH_SH | 23 | 23 |
PREEMPTIVE_OS_QUERYREGISTRY | 16 | 0 |
PREEMPTIVE_XE_CALLBACKEXECUTE | 8 | 0 |
PAGELATCH_EX | 7 | 7 |
PREEMPTIVE_XE_SESSIONCOMMIT | 5 | 0 |
LATCH_SH | 1 | 0 |
PREEMPTIVE_OS_GETDISKFREESPACE | 1 | 0 |
Complete test code
Here is the complete source code of the test:
use tempdb
go
--
-- Enable xp_cmdshell to delete trace files
--
exec sp_configure 'show advanced options', 1
reconfigure
exec sp_configure 'xp_cmdshell', 1
reconfigure
go
BEGIN TRY
drop proc #sp_workload
END TRY BEGIN CATCH END CATCH
go
create proc #sp_workload
as
begin
SET nocount ON
CREATE TABLE #t1000k (c2 char(1020));
BEGIN TRANSACTION
DECLARE @rows INT = 50000;
DECLARE @row INT = 0;
DECLARE @count INT = 0;
WHILE @row < @rows
BEGIN
INSERT INTO #t1000k (c2) VALUES (REPLICATE('a',1000));
SELECT @row = @row + 1
IF @count > 10
BEGIN
COMMIT
BEGIN TRANSACTION
SELECT @count=0
END
SELECT @count=@count+1
END
COMMIT
END
go
BEGIN TRY
drop event session session_waits on server
END TRY BEGIN CATCH END CATCH
go
exec xp_cmdshell 'del c:\wait_stats*.xel', no_output;
exec xp_cmdshell 'del c:\wait_stats*.xem', no_output;
go
declare @cmd nvarchar(max);
set @cmd = N'create event session session_waits on server ' +
N'add event sqlos.wait_info (WHERE sqlserver.session_id=' + convert(nvarchar(10), @@spid) + N' and duration>0) ' +
N', add event sqlos.wait_info_external (WHERE sqlserver.session_id=' + convert(nvarchar(10), @@spid) + N' and duration>0) ' +
N'add target package0.asynchronous_file_target (SET filename=N''c:\wait_stats.xel'', metadatafile=N''c:\wait_stats.xem'')';
exec(@cmd);
go
BEGIN TRY
drop table #snapshot
END TRY BEGIN CATCH END CATCH
go
select total_elapsed_time, cpu_time into #snapshot from sys.dm_exec_sessions where session_id=@@SPID;
go
alter event session session_waits on server state = start;
exec #sp_workload;
alter event session session_waits on server state = stop;
go
select
(es.total_elapsed_time - ss.total_elapsed_time) as 'Response time',
(es.cpu_time - ss.cpu_time) as 'Service time',
((es.total_elapsed_time - ss.total_elapsed_time) - (es.cpu_time - ss.cpu_time)) as 'Wait time'
from sys.dm_exec_sessions es, #snapshot ss where session_id=@@SPID;
go
drop event session session_waits on server
go
BEGIN TRY
drop view dbo.read_xe_file
END TRY BEGIN CATCH END CATCH
go
create view dbo.read_xe_file as
select object_name as event, CONVERT(xml, event_data) as data
from sys.fn_xe_file_target_read_file('c:\wait_stats*.xel', 'c:\wait_stats*.xem', null, null)
go
BEGIN TRY
drop view dbo.xe_file_table
END TRY BEGIN CATCH END CATCH
go
create view dbo.xe_file_table as
select
event
, data.value('(/event/data/text)[1]','nvarchar(50)') as 'wait_type'
, data.value('(/event/data/value)[3]','int') as 'duration'
, data.value('(/event/data/value)[6]','int') as 'signal_duration'
from dbo.read_xe_file
go
select
wait_type
, sum(duration) as 'total_duration'
, sum(signal_duration) as 'total_signal_duration'
from dbo.xe_file_table
group by wait_type
order by sum(duration) desc
go
--
-- CLEANUP
--
exec xp_cmdshell 'del c:\wait_stats*.xel', no_output;
exec xp_cmdshell 'del c:\wait_stats*.xem', no_output;
drop view dbo.xe_file_table;
drop view dbo.read_xe_file;
drop proc #sp_workload;
References
Sys.dm_os_wait_stats
The following MSDN article shows all wait types with their description: https://msdn.microsoft.com/en-us/library/ms179984.aspx
IOMeter
IOMeter is an IO stress and measurement tool developed by Intel. It is now an Open Source tool, and freely downloadable from:
SQL Server Wait Events: Taking the Guesswork out of Performance Profiling
A very nice article by Mario Broodbakker on how to troubleshoot slow response times in SQL Server 2000 and SQL Server 2005:
Troubleshooting Performance Problems in SQL Server 2005
Here also has a very good presentation on the subject and describes a lot of SQL Server 2005 wait events here:
https://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx
Comments
Anonymous
January 01, 2003
PingBack from http://www.pigmig.com/debugging-slow-response-times-in-sql-server-2008Anonymous
January 01, 2003
Check out this cool article from the SQLOS team on the technet blogs. Read it here .Anonymous
May 16, 2014
The comment has been removedAnonymous
May 16, 2014
The comment has been removedAnonymous
May 16, 2014
The comment has been removedAnonymous
May 16, 2014
The comment has been removedAnonymous
May 17, 2014
Pingback from What Is The Best Practice For Storing Image Varbinary(max) Data In A Database Table? | Click & Find Answer !