Compartilhar via


SQL High CPU and TempDB Growth By Scalar UDF

Most high CPU troubleshooting technologies focus on execution plan compilation and recompilation OR suboptimal plans such as hash or merge join, among others. We (MSDN SQL team) recently found out a unique scenario of using scalar UDF (user defined function) and tempDB object (temp table or temp variable) that can also lead to high CPU usage as well as tempDB growth.

In this post, I will show you an easy to understand, reproducible demo that you can try it out yourself. I will also use several performance counters to illustrate different anomalies by changing a section of the code.

Preparation:

First, let's create a simple table and a one liner scalar UDF:

create table tblDateDT (DateID int identity(1,1) primary key, dtDate datetime)

go

declare @i int = 0,

        @cnt int = 1000000,

        @dt datetime = '2000-01-01'

-- Populate tblDateDT with one million rows

while @i < @cnt

begin

   insert tblDateDT (dtDate) values ( dateadd(day, @i, @dt) )

   select @i = @i + 1

end       

go

Create function dbo.UDF_GetLongDateString (@dtDate datetime)

returns varchar(1000)

as

begin

  

   return convert(varchar(1000), @dtDate, 109)

end;

Next, select the following performance counters and start a trace:

Object

Counter

Processor

% Processor Time

SQLServer:Access Methods

Index Searches/sec

SQLServer:Access Methods

Page Splits/sec

SQLServer:Transactions

Free Space in tempdb (KB)

Baseline from Perf Counter Perspective:

Let's first establish a "healthy" baseline without CPU and tempDB anomalies. Run this simple select statement:

 

declare @temp table (DateID int, strDate varchar(1000), dtDate datetime)

insert @temp(DateID, strDate, dtDate)

select dt.DateID, convert(varchar(1000), dt.dtDate, 109), dt.dtDate from dbo.tblDateDT dt

 

select top 1 * from @temp

 

Here's the @@version of my test box: Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64)   Enterprise Edition (64-bit) on Windows NT 6.0 <X64> (Build 6001: Service Pack 1).

  

On my quad proc, 8G workstation, it took just one second to run. Pause perfmon and you should see a graph like below.

 

 

Figure 1: A baseline showing a CPU uptick and a tempDB free space dip

 

As you can see, we have a slight CPU spike while in the meantime the free tempDB space dips a bit for populating the temp variable.

 

Baseline from SQL Internal Perspective:

Now let's get some baseline from SQL internal perspective. We will turn on statistics time, IO and profile options to see the duration, temp objects and execution plan. Run the same query but add the three extra statistics options

set statistics time on

set statistics profile on

set statistics io on

go

declare @temp table (DateID int, strDate varchar(1000), dtDate datetime)

insert @temp(DateID, strDate, dtDate)

select

dt.DateID, convert(varchar(1000), dt.dtDate, 109), dt.dtDate from dbo.tblDateDT dt

select top 1 * from @temp

go

set statistics time off

set statistics profile off

set statistics io off

And here is the output: ( The output is truncated to the right so you may want to blow up your IE window if lines don't align well.)

Table 'tblDateDT'. Scan count 1, logical reads 2609, physical reads 0, read-ahead reads 0, lob logical reads 0

(1000000 row(s) affected)

Rows Executes StmtText

-------------------- -------------------- --------------------------------------------------------------------

1000000 1 insert @temp(DateID, strDate, dtDate)

select dt.DateID, convert(varchar(1000), dt.dtDate, 109), dt.dtDate from dbo.tblDateDT dt

1000000 1 |--Table Insert(OBJECT:(@temp), SET:([DateID] = [DemoDB].[dbo].[tb

1000000 1 |--Top(ROWCOUNT est 0)

0 0 |--Compute Scalar(DEFINE:([Expr1006]=CONVERT(varchar(100

1000000 1 |--Clustered Index Scan(OBJECT:([DemoDB].[dbo].[tbl

(5 row(s) affected)

SQL Server Execution Times:

CPU time = 1014 ms, elapsed time = 1100 ms.

DateID strDate

----------- --------------------------------------------------------------------------------------------------

1 Jan 1 2000 12:00:00:000AM

(1 row(s) affected)

Table '#2FCF1A8A'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, l

Rows Executes StmtText StmtId NodeId Parent

-------------------- -------------------- ------------------------------------- ----------- ----------- ------

1 1 select top 1 * from @temp 2 1 0

1 1 |--Top(TOP EXPRESSION:((1))) 2 2 1

1 1 |--Table Scan(OBJECT:(@temp)) 2 3 2

(3 row(s) affected)

There is nothing unusual here. It took one second to scan the tblDateDT and then put it into temp table variable, #2FCF1A8A. The execution plan shows that there just one execution in each node and the data is kind of "streaming" into @temp. The inline Convert function is handled, well, "inlined". Later, when we replace the inline Convert function with the scalar UDF UDF_GetLongDateString, we will see the a change of the plan as well as how SQL handles the data flow.

Use Scalar UDF to Insert Temp Table Variable:

The following script has the inline Convert function replaced by the scalar UDF, dbo.UDF_GetLongDateString.

declare @temp table (DateID int, strDate varchar(1000), dtDate datetime)

insert @temp(DateID, strDate, dtDate)

select dt.DateID, dbo.UDF_GetLongDateString(dt.dtDate ) , dt.dtDate from dbo.tblDateDT dt

select top 1 * from @temp

Perf Counter Analysis:

Resume the perfmon counter trace and run the script. Upon completion, pause the perfmon trace and you should see something similar like below.

Figure 2: Compared to the baseline in the left, the center is the scalar UDF anomalies of extra tempDB usage, page splits and index searches.

The duration is a long 12 seconds on my box, 10 times longer than our base line. CPU usage spikes up from the get go and stays that way for the rest of the batch execution. Ignore the tiny spike two thirds of the course as it's backgound noise. The page splits and index searches counters not seen in the baseline stay plateaued and trend together with CPU usage. Lastly, the tempDB free space slowly chips off and has a sudden drop at the end. The space is reclaimed in two steps, at the end of the batch and about 3 seconds afterwards.

While these are interesting trends but how do they correlate with each other? Stay with me, they will make more sense once we look into SQL internal.

SQL Internal Analysis:

Let's run the same script with three statistics options on

set statistics time on

set statistics profile on

set statistics io on

go

declare @temp table (DateID int, strDate varchar(1000), dtDate datetime)

insert @temp(DateID, strDate, dtDate)

select dt.DateID, dbo.UDF_GetLongDateString(dt.dtDate), dt.dtDate from dbo.tblDateDT dt

select top 1 * from @temp

go

set statistics time off

set statistics profile off

set statistics io off

Output:

Table 'Worktable'. Scan count 1, logical reads 2974753, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical r

Table 'tblDateDT'. Scan count 1, logical reads 2609, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical read

(1000000 row(s) affected)

Rows Executes StmtText

-------------------- -------------------- ---------------------------------------------------------------------------------------

1000000 1 insert @temp(DateID, strDate, dtDate)

select dt.DateID, dbo.UDF_GetLongDateString(dt.dtDate), dt.dtDate from dbo.tblDateDT dt

1000000 1 |--Table Insert(OBJECT:(@temp), SET:([DateID] = [DemoDB].[dbo].[tblDateDT].[DateID] a

1000000 1 |--Table Spool

1000000 1 |--Compute Scalar(DEFINE:([Expr1006]=[DemoDB].[dbo].[UDF_GetLongDateString]

1000000 1 |--Top(ROWCOUNT est 0)

1000000 1 |--Clustered Index Scan(OBJECT:([DemoDB].[dbo].[tblDateDT].[PK__t

(6 row(s) affected)

SQL Server Execution Times:

CPU time = 13557 ms, elapsed time = 14208 ms.

DateID strDate

----------- ---------------------------------------------------------------------------------------------------------------------

793793 Apr 30 4173 12:00:00:000AM

(1 row(s) affected)

Table '#6166761E'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0

Rows Executes StmtText StmtId NodeId Parent PhysicalOp

-------------------- -------------------- ------------------------------------- ----------- ----------- ----------- -------------

1 1 select top 1 * from @temp 2 1 0 NULL

1 1 |--Top(TOP EXPRESSION:((1))) 2 2 1 Top

1 1 |--Table Scan(OBJECT:(@temp)) 2 3 2 Table Scan

Using the scalar UDF in the insert @temp statement introduces an extra "Worktable" table. It is created with the strDate column left blank. The scalar function then updates the nvarchar column with an about 26 character wide value (for example, Apr 30 4173 12:00:00:000AM). There you have the classical page splits. What I also experimented is the make the nvarchar column narrower and I can see a lower page splits plateau. However, as I tried different page splits rates, they don't affect CPU usage much. So where is CPU usage coming from? It's the index searches.

Notice that there are close to 3,000,000 logical reads against the "Worktable". That's 3 times the row size. If this number sounds familiar, you're not mistaken. It's a bookmark lookup. When the scalar UDF needs to update the "Worktable", it performs a bookmark lookup to get to the row. The number of reads is not exactly 3 times since it contains reads when the BTree is only two level deep. Remember we have a constant page split causing "Worktable" to gradually grow into a big size. This also explains why we have a slow but constant consumption of tempDB free space. It's the scalar UDF, row by row, updating the varchar column and expanding the "Worktable". When the UDF is done with the last row, it jams all the complete resultset into the temp table variable and that causes the sudden dip in tempDB at the end of the batch.

Summary:

In a nutshell, when INSERT INTO @temp or #temp ... followed by SELECT scalar_UDF()... syntax is used, an extra worktable is used to pass data to the UDF, which in turns, tries to find one row at a time to process (hence the index searches caused by repetitive bookmark lookups) and updates a column in the worktable (hence the page splits and tempDB usage growth).

Just like in the case of temp table variable VS temp table, you should know the pros and cons of different technologies in order to implement the features that can help you to achieve your business need. This blog serves as an informational purpose and hope with the better understanding of the behaviors I just presented, you will use scalar UDF in the scenario that's best suited for you.

Happy coding,

Mike Chang

 

SQL SDE,

MSDN

Comments

  • Anonymous
    July 07, 2009
    I am able to reproduce the same thing with SQL 2005. Microsoft SQL Server 2005 - 9.00.3282.00 (Intel X86)   Aug  5 2008 01:01:05   Copyright (c) 1988-2005 Microsoft Corporation  Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

  • Anonymous
    July 29, 2009
    is it 3 times, or 1000 times of row size?  udf's are bad as they turn set based operations to row based.

  • Anonymous
    April 06, 2010
    Rafat, you probably refer to this section of the blog:  'Notice that there are close to 3,000,000 logical reads against the "Worktable".  That's 3 times the row size. ' What I compare to is the row size of the "Worktable" which got inserted 1-1 from tblDateDT where I make an initial population of 1,000,000 rows.