Performance Benefits of using Expression over User Defined Functions
Summary
No matter how simple a UDF is, there's a large performance penalty paid when they're used. This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more. The penalty is incurred because the SQL Server database engine must create its own internal cursor like processing. It must invoke each UDF on each row. If the UDF is used in the WHERE clause, this may happen as part of the filtering the rows. If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing. It's the row by row processing that slows SQL Server the most. In many cases an expression can replace the functionality that the UDF provides and offer a significant performance benefit over a UDF (see examples below).
More Information
SETUP FOR THE TEST
======================
--- Create Test Table
CREATE TABLE UDF_parameters (param1 int, param2 int, param3 int, param4 int)
-- Create Clustered Index
CREATE CLUSTERED INDEX CLU1 ON UDF_parameters(Param4)
--Code for the UDF.
CREATE FUNCTION dbo.divide_func(@numerator as int, @denominator as int, @default as float)
RETURNS float
BEGIN
if @denominator = 0
RETURN @default
if @numerator = 0
RETURN @default
RETURN @numerator/@denominator
END
--- Insert records in the table ----
declare @count int
set @count =1 -----> This will insert 1 million records to the table
while @count <1000001
begin
INSERT INTO UDF_parameters values(@count+3,@count+2,@count+1, @count)
set @count = @count+1
end
Using UDF in Query SELECT list
SET STATISTICS TIME ON
GO
select PARAM1, DEVIDED_VALUE = dbo.divide_func(param2,param3,CAST(param4 as float)), param4 from UDF_parameters
GO
SET STATISTICS TIME OFF
GO
Rows Executes StmtText
------ -------- ---------------------------------------------------------------------------------------------------------------
500000 1 select PARAM1, DEVIDED_VALUE = dbo.divide_func(param2,param3,CAST(param4 as float)), param4 from UDF_parameters
500000 1 |--Compute Scalar(DEFINE:([Expr1004]=[TEST1].[dbo].[divide_func](parameter... )
500000 1 |--Clustered Index Scan(OBJECT:([TEST1].[dbo].[UDF_parameters].[CLU1]))
If you notice the plan, we would see that the Function call is being made, for all the rows being returned by the Clustered Index Scan.
If you take a profiler trace while running this statement, you would notice the multiple executions of the UDF. (Image at the end of the Page)
Query with UDF in Where Clause
SET STATISTICS TIME ON
GO
select PARAM1,param2,param3, param4 from UDF_parameters where dbo.divide_func(param2,param3,CAST(param4 as float)) = 1
GO
SET STATISTICS TIME OFF
GO
Rows Executes StmtText
------ -------- -------------------------------------------------------------------------------------------------------------------------------------------------------------500000 1 select PARAM1,param2,param3, param4 from UDF_parameters where dbo.divide_func(param2,param3,CAST(param4 as float)) = 1
500000 1 |--Filter(WHERE:([TEST1].[dbo].[divide_func]([TEST1].[dbo].[UDF_parameters].[param2],[TEST1].[dbo].[UDF_parameters].[param3],CONVERT(float(53),[TEST1].[dbo].[UDF_parameters].[param4],0))=(1.000000000000000e+000)))
500000 1 |--Clustered Index Scan(OBJECT:([TEST1].[dbo].[UDF_parameters].[CLU1]))
Notice a similar behaviour in multiple executions of the UDF, when we have the UDF in the where clause. (Image at the end of the Page)
Using Expression In Query
SET STATISTICS TIME ON
GO
SELECT PARAM1,
DEVIDED_VALUE =
CASE
WHEN PARAM3=0 THEN param4
WHEN param2=0 THEN param4
ELSE param2/param3
END ,
PARAM4
from UDF_parameters
GO
SET STATISTICS TIME OFF
GO
Rows Executes StmtText
------ -------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
500000 1 SELECT PARAM1,DEVIDED_VALUE = CASE WHEN PARAM3=0 THEN param4 WHEN param2=0 THEN param4 ELSE param2/param3 END ,PARAM4 from UDF_parameters
0 0 |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [TEST1].[dbo].[UDF_parameters].[param3]=(0) THEN [TEST1].[dbo].[UDF_parameters].[param4] ELSE CASE WHEN [TEST1].[dbo].[UDF_parameters].[param2]=(0) THEN [TEST1].[dbo].[UDF_parameters].[param4] ELSE [TEST1].[dbo].[UDF_parameters].[param2]/[TEST1].[dbo].[UDF_parameters].[param3] END END))
500000 1 |--Clustered Index Scan(OBJECT:([TEST1].[dbo].[UDF_parameters].[CLU1]))
Notice, how using a case expression, has only on execution.
Comparison of the time taken by the Different Statements.
Rows Returned
WITH UDF in Select List
UDF in Where Clause
With Expression
0.5 Million
26 seconds
26 seconds
4 seconds
1.0 Million
52 seconds
52 seconds
8 seconds
1.5 Million
86 seconds
87 seconds
12 seconds
Snapshots of Profiler Traces
==============================
As we can see, the query when using the function is slower than the one using the CASE expression.
In this case I was using very simple parameters, but if we have some complex parameters (like those involving mathematical operation) we would see the response time increase even more.
by
Sourabh Agarwal
Technical Lead, Microsoft SQL Server
Comments
Anonymous
October 06, 2009
Is there a performance difference bewteen calling a t-sql UDF & a sql-clr UDF for a simple non data access UDF?Anonymous
October 06, 2009
Yes, there would a slight difference in the performance of t-sql UDF & a sql-clr UDF. The reason being the multiple switches which need to be made between SQL engine and the CLR runtime. With t-sql UDF everything is within the SQL Engine and no addtional switches are required.Anonymous
January 20, 2010
Could you fix a link to the profiler image?Anonymous
May 12, 2013
Question is answered here: stackoverflow.com/.../database-function-vs-case-statement