Unintended Consequences of Scalar-Valued User Defined Functions
Author: Chuck Heinzelman
Reviewers: Kevin Cox, Dan Jones, Lara Rubbelke
During a customer engagement, we noticed a query that was taking a long time to execute. We saw that the query had a scalar-valued User Defined Function (UDF) encapsulating functionality in the SELECT clause. To understand why this could be a problem, you need to understand how SQL Server deals with functions in the SELECT clause. Let’s take a quick dive into how this works in SQL Server.
Assume that I have a table with the following structure:
CREATE TABLE dbo.Numbers
(
Number integer NOT NULL PRIMARY KEY CLUSTERED
);
Now assume that this table has 100,000 rows in it – numbered 1 to 100,000. As an aside, I like to have a table like this with about 1,000,000 rows in it for occasions where I need to duplicate data. You never know when you will need a table of numbers!
If I execute the following SQL Statement, what will I get?
SELECT Number, GetDate() AS CurDate
FROM dbo.Numbers
WHERE Number <= 1000;
If you answered 1000 rows with the numbers 1 - 1000 and the exact same value for the CurDate column in every row, you would be correct. This is because SQL Server only executes the GetDate() function once for the entire statement.
Now assume that I create the following function:
CREATE FUNCTION dbo.GetDateFunction()
RETURNS DateTime
AS
BEGIN
RETURN GetDate();
END
If I execute the following SQL Statement, what will I get?
SELECT Number, dbo.GetDateFunction() AS CurDate
FROM dbo.Numbers
WHERE Number <= 1000;
If you said that my results would be the same as in the last example, you would be incorrect. Since we encapsulated the GetDate() function in our own user-defined function, SQL Server had to execute it for every row returned, not just once for the entire query. This knowledge is fundamental to understanding the performance of the queries that you write as well as ensuring that you get the results that you expect.
Unfortunately, our problem was not this simple. We were dealing with encryption and decryption functions. Take the following query as an example:
SELECT RowID, DecryptByKey(EncryptedColumn)
FROM dbo.TableWithManyRows;
In this case, even though the DecryptByKey function is a built-in function, it will be called once per row (each individual value needs to be decrypted). Given that, what would be the harm of having this function embedded in a user defined function like this:
CREATE FUNCTION dbo.DecryptByKeyFunction(@Data varchar(255))
RETURNS varchar(255)
AS
BEGIN
RETURN DecryptByKey(@Data);
END
And using it like this:
SELECT RowID, dbo.DecryptByKeyFunction(EncryptedColumn)
FROM dbo.TableWithManyRows;
Implementing it this way would allow you to easily swap out the encryption functionality for your entire system by changing a single function rather than changing every query that uses this logic.
The problem is that while the inline DecryptByKey function is called for each row in the query, the key is only opened once for the entire statement. If you put the DecryptByKey function in your own user defined function, the key will need to be opened for each function call rather than once for the entire query. In our scenario, this caused a huge performance issue.
Summary
Knowing the behavior of built-in and User Defined Functions is extremely important to understanding the performance and output of the queries that you write. I am not trying to tell you to never use UDFs – just be sure to test your solutions thoroughly and understand all of the “unintended consequences” of encapsulating logic inside a UDF before rolling your solution out – even to your developers.
Comments
Anonymous
June 24, 2011
Wouldn't the use of a table valued user defined function have much better performance, and retain the benefit of reusable code?Anonymous
June 26, 2011
Considering GetDate function example, I read some time ago that it's a good idea to store the GetDate, when possible, in a variable and reuse the variable in the query. This makes sense especially when a UDF is used and the result is the same across the whole result dataset, as in the above Date example. Concerning the crypto example, wouldn't be ideal if a given cryptographic key could be applied to the whole dataset, even when the crypto functions are used inside of a UDF? I haven't worked with cryptographic functions so not sure if such functionality exists or is feasible from an architectural standpoint. There are situations in which the use of a UDF in inevitable, this coming with performance degradation. If I'm not mistaking in Oracle can be created an index on a deterministic UDF. I think it would be nice to have this kind of functionality also in SQL Server.Anonymous
July 04, 2011
Ack, my comment got lost. I think. It's not clear from this post whether there really is a lot of overhead with "openeing the key" for DecryptByKey. What does that overhead involve? It's not mentioned in the DecruptByKey documentation. Of course SQL internally has to call DecrptByKey or the UDF (or both) once per record. Everything in SQL is done one record at a time under the hood (reading the value of a column from each record HAS to process one record at a time!). And it's also unfortunate that SQL can't make the inline function situation work as efficiently as the UDF situation.Anonymous
December 05, 2011
The comment has been removedAnonymous
June 29, 2013
Hi Adrian. SQL Server 2005 has computed columns (persistent or nonpersistent): msdn.microsoft.com/.../ms191250(v=sql.90).aspx And, you may define indexes on computed columns: msdn.microsoft.com/.../ms189292(v=sql.90).aspxAnonymous
July 23, 2013
What if you used a table valued function to store your crypto function. Would this not eliminate both the problem of performance (since it would still be called once) and you could change the algorithm by changing one function? I guess my question is, what are the performance implications of table valued UDFs?