Performance implications of using multi-Statement TVFs with optional parameters

Authored by Arvind Shyamsundar (Microsoft)
Credits: Prasad Wagle, Srinivasa Babu Valluri, Arun Jayapal, Ranga Bondada, Anand Joseph (members of the Sayint by Zen3 team)
Reviewers: Rajesh Setlem, Joe Sack, Dimitri Furman, Denzil Ribeiro (Microsoft)

This blog post was inspired our recent work with the Sayint dev team, who are a part of Zen3 Infosolutions. SQLCAT has been working with them in their process of adopting SQL Server 2017. During a recent lab engagement, we found a set of seemingly innocuous query (anti-)patterns which were actually having a significant impact on overall performance. In this blog post, we show you what these anti-patterns are and how to remedy them.

Background

Table valued functions (TVFs) are a popular way for T-SQL developers to abstract their queries into reusable objects. There are two types of TVFs: inline and multi-statement. Broadly speaking, multi-statement TVFs offer more capabilities but come with a cost. That’s what we will look at in this post. Do note that we have previously alluded to some of these complications: our SQLCAT Guide to the Relational Engine has a classic article “Table-Valued Functions and tempdb Contention” which has a lot of detail and background - so, it is highly recommended that you read that article first! (Download the PDF linked in that older blog post and go to page 115 in the PDF).

What we saw

During the lab engagement, we were stress-testing a query which basically selected all the rows from a TVF. For reproduction purposes we have sample code below which works with the WideWorldImporters sample database. Here’s the code for the TVF which is equivalent to the initial state of the TVF initially implemented by the customer:

[code lang="sql"]
CREATE FUNCTION MSTVF
(@startOrderID INT=NULL, @endOrderID INT=NULL)
RETURNS
@result TABLE (
OrderID INT,
CustomerID INT)
AS
BEGIN
INSERT @result
SELECT OrderID,
CustomerID
FROM Sales.Orders AS Ord
WHERE Ord.OrderID BETWEEN ISNULL(@startOrderID, Ord.OrderID) AND ISNULL(@endOrderID, Ord.OrderID);
RETURN;
END

The actual call from the application is made via a stored procedure as defined below:

[code lang="sql"]
CREATE OR ALTER PROCEDURE getOrders(@startOrderID INT, @endOrderID INT)
AS
SELECT *
FROM dbo.MSTVF(@startOrderID, @endOrderID);

And here is a sample procedure call with actual parameters:

[code lang="sql"]
EXEC getOrders 1, 100

If we review the execution plan (we used SET SHOWPLAN_ALL ON for convenience to obtain the below output) for the above, you will notice a couple of interesting things. Let's take a look at the plan first:

[code]
EXEC getOrders 1, 100
CREATE PROCEDURE getOrders(@startOrderID INT, @endOrderID INT) AS SELECT * FROM dbo.MSTVF(@startOrderID, @endOrderID);
|--Sequence
|--Table-valued function(OBJECT:([WideWorldImporters].[dbo].[MSTVF]))
|--Table Scan(OBJECT:([WideWorldImporters].[dbo].[MSTVF]))
UDF: [WideWorldImporters].[dbo].[MSTVF]
CREATE FUNCTION MSTVF (@startOrderID INT=NULL, @endOrderID INT=NULL) RETURNS @result TABLE ( OrderID INT, CustomerID INT) AS BEGIN INSERT @result SELECT OrderID, CustomerID FROM Sales.Orders AS Ord WHERE Ord.OrderID BETWEEN ISNULL(@startOrderID, Ord.OrderID) AND ISNULL(@endOrderID, Ord.OrderID) OPTION (RECOMPILE);
|--Table Insert(OBJECT:([WideWorldImporters].[dbo].[MSTVF]), SET:([OrderID] = [WideWorldImporters].[Sales].[Orders].[OrderID] as [Ord].[OrderID],[CustomerID] = [WideWorldImporters].[Sales].[Orders].[CustomerID] as [Ord].[CustomerID]))
|--Index Scan(OBJECT:([WideWorldImporters].[Sales].[Orders].[FK_Sales_Orders_CustomerID] AS [Ord]))

From the above showplan, notice the following:

  • The query is scanning the Sales.Orders table (notice the Index Scan with the WHERE predicate) even though there is a clustered primary key on the OrderID column which could potentially have been used to seek to the range of records being accessed;
  • There is a Table Insert operation though we are not really inserting anything from our query above!

In fact, if you turn on STATISTICS IO and actually execute the query, you will notice the below:

[code]
Table '#A25547B1'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Notice this is a 'temp' table! Keep this in mind, we tie back to this later in the post. We then used the OStress utility from the RML Utilities package to simulate a number of connections all querying the TVF at the same time. Here is the command-line we used:

[code lang="bash"]
ostress -S.\SQL2017 -dWideWorldImporters -q -Q"EXEC getOrders 1, 100" -n100 -r100

The above command line simulates 100 users, and repeats the SELECT query 100 times for each user. This query takes around 16.5 seconds to complete on a laptop with an SSD and an i7 CPU. Of great interest to us was the fact that TEMPDB was being used (we found this by using monitoring queries like the ones specified in our older blog post mentioned at the start of this article.)

Fix #1: Parameter Embedding

If you notice carefully, the above query is an example of ‘optional parameters’ wherein the same query caters to situations where there are specific values for the parameters as well as other cases where there are none. Due to the implementation of the query (specifically the usage of ISNULL(@paramname, ColName)) what ends up happening is that the query plan thus generated will not leverage any indexes on the table. While this query can be refactored to separate versions for cases where the parameter values are supplied, and where they are not, another viable option is to use OPTION (RECOMPILE) on the statement level. This is an acceptable solution in most cases because the cost of scanning the table is often far higher than the cost of recompiling this query. So here is how we used OPTION RECOMPILE in this case:

[code lang="sql"]
ALTER FUNCTION MSTVF
(@startOrderID INT=NULL, @endOrderID INT=NULL)
RETURNS
@result TABLE (
OrderID INT,
CustomerID INT)
AS
BEGIN
INSERT @result
SELECT OrderID,
CustomerID
FROM Sales.Orders AS Ord
WHERE Ord.OrderID BETWEEN ISNULL(@startOrderID, Ord.OrderID) AND ISNULL(@endOrderID, Ord.OrderID)
OPTION (RECOMPILE);
RETURN;
END

We leave the rest of the code (the wrapper stored procedure and the actual parameters for the stored procedure call) unchanged. Here is the execution plan for the revised stored procedure / TVF:

[code]
EXEC getOrders 1, 100
CREATE PROCEDURE getOrders(@startOrderID INT, @endOrderID INT) AS SELECT * FROM dbo.MSTVF(@startOrderID, @endOrderID);
|--Sequence
|--Table-valued function(OBJECT:([WideWorldImporters].[dbo].[MSTVF]))
|--Table Scan(OBJECT:([WideWorldImporters].[dbo].[MSTVF]))
UDF: [WideWorldImporters].[dbo].[MSTVF]
CREATE FUNCTION MSTVF (@startOrderID INT=NULL, @endOrderID INT=NULL) RETURNS @result TABLE ( OrderID INT, CustomerID INT) AS BEGIN INSERT @result SELECT OrderID, CustomerID FROM Sales.Orders AS Ord WHERE Ord.OrderID BETWEEN ISNULL(@startOrderID, Ord.OrderID) AND ISNULL(@endOrderID, Ord.OrderID) OPTION (RECOMPILE);
|--Table Insert(OBJECT:([WideWorldImporters].[dbo].[MSTVF]), SET:([OrderID] = [WideWorldImporters].[Sales].[Orders].[OrderID] as [Ord].[OrderID],[CustomerID] = [WideWorldImporters].[Sales].[Orders].[CustomerID] as [Ord].[CustomerID]))
|--Clustered Index Seek(OBJECT:([WideWorldImporters].[Sales].[Orders].[PK_Sales_Orders] AS [Ord]), SEEK:([Ord].[OrderID] >= (1) AND [Ord].[OrderID] <= (100)) ORDERED FORWARD)

After using the above hint, the query does not scan the table (notice the Clustered Index Seek above) any more but we still see a some TEMPDB usage and consequent overall slowness. The OStress test now completes in around 3.3 seconds, which is obviously much better, but can improve it further?

Fix #2: Inline TVFs

If you read the article linked to initially, you probably guessed the real answer to this problem, which is the fact that we have actually implemented a multi-statement TVF (which still has only 1 statement in this case!) While multi-statement TVFs are not always bad, and recently there have been specific improvements in processing such TVFs, we really don't need a multi-statement TVF here given we just have a single SELECT statement inside! Therefore, we proceeded to refactor the query as an inline TVF, and this is how the result looks:

[code lang="sql"]
CREATE FUNCTION ITVF
(@startOrderID INT=NULL, @endOrderID INT=NULL)
RETURNS TABLE
AS
RETURN
(SELECT OrderID,
CustomerID
FROM Sales.Orders AS Ord
WHERE Ord.OrderID BETWEEN ISNULL(@startOrderID, Ord.OrderID) AND ISNULL(@endOrderID, Ord.OrderID))

We alter the wrapper stored procedure as below:

[code lang="sql"]
CREATE OR ALTER PROCEDURE getOrders(@startOrderID INT, @endOrderID INT)
AS
SELECT *
FROM dbo.ITVF(@startOrderID, @endOrderID)

Let's look at the execution plan for the same set of parameters now:

[code]
EXEC getOrders 1, 100
CREATE PROCEDURE getOrders(@startOrderID INT, @endOrderID INT) AS SELECT * FROM dbo.ITVF(@startOrderID, @endOrderID)
|--Index Scan(OBJECT:([WideWorldImporters].[Sales].[Orders].[FK_Sales_Orders_CustomerID] AS [Ord]), WHERE:([WideWorldImporters].[Sales].[Orders].[OrderID] as [Ord].[OrderID]>=isnull([@startOrderID],[WideWorldImporters].[Sales].[Orders].[OrderID] as [Ord].[OrderID]) AND [WideWorldImporters].[Sales].[Orders].[OrderID] as [Ord].[OrderID]<=isnull([@endOrderID],[WideWorldImporters].[Sales].[Orders].[OrderID] as [Ord].[OrderID])))

We then used an equivalent OStress command line to stress test this new inline TVF:

[code lang="bash"]
ostress -S.\SQL2017 -dWideWorldImporters -q -Q"EXEC getOrders 1, 100" -n100 -r100

This version goes back to around 14 seconds in total. We no longer see the TEMPDB I/O though - by making the TVF as an inline one, we completely avoid having to store results in a temporary table variable, and directly ‘stream’ the results to the client. You would have probably spotted one remaining problem, which is the fact that the optional parameter anti-pattern is not actually addressed by the re-written inline TVF. This is by design, because an inline TVF cannot specify query hints. If we forced a query hint in the stored procedure and then run OStress, what do you think will happen? Let's modify the stored procedure first:

[code lang="sql"]
CREATE OR ALTER PROCEDURE getOrders(@startOrderID INT, @endOrderID INT)
AS
SELECT *
FROM dbo.ITVF(@startOrderID, @endOrderID)
OPTION (RECOMPILE)

The execution plan is below; as expected the parameter embedding optimization that the OPTION (RECOMPILE) enables, results in a Clustered Index Seek:

[code]
EXEC getOrders 1, 100
CREATE PROCEDURE getOrders(@startOrderID INT, @endOrderID INT) AS SELECT * FROM dbo.ITVF(@startOrderID, @endOrderID) OPTION (RECOMPILE)
|--Clustered Index Seek(OBJECT:([WideWorldImporters].[Sales].[Orders].[PK_Sales_Orders] AS [Ord]), SEEK:([Ord].[OrderID] >= (1) AND [Ord].[OrderID] <= (100)) ORDERED FORWARD)

Then we run the same OStress command line as before. If you try this, you will see that introducing OPTION (RECOMPILE) in the stored procedure further reduces execution time for query to around 1.8 seconds. This is because in this specific case, the cost of recompiling the statement each time was much lower than the cost of scanning the Sales.Orders table. In other cases where the cost of scanning was not as high, you may find OPTION (RECOMPILE) actually degrades performance. But that has to be checked in each case.

Side note: it is useful to note that the 'magic parameter value' of NULL has been used here to denote an 'select all' case - but we also see other 'special' values being used to represent these optional filters. In those cases instead of using ISNULL, the optional parameter is checked by using a Boolean OR operator. Also, this 'optional ' pattern has other manifestations which are listed below:

[code lang="sql"]
ColumnName = @Param OR @Param IS NULL
ColumnName = COALESCE (@Param, ColumnName)

"Fix" #3: Mandatory parameters

Depending on your perspective, this last iteration can be taken as a true fix or just a test. For this, let’s write a version of the inline TVF which has all parameters mandatory, so that means we no longer need the ISNULL handling in the TVF.

[code lang="sql"]
CREATE FUNCTION ITVF_MandatoryParams
(@startOrderID INT, @endOrderID INT)
RETURNS TABLE
AS
RETURN
(SELECT OrderID,
CustomerID
FROM Sales.Orders AS Ord
WHERE Ord.OrderID BETWEEN @startOrderID AND @endOrderID)

Let's make a corresponding stored procedure as well:

[code lang="sql"]
CREATE OR ALTER PROCEDURE getOrders_MandatoryParams(@startOrderID INT, @endOrderID INT)
AS
SELECT *
FROM dbo.ITVF_MandatoryParams(@startOrderID, @endOrderID)

Here is the execution plan for this version with mandatory params:

[code]
EXEC getOrders_MandatoryParams 1, 100
CREATE PROCEDURE getOrders_MandatoryParams(@startOrderID INT, @endOrderID INT) AS SELECT * FROM dbo.ITVF_MandatoryParams(@startOrderID, @endOrderID)
|--Clustered Index Seek(OBJECT:([WideWorldImporters].[Sales].[Orders].[PK_Sales_Orders] AS [Ord]), SEEK:([Ord].[OrderID] &amp;gt;= [@startOrderID] AND [Ord].[OrderID] &amp;lt;= [@endOrderID]) ORDERED FORWARD)

And then we repeat the OStress test with the same parameters but calling the new ITVF above:

[code lang="bash"]ostress -S.\SQL2017 -dWideWorldImporters -q -Q"EXEC getOrders_MandatoryParams 1, 100" -n100 -r100

This version performs the best (as expected) (805 milliseconds). However, refactoring code into special versions for mandatory parameters is not always possible, which is why the OPTION (RECOMPILE) hint is in many cases the most viable option for a lot of customers.

Summary

Here are the key points we discussed above:

  • Multi-statement TVFs use table variables to store the result; thereby causing TEMPDB contention at scale
  • Inline TVFs do not need such temporary storage and scale much better
  • Beware of optional parameters in your TVFs. These can compound your performance problems. Whenever possible, replace TVFs which use optional parameters with specialized versions (with mandatory parameter values) for each case
  • OPTION (RECOMPILE) can be very handy to deal with the optional parameter problem, but that comes with its own cost. Be sure to test thoroughly at scale before finalizing the code

The OStress test results are also summarized below:

Scenario Total test time
Multi-statement TVF with optional parameters 16.493 seconds
Multi-statement TVF with OPTION(RECOMPILE) 3.093 seconds
Inline TVF with optional parameters 14.707 seconds
Inline TVF with OPTION(RECOMPILE) at query level 1.803 seconds
Inline TVF with mandatory parameters 805 millisec

OStress results with various TVF types and query hints

We want to hear from you!

In closing, the SQL team is eager to hear from you if you would like some improvements in scenarios involving optional parameters. Do you have a lot of such cases? Is OPTION (RECOMPILE) something which is acceptable for you? If not, what would you like to see in SQL Server and Azure SQL DB to help you deal with these optional parameter cases? Do let us know by leaving a comment!

Comments

  • Anonymous
    October 17, 2017
    The comment has been removed
    • Anonymous
      October 20, 2017
      Thanks Dmitry for your feedback.
  • Anonymous
    October 20, 2017
    How about:WHERE Ord.OrderID BETWEEN ISNULL(@startOrderID, CAST(0x80000000 AS int)) AND ISNULL(@endOrderID, CAST(0x7FFFFFFF as int))i.e. use the min and max values of the data type, like system-versioned temporal tables use 9999-12-31. It seems this should provide an efficient index seek without requiring changes in callers. OTOH, this kind of query might cause bugs later if the column is ever changed to bigint.
    • Anonymous
      October 20, 2017
      Thank you Ranta; indeed this might work for integer datatypes, but will not be feasible for strings and broader set of filters. Good point though!