SQL Server: Variables, Parameters or Literals? Or... Constants?
The age-old question
"Should I use Variables, Parameters, or Literals in my code?" We've all seen the pros and cons for each of them and had to choose the option with the least negative impact. What if I told you there was a fourth option?
For years, I wished SQL Server would support constants, a value that you could declare at the database level, that would allow the best query plan and still be simple to change. Sadly, I can't say that they're being introduced (not that I'm aware of). But with a little work, we can duplicate their functionality. For lack of a better term, I'll be referring to it as a Pseudo-Constant.
Using AdventureWorks, consider a scenario where the Shipping department need a special report for all "Overnight Shipping" orders. We'd typically choose one of three solutions for for implementing this query:
Using Variables
DECLARE @ShipMethodID INT
SELECT @ShipMethodID = m.ShipMethodID
FROM Purchasing.ShipMethod m
WHERE m.Name = 'OVERNIGHT J-FAST'
SELECT *
FROM Sales.SalesOrderHeader h
WHERE h.ShipMethodID = @ShipMethodID
Using Literals
SELECT *
FROM Sales.SalesOrderHeader h
WHERE h.ShipMethodID = 4 -- OVERNIGHT J-FAST
Using Stored Procedure Parameters
CREATE PROCEDURE GetShippingData(@ShipMethodID INT)
AS
SELECT * FROM Sales.SalesOrderHeader h
WHERE ShipMethodID = @ShipMethodID
While a constant would rarely replace the need for parameters, it's worth mentioning their behavior for the sake of comparison. Through a feature called, "parameter sniffing", parameters give the query optimizer the best value for your first compiled execution but can often cause issues with subsequent calls that don't represent data of a similar cardinality. Among other places, parameter sniffing is discussed here and here.
Using Pseudo-Constants
Like a real constant, the pseudo-constant doesn't replace the need for parameters or variables. Instead, it's something that you might consider when you want the performance of a literal without the overhead maintenance of updating hundreds of procedures when a change is required.
Implementation:
Pseudo-Constants are not variables or parameters. Instead, they're simply views with one row, and enough columns to support your constants. With these simple rules, the SQL Engine completely ignores the value of the view but still builds an execution plan based on its value. The execution plan doesn't even show a join to the view!
To show its true power, we'll have to add some data to AdventureWorks database. I'm using the AdventureWorks2012 download from CodePlex and updating the SalesOrderHeader table to include a wider variety of data.
Data and Index Changes
[Sales].[SalesOrderHeader] currenly uses two of the five ship methods, which doesn't give us enough uniqueness of data. We're also missing an index on ShipMethodID so we'll make a few adjustments:
-- Change 1000 "XRQ - TRUCK GROUND" rows to "ZY - EXPRESS"
UPDATE h SET h.ShipMethodID = 2
FROM Sales.SalesOrderHeader h
WHERE h.SalesOrderID IN(SELECT TOP 1000 SalesOrderID
FROM Sales.SalesOrderHeader
WHERE ShipMethodID = 1 ORDER BY newid())
-- Change 100 "XRQ - TRUCK GROUND" rows to "OVERSEAS - DELUXE"
UPDATE h SET h.ShipMethodID = 3
FROM Sales.SalesOrderHeader h
WHERE h.SalesOrderID IN(SELECT TOP 100 SalesOrderID
FROM Sales.SalesOrderHeader
WHERE ShipMethodID = 1 ORDER BY newid())
-- Change 10 "XRQ - TRUCK GROUND" rows to "OVERNIGHT J-FAST"
UPDATE h SET h.ShipMethodID = 4
FROM Sales.SalesOrderHeader h
WHERE h.SalesOrderID IN(SELECT TOP 10 SalesOrderID
FROM Sales.SalesOrderHeader
WHERE ShipMethodID = 1 ORDER BY newid())
-- Add an index to ShipMethodId
CREATE INDEX ix_Sales_SalesOrderHeader_ShipMethodId ON Sales.SalesOrderHeader(ShipMethodID)
Pseudo-Constant creation
Our next step is to generate our Pseudo-Constants. As described above, we will create a table for each constant we require in our code. We'll also create a schema to keep things organized. This is not required but I find it improves readability. The data we're creating reflects the data in [Purchasing].[ShipMethod]
ShipMethodID | Name |
---|---|
1 | XRQ - TRUCK GROUND |
2 | ZY - EXPRESS |
3 | OVERSEAS - DELUXE |
4 | OVERNIGHT J-FAST |
5 | CARGO TRANSPORT 5 |
CREATE SCHEMA ShipMethod
GO
-- Each view can only have one row.
-- Create one column for each desired constant.
-- Each column is restricted to a single value.
CREATE VIEW ShipMethod.ShipMethodID AS
SELECT CAST(1 AS INT) AS [XRQ - TRUCK GROUND]
,CAST(2 AS INT) AS [ZY - EXPRESS]
,CAST(3 AS INT) AS [OVERSEAS - DELUXE]
,CAST(4 AS INT) AS [OVERNIGHT J-FAST]
,CAST(5 AS INT) AS [CARGO TRANSPORT 5]
Testing Our Pseudo-Constants
Now that our view is in place, we can compare the overhead of a pseudo-constant with that of a variable and a literal. We'll start by turning on STATISTICS IO and then running the most selective query (OVERNIGHT J-FAST) against each implementation.
SET NOCOUNT ON
SET STATISTICSIO ON
-- Execute as a variable
DECLARE @ShipMethodID INT
SELECT @ShipMethodID = m.ShipMethodID
FROM Purchasing.ShipMethod m
WHERE m.Name ='OVERNIGHT J-FAST'
SELECT*
FROM Sales.SalesOrderHeader h
WHERE h.ShipMethodID = @ShipMethodID
Table 'ShipMethod'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 693, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Execute As JOIN
SELECT h.*
FROM Sales.SalesOrderHeader h
JOIN Purchasing.ShipMethod sm
ON h.ShipMethodID = sm.ShipMethodID
WHERE sm.Name ='OVERNIGHT J-FAST'
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ShipMethod'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 693, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Execute as a literal
SELECT * FROM Sales.SalesOrderHeader h
WHERE ShipMethodID = 4 -- OVERNIGHT J-FAST
Table 'SalesOrderHeader'. Scan count 1, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SELECT h.*
FROM Sales.SalesOrderHeader h
JOIN ShipMethod.ShipMethodID const
ON h.ShipMethodID = const.[OVERNIGHT J-FAST]
Table 'SalesOrderHeader'. Scan count 1, logical reads 32, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
First Observations
There are two interesting things we see from STATISTICS IO output.
1. While the use of a standard JOIN or a variable show reads from the ShipMethod table, there are no observable reads from using the view.
2. Compared to literal and Pseudo-Constant, the logical reads against SalesOrderHeader are significantly higher with a standard JOIN or a variable.
Execution Plans
We can see here, that using a variable results in a clustered index scan. Since the value of @ShipMethodID is not known at compile time, the SQL engine has estimated that 104488.3 rows (one third of the total 31465 rows) will be returned. There are a lot of articles that describe this behavior including a great one by Kalen Delaney so I'm not going to cover it again here. I'll only mention that this is expected behavior when using variables in a query.
More interestingly, using Pseudo-Constants gives us the exact same execution plan as using a literal. The estimated and actual number of rows are a match because we know the value of ShipMethodID at compile-time.
Looking at the Index Seek operation, we even see that SQL Server is using the literal value 4 to determine the execution plan.
But Why Bother? Why not use Literals?
So why not just use literal values? On the surface, it seems like Pseudo-Constants are a lot of work with no additional benefit, but we get some terrific advantages in medium-to-large projects:
1. Easier Dependency Tracking.
When using variables, constants, or even JOINing to tables, we're never sure which lookup values are actually used. The removal of a row from the ShippingMethod table won't cause any compile errors in our code and will only present a problem when a query is actually executed. We may or may not even notice the problem. Using Pseudo-Constants gives us immediate visibility into dependencies:
CREATE PROCEDURE Get_OvernightShippingOrders
AS
SELECT h.* FROM Sales.SalesOrderHeader h
JOIN ShipMethod.ShipMethodID const
ON h.ShipMethodID = const.[OVERNIGHT J-FAST]
GO
-- View all objects that use this ShipMethod constants:
SELECT OBJECT_NAME(dep.object_id) AS ReferencingObject, c.name AS ReferencedColumnName
FROM sys.sql_dependencies dep
JOIN sys.columns c
ON dep.referenced_major_id = c.object_id
AND dep.referenced_minor_id = c.column_id
WHERE dep.referenced_major_id = OBJECT_ID('ShipMethod.ShipMethodID')
ReferencingObject | ReferencedColumnName |
---|---|
Get_OvernightShippingOrders | OVERNIGHT J-FAST |
2. Easier Code Refactoring.
When using a literal, updating an ID becomes a painful search-and-replace project. Changing every stored procedure that used "ShipMethodID = 4" to a new value like, "ShipMethodID = 6" requires a manual review of the code to look for these hard-coded values. In a large enough project, you'll never be sure you got them.
If you're using Pseudo-Constants, it's a simple process of querying the dependencies (as outlined aboce) or altering the underlying view to provide a new literal value.
3. The right data type every time.
From my own experience of seeing unexpected implicit data conversions, I know our literals and variables can sometimes use the wrong data type. This results in undetected performance hits when the SQL engine has to convert every row in the table to match the data type of your parameter. The Pseudo-Constant ensures you're only setting the data type in one place.
4. Instant Recompiles for data changes.
If the value of "OVERNIGHT J-FAST" changes, you're required to ALTER the view. This will result in all dependent objects to obtain a new plan for their next execution. While it's rarely valuable in this situation, it's extremely beneficial if you use Pseudo-Constants for date ranges.
Other Uses
1. Working in date ranges
Another exciting use of Pseudo-Constants is when working with date ranges. Pseudo-Constants can be used to supplement variables in returning a more selective dataset. I'll expand on this in a future post but imagine a query where the Psedo-Constant narrowed down your potential rows by 90% while the variable was used to get the exact range. This is far beneficial than a plan that returns all of your rows to be filtered at runtime.
In closing
Disclaimer and a heartfelt thanks: Whenever possible, I will endeavor to give credit to people that provided a solution, idea, or inspiration for my own blog posts. These posts would not have possible without all the amazing people in the SQL Server community sharing their own solutions. While I'm not aware of anybody using this solution, I don't claim to be the first person to have thought of it. If there are any other sources for anything I cover, please contact me so that I may give proper attribution.
Comments
- Anonymous
December 20, 2013
Very cool! I started using immediately ALALTER VIEW [dbo].[Constants] AS SELECT CAST(1 AS BIT) AS [True] , CAST(0 AS BIT) AS [False] , CAST(1 AS BIT) AS [On] , CAST(0 AS BIT) AS [Off] , CAST(1 AS BIT) AS [Yes] , CAST(0 AS BIT) AS [No] , CAST(NULL AS BIT) AS [AssumeOn] , CAST(NULL AS BIT) AS [AssumeOff] GO select * from [Infrastructure].[Consolidated.Servers (Various)] join Constants on [Is VM?] = [On] This lets me get rid of magic 1s and 0s. I'm experimenting with changing the ANSI comparison rules.