Table variable vs. temp table and estimated vs. actual query plan
The other day we got a call from a puzzled friend. He has rewritten a stored procedure using table variable instead of temp table, it makes the stored procedure code look more tidy. The puzzling part is the same stored procedure now running a lot slower. The estimated plans looked the same using temp table and table variable, but the execution plans are very different. Unbeknownst to him, rewriting the procedure using table variable bumped into a query optimizer blind spot. While the “Estimated” and “Actual” query plans are same in most cases, there are some exception. If a table (temporary or permanent) is created in the same batch with the query, SQL Serve has to recompile the query since the query definition is unknown when the batch is compiled the first time. This is not the case for table variables (and that was the main reasons SQL Server 2005 introduce them – to reduce recompilations when unnecessary). In order to explain in detail, let’s use the following example:
--0. create test data
use test
if (OBJECT_ID('Test1') is not null)
DROP TABLE Test1
go
CREATE TABLE Test1 (ID int)
DECLARE @i int
SET @i = 0
SET NOCOUNT ON
WHILE @i < 20000
BEGIN
INSERT INTO Test1 (ID) Values (@i)
SET @i = @i + 1
END
CREATE CLUSTERED INDEX IX_Test1 ON dbo.Test1 (ID)
--1. Query using table variable
DECLARE @Tmp1 TABLE (ID int)
INSERT INTO @Tmp1(ID)
SELECT ID
FROM Test1
SELECT *
FROM Test1
WHERE ID NOT IN (SELECT ID FROM @Tmp1)
--2. Query using temp table
CREATE TABLE #Tmp1(ID int)
INSERT INTO #Tmp1(ID)
SELECT ID
FROM Test1
SELECT *
FROM Test1
WHERE ID NOT IN (SELECT ID FROM #Tmp1)
DROP TABLE #Tmp1
The estimated plans for both SELECT queries in the example above look the same – Nested Loops (Left Anti Semi Join) with the outer (top in the graphical display) table the Clustered index on Test1 and the inner table is the table variable in the first case and the temporary table in the second case. But if you run both scripts you will see that the first runs considerably slower – a minute or two depending on your processor speed, memory available to SQL Server and disk I/O capacity. How is this possible?
When the queries are compiled neither the table variable nor temporary table are populated and the query optimizer assumes “minimum” number of rows and that is always 1 in SQL Server. This estimation is used when SQL Server generates the “Estimated” query plan. While the “Estimated” and “Actual” query plans are same in most cases, there are some exception. If a table (temporary or permanent) is created in the same batch with the query, SQL Serve has to recompile the query since the query definition is unknown when the batch is compiled the first time. This is not the case for table variables.
Now turn on “Display Actual Execution Plan” in SSMS and re-run the scripts 1. in one query window and 2. in another one. You will see that the actual plan for script 1 is the same as estimated plan and took substantially longer than the script 2 to complete. The actual execution plan for Script 2 is very different from its estimated plan. The actual execution plan for script 2 uses Hash Match to execute the Anti Semi Join at the top of the plan. There is a table scan of the temporary table #Tmp1 at the top of the plan building the hash table. If you examine the properties of this scan you will find out that the number of Estimated rows is 20000 (was 1 in the Estimated plan as well as it is 1 for the table variable plan).
Figure 1. Estimated plan
Figure 2. Actual plan of SELECT query using temp table
When examining query plans be careful to consider possible plan change during the batch execution due to recompiles. Bear in mind that while table variables are not causing recompiles, you may need just the opposite to get the best query plan. You should always use temporary tables in your queries unless you intentionally want to avoid query recompilation and you are confident query variable will give you a good plan. Introduction of statement level recompilation in SQL Server 2005 substantially lowered the need to use table variables in your queries.
You will find some great information about table variables and temporary tables in
https://databases.aspfaq.com/database/should-i-use-a-temp-table-or-a-table-variable.html .
Please note that the actual execution plans for the above batches will change if you merge 1 and 2 into to a single batch and will change again if you merge all three into single batch… There are some interesting learnings surrounding such modifications and we will talk about them in some later blog entries! Stay tuned!
Lubor Kollar, Lindsey Allen
Comments
Anonymous
October 09, 2008
PingBack from http://www.easycoded.com/table-variable-vs-temp-table-and-estimated-vs-actual-query-plan/Anonymous
October 12, 2008
I have been cleaning up query performance issues related to table variables quite regularly in the past 18 months or so. I am finding UDFs are an even worse perf dog.Anonymous
January 11, 2011
"... This is not the case for table variables (and that was the main reasons SQL Server 2005 introduce them – to reduce recompilations when unnecessary) ,,," A minor correction: Table variables were actually introduced in SQL Server 2000. msdn.microsoft.com/.../aa260638(SQL.80).aspxAnonymous
November 05, 2011
When using Table variables you cannot use Indexes / Statistics or Hints with them.