Freigeben über


SYSK 35: Performance comparison of WITH and TABLE variable queries

As many of you know, SQL 2005 now has a common table expression (CTE) WITH.  You can think of it as a temporary (non-persistent) view defined within the scope of an executing statement (see SQL help topic ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/27cfb819-3e8d-4274-8bbe-cbbe4d9c2e23.htm for more information).

I’m sure many of you can point to a number of benefits of CTE.  The question for me was – if I can do the same with a table variable and CTE, what’s better in terms of performance? 

I’ve created a simple table with four columns – identity primary key, group id, data id, and data value.  Populated it with the data, and ran the queries (see below) that returned values of the data column when data id = 4 or 17 for a randomly selected group id.

The answer may surprise you…  My tests show that the WITH statement is about 2 times slower!   On a 1 GHz/1Mb server populated with few thousands of rows, WITH statement took approx 1.2 sec vs. 0.6 sec using the table variable.  Same query with 2.5 million rows took a whopping 49 min 59 sec using WITH statement and 23 min 33 sec using Table variable.

Query using table variables (the faster way):

DECLARE @Temp TABLE ([pk_id] [bigint] NOT NULL,
[GroupID] [bigint] NOT NULL,
[DataId] [int] NOT NULL,
[DataValue] [varchar](256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL)

INSERT INTO @Temp
SELECT pk_id, GroupID, DataId, DataValue
FROM Table1
WHERE GroupID = @GroupID

SELECT
t1.DataValue Param4, t2.DataValue Param17
FROM @Temp t1
INNER JOIN @Temp t2 ON t1.GroupID = t2.GroupID
WHERE t1.DataId = 4 AND t2.DataId = 17

Query using WITH statement:

WITH ScenarioData AS
(
SELECT * FROM Table1 WHERE GroupID = @GroupID
)

SELECT
t1.DataValue Param4, t2.DataValue Param17
FROM ScenarioData t1
INNER JOIN ScenarioData t2 ON t1.GroupID = t2.GroupID
WHERE t1.DataId = 4 AND t2.DataId = 17

Comments

  • Anonymous
    January 06, 2006
    How does that compare with embedded SELECTs?

    SELECT
    t1.DataValue Param4, t2.DataValue Param17
    FROM (
    SELECT * FROM Table1 WHERE GroupID = @GroupID
    ) t1
    INNER JOIN (
    SELECT * FROM Table1 WHERE GroupID = @GroupID
    ) t2 ON t1.GroupID = t2.GroupID
    WHERE t1.DataId = 4 AND t2.DataId = 17
  • Anonymous
    January 06, 2006
    The comment has been removed
  • Anonymous
    January 06, 2006
    I would call that a perf optimizer bug and it should be reported. Nice thing is...seem easy to reproduce.

    I suppose you took the cache into account?
    Could you postthe DDL for table1?
    Have you looked at the query plans?

    The optimizer may be simply choosing a bad plan because of bad statistics on table1 or bad indexes. If that is the case, the test would not be significant.
  • Anonymous
    January 09, 2006
    Response to John Ingres:
    1. My thinking on caching is that it would happen in both cases, so I ignored it for the sake of this test.
    2. Here is the table definition
    CREATE TABLE [dbo].[Table_1](
    [pk_id] [bigint] IDENTITY(1,1) NOT NULL,
    [GroupID] [bigint] NOT NULL,
    [DataId] [int] NOT NULL,
    [DataValue] varchar COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED
    (
    [pk_id] ASC
    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]
    (
    [GroupId] ASC
    )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

    3. Yes, it's certainly possible the optimizer is choosing a bad plan...
  • Anonymous
    January 09, 2006
    All good suggestions. The point of the original post was to point out possible performance difference. I suggest doing performance testing for your specific tables/queries.
  • Anonymous
    January 10, 2006
    I'd expect even worse performance...
  • Anonymous
    December 06, 2006
    The comment has been removed