Udostępnij za pośrednictwem


Table variables and row estimations – Part 2

Hello again,

On my previous post, we covered how the Query Optimizer handles row estimation when using Table variables under specific conditions.

Following up on that, I will demonstrate other scenarios where the Query Optimizer must try to optimize queries when no statistics and histograms are available.

In the 2nd example of the series I will use a larger table in the AdventureWorks2012 database (Sales.SalesOrderDetail), and load it into a table variable. Note that I could just as easily have used the table in the previous post, which would yield the exact same practical result:

 SET NOCOUNT ON;

DECLARE @tblSalesOrderDetail TABLE (
    [SalesOrderID] [int] NOT NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [OrderQty] [smallint] NOT NULL,
    [ProductID] [int] NOT NULL,
    [SpecialOfferID] [int] NOT NULL,
    [UnitPrice] [money] NOT NULL,
    [UnitPriceDiscount] [money] NOT NULL);

INSERT INTO @tblSalesOrderDetail
SELECT [SalesOrderID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
FROM [AdventureWorks2012].[Sales].[SalesOrderDetail];

Then select from that table variable as follows:

 SET STATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [CarrierTrackingNumber] LIKE '5A1A-4E3D-B2' 

OPTION(RECOMPILE)

SET STATISTICS PROFILE OFF;

The Estimated Rowcount is 12131,7 while the Actual Rowcount is 21.

What does that mean in this context? The table variable has a 121317 cardinality, so 12131.7 rows are exactly 10 percent of that (12131.7 * 100 / 121317 = 10).

 

image

 

Notice that while I’m using a LIKE operator, I have no wildcards in the example. In this example, the LIKE is treated as an equality operator, and the cardinality of the filter is equal to the cardinality of its left child multiplied by the probability of the comparison being true, which is always 10 percent. If wildcards are used, then some more complex calculations are used.

Conversely, when using the same query but with NOT LIKE, we get an Estimated Rowcount of 109185.3 while the Actual Rowcount is 60898. The result is ignoring NULL values.

 

image

 

In this context, 109185,3 rows are exactly 90 percent of the table variable cardinality (109185.3 * 100 / 121317 = 90). As NOT is introduced here to negate the predicate, so to the initial probability is negated, hence 90 percent.

 

Still with the LIKE scenario, take the following query:

 SET STATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [ProductID] LIKE ''

OPTION(RECOMPILE)

SET STATISTICS PROFILE OFF;

While the Actual Rowcount is 0 rows of the table, the Estimated Rowcount is still 12131,7, equivalent to 10 percent of the table variable cardinality.

 


Moving on to the 3rd example, with the BETWEEN operator, take the following query:

 SET STATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [ProductID] BETWEEN 700 AND 800

OPTION(RECOMPILE)

SET STATISTICS PROFILE OFF;

The above example has a single BETWEEN operator, which is translated into the expression “OBJECT:(@tblSalesOrderDetail), WHERE:([ProductID] >= (700) AND [ProductID] <= (800))”. The Estimated Rowcount is 10918.53 while the Actual Rowcount is 47560.

 

image
In this context with a single BETWEEN operator, 10918.53 rows are exactly 9 percent of the table variable cardinality (10918.53 * 100 / 121317 = 9).

However, the negation behaves differently. Take the following query:

 SET STATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [ProductID] NOT BETWEEN 700 AND 800

OPTION(RECOMPILE)

SET STATISTICS PROFILE OFF;

The NOT BETWEEN is translated into the expression “OBJECT:(@tblSalesOrderDetail), WHERE:([ProductID] < (700) OR [ProductID] > (800))”, and the Estimated Rowcount is 61871.67 while the Actual Rowcount is 73757.

 

image

 

The negation of the previous query type estimates exactly 51 percent of the table variable cardinality (61871,67 * 100 / 121317 = 51).

 


Another interesting estimation scenario shown in my 4th example involves BIT type search arguments. For this I will use the HumanResources.Employee table again, and load it into a table variable:

 SET NOCOUNT ON;

DECLARE @tblEmployee TABLE ( 
    [NationalIDNumber] [nvarchar](15) NOT NULL, 
    [LoginID] [nvarchar](256) NOT NULL, 
    [OrganizationNode] [hierarchyid] NULL, 
    [JobTitle] [nvarchar](50) NOT NULL, 
    [BirthDate] [date] NOT NULL, 
    [MaritalStatus] [nchar](1) NOT NULL, 
    [Gender] [nchar](1) NOT NULL, 
    [HireDate] [date] NOT NULL, 
    [SalariedFlag] bit NOT NULL, 
    [VacationHours] [smallint] NOT NULL, 
    [SickLeaveHours] [smallint] NOT NULL);

INSERT INTO @tblEmployee 
SELECT [NationalIDNumber] 
      ,[LoginID] 
      ,[OrganizationNode] 
      ,[JobTitle] 
      ,[BirthDate] 
      ,[MaritalStatus] 
      ,[Gender] 
      ,[HireDate] 
      ,[SalariedFlag] 
      ,[VacationHours] 
      ,[SickLeaveHours] 
FROM [AdventureWorks2012].[HumanResources].[Employee];

Then, search on the SalariedFlag column as follows:

 SET STATISTICS PROFILE ON;

SELECT * FROM @tblEmployee WHERE [SalariedFlag]=1 OPTION(RECOMPILE)

SET STATISTICS PROFILE OFF;

For the above query, the Estimated Rowcount is 145 while the Actual Rowcount is 52:

 

image

 

Why was the Estimated Rowcount 145 rows? As we established in the previous post, this table’s cardinality is 290, so 145 rows are exactly 50 percent of that (145 * 100 / 290 = 50).

This is because by design, when a BIT column is searched on (remember that SalariedFlag does not allow NULLs), it can only have 2 known values (true or false). Therefore, because there are no histograms to support an accurate estimation, the Query Optimizer estimates 50% of rows for either value.

To further demonstrate this point, if we change the search argument in the same query to:

 SET STATISTICS PROFILE ON;

SELECT * FROM @tblEmployee WHERE [SalariedFlag]=0 OPTION(RECOMPILE)

SET STATISTICS PROFILE OFF;

The Estimated Rowcount is still 145 while the Actual Rowcount is 238.

image

Notice I explicitly stated that the SalariedFlag does not allow NULLs. What if it did?

Let’s recreate the table variable allowing NULLs in the very same column, and load it into a table variable:

 SET NOCOUNT ON;

DECLARE @tblEmployee TABLE ( 
    [NationalIDNumber] [nvarchar](15) NOT NULL, 
    [LoginID] [nvarchar](256) NOT NULL, 
    [OrganizationNode] [hierarchyid] NULL, 
    [JobTitle] [nvarchar](50) NOT NULL, 
    [BirthDate] [date] NOT NULL, 
    [MaritalStatus] [nchar](1) NOT NULL, 
    [Gender] [nchar](1) NOT NULL, 
    [HireDate] [date] NOT NULL, 
    [SalariedFlag] bit NULL, 
    [VacationHours] [smallint] NOT NULL, 
    [SickLeaveHours] [smallint] NOT NULL);

INSERT INTO @tblEmployee 
SELECT [NationalIDNumber] 
      ,[LoginID] 
      ,[OrganizationNode] 
      ,[JobTitle] 
      ,[BirthDate] 
      ,[MaritalStatus] 
      ,[Gender] 
      ,[HireDate] 
      ,[SalariedFlag] 
      ,[VacationHours] 
      ,[SickLeaveHours] 
FROM [AdventureWorks2012].[HumanResources].[Employee];

Then, search on the SalariedFlag column just as we did before:

 SET STATISTICS PROFILE ON;

SELECT * FROM @tblEmployee WHERE [SalariedFlag]=1 OPTION(RECOMPILE)

SET STATISTICS PROFILE OFF;

For the above query, the Estimated Rowcount is 95.7 while the Actual Rowcount is of course 52.

image

For a table cardinality of 290, 95.7 rows are exactly 33 percent of that (95.7* 100 / 290 = 33).

Keeping in mind that SalariedFlag now allows NULLs, it can have 3 known values (true, false or NULL). Therefore, and in a scenario where there are no histograms, the Query Optimizer estimates 33% of rows for each possible value. Note that searching on NULL values in the same query would yield 0 as the Actual Rowcount, although the Estimated Rowcount is still 33%.

That’s it for now, hope you will take all this into account when dealing with table variables, as these are most valuable tools for several usage scenarios. You can explorer further starting with the Frequently Asked Questions - SQL Server 2000 - Table Variables and Query performance and table variables.

See you next time!

Disclaimer: I hope that the information on these pages is valuable to you. Your use of the information contained in these pages, however, is at your sole risk. All information on these pages is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Ezequiel. Further, Ezequiel shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.