共用方式為


Don’t change value of that parameter

Parameter sniffing is a well known among SQL User community.   But I have seen variations of this frequently that need a bit creative handling and solution may not that straight forward.

One of the variation is that a user changes the value of the parameter inside the procedure.  When the procedure is compiled at first time, it uses the value of the parameter for cardinality estimate.  If you change it inside the body to a different value, SQL Server won’t know that and it can cause incorrect cardinality estimate. 

Recently I worked with a customer who used an output parameter for a procedure but needed to use the same parameter in a few places to query data.   It caused incorrect cardinality.  Additionally, ascending key issue is at play as well.

I’m going to use a very simple to illustrate the cardinality estimate problem with faked data.  Customer’s issue is more complex involving multiple table join.

 

Setting up data

 

use tempdb
go
if object_id ('[Check]') is not null drop table [Check]
go
if object_id ('[Batch]') is not null drop table [Batch]
go
create table Batch (BatchID int identity primary key, BatchType tinyint)
go
create table [Check] (CheckID int identity primary key, BatchID int references Batch (BatchID))
go

set nocount on

declare @i int = 0, @j int = 0
begin tran
while @i < 500
begin
    insert into Batch values (1)
    declare @batchid int = @@identity
    set @j =   cast (RAND()  * 1000 as int)
   
    declare @k int = 0
    while @k < @j
    begin
        insert into [Check] (BatchID) values (@batchid)
        set @k = @k + 1
    end
    set @i = @i + 1
end
commit tran

go
create index ix_check_batchid on [check] (BatchID)
go

 

if object_id ('p_test') is not null drop procedure p_test
go
create procedure p_test @BatchID int output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
--insert some 200 fake values
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
--now select
select * from [Check] where BatchID = @BatchID

go

 

Problem

Note that the procedure p_test takes a parameter called @BatchID.   The intention of the procedure is to insert a value into table Batch.  Then take the identity value BatchID and insert into another table called Check.

I chose a simple insert with select top from check table itself.  But in customer’s scenario, they were actually inserting a large rowset by processing XML documents passed into the procedure using OPENXML.

 

Here is the problem:   SQL Server will not be able to estimate the predicate BatchID = @BatchID correctly in the statement “select * from [Check] where BatchID = @BatchID” .

@BatchID is declared as an output parameter.  When the procedure is called and compiled like “exec p_test @Batchid output”, the @batchID will be NULL.  That will be the value used for cardinality estimate.  But later, the @BatchID gets changed to be the identity value.    From the insert (insert into [Check] (BatchID) select top 200 @BatchID from [Check] ), we know for sure there will be 200 values with the new BatchID inside the check table.

The execution plan below shows that “select * from [Check] where BatchID = @BatchID” estimated 1 row.   But actually 200 rows were retrieved.

When table check is joined with other tables, this incorrect cardinality estimate will cause much a bigger problem.

set statistics profile on
go
declare @batchid int
exec p_test @Batchid output
select @batchid
go
set statistics profile off

image

 

Solution

Knowing that the value of @BatchID is incorrect and gets changed inside the procedure body, I thought it will be easy fix. I just append option (recompile) for the select statement.  Here is the complete procedure.  When I append option (recompile), the statement be able to use the new value (from the @@Identity)

 

if  object_id ('p_test2') is not null drop procedure p_test2
go
create procedure p_test2 @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
select * from [Check] where BatchID = @BatchID  option (recompile)
go

 

To my surprise, the estimate didn’t change.  The plan below shows the estimate is still 1 row and actual rows were 200.  One thing did change.  The @BatchID now is replaced with 531.  So we know for sure the value of @BatchID is correct because of recompile. 

image

 

So I went back and studied the procedure a bit more.  If you look at the way the BatchID gets into tables Batch and Check, you will know that the values are ever increasing (ascending key problem).  If you don’t update statistics right before the query is compiled, the values of BatchID will never be part of the statistics.

Though the optimizer knows precisely what the BatchID is, it won’t help because the statistics still think it doesn’t exist.

Fortunately, there is a solution.  Enabling trace flag 2389 and 2390 will trigger stats update for these conditions.

After running dbcc traceon (2389,2390,-1), you will see the estimate changes for better.  Note that now the estimate is 506 and actual is 200.   The reason why the estimate is not exactly 200 is that this value is not one of the steps in the histogram.  SQL Server has to ‘calculate’ the value.   For customer’s situation, this dramatically improved performance and changed join type from nested loop to hash join.

image

 

Some users may have concerns about the solution to use these trace flag as it triggers more frequent stats update.

You can actually use OPTIMIZE FOR UNKNOWN like “select * from [Check] where BatchID = @BatchID  option (optimize for (@BatchID unknown))”.  When you do this, SQL Server won’t sniff the parameter.  It will simply pick average density for cardinality estimate.   This method may work well if your data is uniformly distributed.

 

There are other solutions such as not using the output parameter this way etc.   Many times, when we do support, we are limited how much changes that can be made. Problems usually come after production.  So we want to find out best possible solution with minimal change.

 

Complete Demo script

 

--setting up data
use tempdb
go
if object_id ('[Check]') is not null drop table [Check]
go
if object_id ('[Batch]') is not null drop table [Batch]
go
create table Batch (BatchID int identity primary key, BatchType tinyint)
go
create table [Check] (CheckID int identity primary key, BatchID int references Batch (BatchID))
go

set nocount on

declare @i int = 0, @j int = 0
begin tran
while @i < 500
begin
    insert into Batch values (1)
    declare @batchid int = @@identity
    set @j =   cast (RAND()  * 1000 as int)
   
    declare @k int = 0
    while @k < @j
    begin
        insert into [Check] (BatchID) values (@batchid)
        set @k = @k + 1
    end
    set @i = @i + 1
end
commit tran

go
create index ix_check_batchid on [check] (BatchID)
go

if object_id ('p_test') is not null drop procedure p_test
go
create procedure p_test @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
--insert some 200 fake values
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
--now select
select * from [Check] where BatchID = @BatchID

go

--bad esimate for  select * from [Check] where BatchID = @BatchID
set statistics profile on
go
declare @batchid int
exec p_test @Batchid output
select @batchid
go
set statistics profile off

go

--solution using option recompile
if  object_id ('p_test2') is not null drop procedure p_test2
go
create procedure p_test2 @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
select * from [Check] where BatchID = @BatchID  option (recompile)
go

--note that estimate is still off
set statistics profile on
go
declare @batchid int
exec p_test2 @batchid output
select @batchid
go
set statistics profile off
 

go
--enable trace flag
dbcc traceon (2389,2390,-1)

go

--now estimate is improved dramatically
set statistics profile on
go
declare @batchid int
exec p_test2 @batchid output
select @batchid

go
set statistics profile off
 
go

--additional solutoin using OPTIMIZE FOR UNKNOWN
create procedure p_test3 @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
select * from [Check] where BatchID = @BatchID  option (optimize for (@BatchID unknown))
go