Why doesn't SQL Server use statistics and index on this computed column?
In a previous post, I talked about how to use computed columns to improve performance. By creating an index on a computed column, you get two benefits: better cardinality estimates for the expression in your query, and the possibility that your query uses that index to do seeks or scans.
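As a quick refresher, the pattern looks like this (a minimal sketch; the table and column names here are made up for illustration):

--hypothetical example: materialize an expression as a persisted computed column and index it
create table dbo.Orders (price money not null, quantity int not null)
go
alter table dbo.Orders add total_price as price * quantity persisted
go
create index ix_total_price on dbo.Orders (total_price)
go
--a query that uses the same expression can now pick up the index and its statistics
select * from dbo.Orders where price * quantity > 100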
Lately, I have been helping a customer who has a view that performs an aggregate (GROUP BY) like this:

select ISNULL(c1,'0') + ISNULL(c2,'0') as compare_str, COUNT(*) from t group by ISNULL(c1,'0') + ISNULL(c2,'0')
The view is then joined with many other tables. The problem is that the expression ISNULL(c1,'0') + ISNULL(c2,'0') ends up producing distinct results, which means the GROUP BY doesn't reduce the number of rows for that part of the query. But since there are no statistics on the expression (because it is computed dynamically), the estimate is much lower than the actual number of rows, and SQL Server produced a very poor plan.
Naturally, I wanted to help by creating a persisted computed column and adding an index to it. To my surprise, I discovered that SQL Server still didn't use the index or the statistics from that index. In other words, SQL Server continued to do a table scan and to estimate a low number of rows for that particular aggregate.
It turns out that ISNULL is the problem. After examining the table structure, I found that some columns involved in the computed expression are non-nullable, yet ISNULL is applied to them. When a column is not nullable, ISNULL is not really necessary, so the optimizer simplifies it out of the input 'tree'. But the computed column I created still uses the expression containing ISNULL, so the optimizer can't match the two expressions and therefore can't use the index or statistics on the computed column.
So the solution is this: if your expression involves ISNULL, make sure you don't apply it to columns that are non-nullable. If you follow this rule, the index and statistics on the computed column can be used.
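If you are not sure which columns actually allow NULLs (and therefore genuinely need ISNULL), you can check the catalog first. Here is one way to do it against the demo table t used below:

--check nullability before deciding where ISNULL is needed
select name, is_nullable
from sys.columns
where object_id = object_id('dbo.t')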
Let me demonstrate this by the following example:
--setting up data
use tempdb
go
if object_id('t') is not null drop table t
go
create table t (c1 nvarchar(20) not null, c2 nvarchar(20) null, c3 nvarchar(20))
go
set nocount on
declare @i int
set @i = 0
begin tran
while @i < 100000
begin
declare @ch nvarchar(max) = cast(@i as nvarchar(max))
insert into t (c1, c2) values (@ch, 'test')
set @i += 1
end
commit tran
go
set statistics profile on
go
--note that this query gets a bad cardinality estimate for the aggregate
select ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
/*
EstimateRows Rows Executes StmtText
------------- -------------------- -------------------- ---------------------------------------------------------------------------------------------------------------------------
316.2278 100000 1 select ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
316.2278 0 0 |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1008],0)))
316.2278 100000 1 |--Hash Match(Aggregate, HASH:([Expr1004]), RESIDUAL:([Expr1004] = [Expr1004]) DEFINE:([Expr1008]=COUNT(*)))
100000 0 0 |--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[t].[c1]+isnull([tempdb].[dbo].[t].[c2],N'0')))
100000 100000 1 |--Table Scan(OBJECT:([tempdb].[dbo].[t]))
*/
go
set statistics profile off
go
alter table t add compare_str as ISNULL(c1,'0') + ISNULL (c2, '0') persisted
go
create index ix1 on t (compare_str)
go
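As a side check (not required for the repro), you can confirm how SQL Server stored the computed column's definition:

--optional: view the stored definition of the computed column
select name, definition, is_persisted
from sys.computed_columns
where object_id = object_id('dbo.t')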
--note that this query continues to get an incorrect estimate even after creating a computed column to match the expression and an index on that computed column
--the Compute Scalar in the plan below defines [Expr1004] as [c1]+isnull([c2],N'0'): the optimizer has already simplified away the ISNULL on the non-nullable column c1, so the expression no longer matches the computed column's definition
set statistics profile on
go
select ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
go
set statistics profile off
/*
EstimateRows Rows Executes StmtText
------------- -------------------- -------------------- -------------------------------------------------------------------------------------------------------------------------
316.2278 100000 1 select ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
316.2278 0 0 |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1008],0)))
316.2278 100000 1 |--Hash Match(Aggregate, HASH:([Expr1004]), RESIDUAL:([Expr1004] = [Expr1004]) DEFINE:([Expr1008]=COUNT(*)))
100000 0 0 |--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[t].[c1]+isnull([tempdb].[dbo].[t].[c2],N'0')))
100000 100000 1 |--Table Scan(OBJECT:([tempdb].[dbo].[t]))
*/
--now let's change the expression a bit (dropping ISNULL for the non-nullable column c1) when creating the computed column
drop index ix1 on t
go
alter table t drop column compare_str
go
alter table t add compare_str2 as c1 + ISNULL (c2, '0') persisted
go
create index ix2 on t (compare_str2)
go
--this time, the optimizer matches the computed column:
--the cardinality estimate is accurate and the index ix2 is used
set statistics profile on
go
select ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
go
set statistics profile off
/*
EstimateRows Rows Executes StmtText
------------- -------------------- -------------------- -------------------------------------------------------------------------------------------------------------------------
100000 100000 1 select ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
100000 0 0 |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1008],0)))
100000 100000 1 |--Stream Aggregate(GROUP BY:([Expr1004]) DEFINE:([Expr1008]=Count(*)))
100000 0 0 |--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[t].[compare_str2]))
100000 100000 1 |--Index Scan(OBJECT:([tempdb].[dbo].[t].[ix2]), ORDERED FORWARD)
*/
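If you want to double-check that the optimizer now has statistics to work with, you can inspect the statistics that came with the index (an optional verification step):

--optional: inspect the statistics on the computed column's index
dbcc show_statistics ('t', ix2)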
Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support