Using Computed Columns to Fix Scalar Expression Estimation Errors
(Ok, let's try something a bit more involved now. Here's a tip on how you can use computed columns to improve query plan quality in SQL Server 2005. The optimizer supports building statistics on the results of expressions, and this can avoid debugging query plans later if you make sure that the system is doing this for complex expressions, user-defined expressions, or other places where the cardinality estimate isn't matching what is actually happening at runtime. Enjoy! Conor Cunningham):
Computed columns can be used to correct some cardinality estimation errors. If the optimizer is incorrectly estimating a predicate (as examples, due to correlations or because the optimizer is guessing on the selectivity of a user-defined function), a computed column can give the optimizer a hook on which statistics can be stored. Note that computed columns in SQL 2005 can be created without increasing the on-disk row width if the scalar expression meets specific requirements documented in Books Online (for example, it must be deterministic).
The following example demonstrates how computed columns can be used to affect cardinality estimates. A few notes for the example:
- The plan is very simple in this example. The cardinality estimate error does not impact the plan choice. Please understand that this problem could negatively impact join order/algorithm in larger queries.
- User-defined functions need to be created WITH SCHEMABINDING or else the optimizer does not trust them at all (they could be changed without causing the plan to recompile, which is “not good”).
use tempdb
-- create a table with 20000 rows in it
create table t1 (col1 int, col2 int)
declare @i int
set @i = 0
while @i < 20000
begin
insert into t1(col1, col2) values (@i, rand()*1000)
set @i = @i + 1
end
-- drop function dbo.foo
-- create a function. The optimizer does not understand it, so
-- it will cause guessing logic in plan generation.
create function foo(@a int) returns int
with schemabinding
as
BEGIN
set @a = 5;
return 155
END
-- clean out the plan cache
dbcc freeproccache
set statistics profile on
-- we under-guess on the UDF expression (assumes about 8% selectivity, actual is 100%)
select * from t1
where dbo.foo(t1.col1) = 155
(Abbreviated statistics profile output)
Rows Executes StmtText
-------------------- -------------------- -----------------------------------------------
20000 1 select * from t1 where dbo.foo(t1.col1) = 155
20000 1 |--Filter(WHERE:( foo(col1)=155))
20000 1 |--Table Scan(OBJECT:(t1))
EstimateRows
------------------
1681.7928
1681.7928
20000.0
-- we over-guess on the UDF expression (assumes about 8% selectivity, actual is 0%)
dbcc freeproccache
select * from t1
where dbo.foo(t1.col1) = 154
Rows Executes StmtText
-------------------- -------------------- -----------------------------------------------
0 1 select * from t1 where dbo.foo(t1.col1) = 154
0 1 |--Filter(WHERE:(foo( col1)=154))
20000 1 |--Table Scan(OBJECT:(t1))
(estimates match previous example)
set statistics profile off
-- add (non-persisted) computed column over expression.
-- Note that the function has WITHSCHEMABINDING enabled (which causes the optimizer to trust the expression will not change over recompiles)
-- Note2 - this could be persisted if the function were expensive. Non-deterministic expressions are not persistable, though imprecise expressions are.
alter table t1 add c3 as dbo.foo(col1)
set statistics profile on
-- exact same queries
-- we correctly estimate 100% selectivity for this query
dbcc freeproccache
select * from t1
where dbo.foo(t1.col1) = 155
Rows Executes StmtText
-------------------- -------------------- -----------------------------------------------
20000 1 select * from t1 where dbo.foo(t1.col1) = 155
0 0 |--Compute Scalar(DEFINE:( c3=c3))
20000 1 |--Filter(WHERE:( c3=(155)))
0 0 |--CompSca(DEFINE:( c3=foo(col1)))
20000 1 |--Table Scan(t1))
EstimateRows
------------------------
20000.0
20000.0
20000.0
20000.0
20000.0
-- we estimate 0% selectivity for this query (actually we have a floor to estimate at least one row)
dbcc freeproccache
select * from t1
where dbo.foo(t1.col1) = 154
Rows Executes StmtText
-------------------- -------------------- -----------------------------------------------
0 1 select * from t1 where dbo.foo(t1.col1) = 154
0 0 |--Compute Scalar(DEFINE:( c3=c3))
0 1 |--Filter(WHERE:( c3=(154)))
0 0 |--CompSca(DEFINE:( c3=foo(col1)))
20000 1 |--Table Scan(t1))
EstimateRows
------------------------
1.0
1.0
1.0
20000.0
20000.0
set statistics profile off
Comments
Anonymous
March 24, 2006
Conor Cunnigham kicks it off with:
On behalf of the Query Optimization Team for Microsoft's SQL Server...Anonymous
March 24, 2006
Awesone ... Good to see QO team blogging .... Hope to see good things in future.Anonymous
March 25, 2006
Of the questions that get asked on the forums many are wround query optimisation and trying to understand...Anonymous
May 22, 2006
Of the questions that get asked on the forums many are wround query optimisation and trying to understand...Anonymous
August 11, 2007
Of the questions that get asked on the forums many are wround query optimisation and trying to understandAnonymous
May 13, 2008
PingBack from http://marco.radiofreemusiccenter.info/c30sqlserver.htmlAnonymous
January 18, 2009
PingBack from http://www.keyongtech.com/2244100-need-help-explaining-slow-function