Slow query using non-deterministic user defined function
Recently we worked with a customer who reported a query that used to run a few seconds in SQL Server 2000 but it never finishes in SQL Server 2008 R2 following upgrade.
We went around and tried quite a few things but couldn't get SQL Server 2008 R2 to generate similar plan. Upon closer look at 2008 R2's query plan, we noticed something unusual. The plan has a warning "NO JOIN PREDICATE". What this means is that a cartesian product is introduced.
To illustrate the problem, let's use an example setup:
drop function dbo.myfunc
go
drop view v1, v2
go
drop table t1, t2
go
create table t1 (c1 int not null, c2 varchar(100))
go
create table t2 (c1 int not null, c2 varchar(100))
go
set nocount on
go
declare @i int
begin tran
select @i = 0
while (@i < 1000)
begin
insert into t1 (c1, c2) values (@i, 'a')
insert into t2 (c1, c2) values (@i, 'b')
select @i = @i + 1
end
commit tran
go
drop function dbo.myFunc
go
create function dbo.myfunc (@c1 int)
returns int
--with schemabinding
as
begin
return (@c1 * 100 )
end
go
create view v1 as select c1, c2, dbo.myfunc(c1) as c3 from t1
go
create view v2 as select c1, c2, dbo.myfunc(c1) as c3 from t2
go
Now, let's run the following query
dbcc freeproccache
go
set statistics profile on
go
-- But by pulling UDF above join in this query we actually introduce a cartesian product (NO JOIN PREDICATE)
-- UDF is called 1 million times instead of 1000 times each for the two views!
select count(*) from v1 as t1 join v2 as t2 on t1.c3 = t2.c3
go
set statistics profile off
go
The above query is very slow as illustrated in the query plan below. In the line 6 for the query plan, there is a warning "no join predicate". The join resulted in 1,000,000 rows (1,000 x 1,000 rows from each table).
In line 5, the myfunc is called 2,000,000 times (1,000,000 for computing t1.c1 and 1,000,000 for t2.c1).
This is because starting SQL Server 2005, optimizer has rule changes that will disallow non-deterministic scalar functions to be 'pushed down' in some situations (like this one).
Solution
Many times, you can simply make a function deterministic by adding schemabinding option. In the above example, re-write the function with schemabinding, it will be much faster.
From the query plan, you will no longer see that the "NO JOIN PREDICATE". The scalare UDF is pushed down right after table scan and applied only 100 times on each table.
drop function dbo.myFunc
go
create function dbo.myfunc (@c1 int)
returns int
with schemabinding
as
begin
return (@c1 * 100 )
end
Obviously, the function can be made deterministic. If you use following, the function will not be deterministic even you use schemabidning because of getdate(). In such cases, you will continue to see "NO JOIN PREDICATE" Cartesian product joins.
drop function dbo.myFunc
go
create function dbo.myfunc (@c1 int)
returns int
with schemabinding
as
begin
return (@c1 * 100 * datepart (mm,getdate()))
end
Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support
Comments
Anonymous
July 08, 2014
Hi! It is always a pleasure to read your articles about query optimization. Can you, please, extend this thought a little bit: "optimizer has rule changes that will disallow non-deterministic scalar functions to be 'pushed down' in some situations" - what are those situations, when the non-deterministic is allowed to be pushed down, and is there any other condition when the described behavior might be observed, excep non-determinism? Thank you!Anonymous
October 23, 2014
great