Compartilhar via


表现生猛的Like语句

 

自从升级到SQL
2005或者SQL
2008以后,我发现Like语句性能表现非常突出,有些Like语句比在SQL
2000运行的速度快了许多.
究竟是什么改进导致Like语句表现如此生猛呢?

 

答案就是字符串统计(string
summary).  SQL server 2005增加了一种对字符串进行统计的新的专利技术,
使得SQL
server 对字符类型的字段(text,
ntext, char, varchar, and nvarchar)的行数预估能力上了一个台阶. 在下面的文档对此有非常到位的描述:

 

Statistics Used by the Query Optimizer in Microsoft SQL
Server 2005

https://technet.microsoft.com/en-us/library/cc966419.aspx

 

String Summary Statistics

 

SQL Server 2005 includes patented technology for estimating
the selectivity of LIKE conditions. It builds a statistical summary of
substring frequency distribution for character columns (a string summary). This
includes columns of type text, ntext, char, varchar, and nvarchar. Using the
string summary, SQL Server can accurately estimate the selectivity of LIKE
conditions where the pattern may have any number of wildcards in any
combination. For example, SQL Server can estimate the selectivity of predicates
of the following form:

 

Column LIKE 'string%'

Column LIKE '%string'

Column LIKE '%string%'

Column LIKE 'string'

Column LIKE 'str_ing'

Column LIKE 'str[abc]ing'

Column LIKE '%abc%xy'

 

让我们看一个具体的例子:

 

use AdventureWorks

go

--Create an index for
testing

create index ms_idx on Person.Address (city)

go

然后运行如下语句:

--Free the cache plan first
to make sure re-evaluate the statistics

dbcc freeproccache

go

set statistics profile on

set statistics io on

go

select * from Person.Address where city like '%Abingdon%'

set statistics profile off

set statistics io off

go

 

结果如下(部分截图):

  

  

 

  

你注意到没有?
我的写法是like '%Abingdon%' ,但是SQL Server也能够预估到19.81行(EstimateRows)真是很了不起了。而且上面的执行计划也是很不错的哦。

 

作为对比,我使用另外的一个写法,这个写法将使得SQL
server 使用传统的统计信息(statistics)而不是string  summary 来预估行数:

 

dbcc freeproccache

go

set statistics profile on

set statistics io on

go

select * from Person.Address where city like '%Abingdon%' escape '@'

set statistics profile off

set statistics io off

go

结果如下:

 

  

 

看到了吗,EstimateRows
现在是1250,预估能力差了很多,而且执行计划也变成了比较差一些的cluster  index 
scan了。

 

能够看SQL
server是否使用string
summary 么?可以的,就是使用DBCC SHOW_STATISTICS:

 

DBCC SHOW_STATISTICS ([Person.Address],ms_idx)

go

 

结果如下:

 

 

上面的string
Index 是YES就代表string
summary已经建立了。

 

一点题外话。String
summary究竟如何实现的呢?是使用Trie
,就是单词查找树。 有兴趣的读者自己bing一下 trie, 应该有些文档讨论这个技术。

Comments

  • Anonymous
    August 12, 2011
    2005对不使用unicode存储的双字节字符的string Index处理有问题
  • Anonymous
    October 01, 2011
    The comment has been removed
  • Anonymous
    January 12, 2012
    我是sql server 2008 ,试了存储中文的字段和英文的字段,都是index scan,没有出现过index seek 呢?