共用方式為


How can I get that user out of my table quickly

Recently, I worked on an customer issue to help tune their slow query.   The query was fairly complex involving multiple table joins.  The key issue is the inability to do seeks on a particular table.

The table has a column that stores User Name like below.  The values stored contain domain name\user name.  Here is a set of sampled fake data:

UserName
mydomain0\user0
mydomain1\user1
mydomain2\user2
mydomain3\user3
mydomain4\user4
mydomain5\user5
mydomain6\user6
mydomain7\user7
mydomain8\user8
mydomain9\user9

Problem

The problem is that their application only passes user name without the domain name.   So the parameter will be user0, user1 etc.   Now this makes it very challenging to seek on the user.  Application couldn’t be changed to take domain name.

So this customer basically used like as the procedure below.  This resulted in scanning of the table and caused performance slowdown.

create procedure p_test1 @user varchar(20)
as
select * from t where UserName like '%' + @user

This user did put an index on UserName and saw the query plan had ‘seek’.  But the performance still didn’t meet the requirement. 

When you use like statement and have an index on the column that uses like, SQL Server tries its best to leverage that index.  It actually tries to calculate a range of values based on the value you pass in and then does index seek on the column.

Below is what the plan looks like:

 

image

 

The problem is that the range can be very large.  If you notice, the index seek is based on two expressions calculated (with > and <) followed by a filter (where …).  So this can be fairly expensive.

In fact, for 2 million dummy rows, it resulted in a logical reads of 9309 pages (as indicated below).

Table 't'. Scan count 1, logical reads 9309, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Though it’s better than table scan, it is still not the best to suite their needs.

 

Solution

Computed column comes to rescue.   We have talked about computed column in our blog.  But when it comes to solve practical problems, it still scares some users.

For this particular problem, all we need is to use charindex and substring.   The expression of the computed column seems scary.  But it is used to cover special cases such as NULL, blank values etc.

After you create the computed column and then create an index on it.

alter table t
add UserAlias as (substring (UserName, case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end, len (UserName) - case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end +1) ) persisted

create index ix2 on t(UserAlias)

 

now change your query like this:

create procedure p_test2 @user varchar(20)
as
select * from t where UserAlias = @user

Here is the query plan:

 

image

It had dramatic reduction of logical reads (only 3) from 9309 noted before with the range seek with like

Table 't'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

The CPU went from over 700 ms to 1 ms after the change.

 

 

Complete demo script

 

use tempdb
go
--settting up table and rows
drop table t
go
create table t (UserName varchar(50))
go
set nocount on
declare  @i int = 0, @domainname varchar(20), @username varchar(30)
begin tran
while @i < 2000000
begin
      
       set @domainname =    'mydomain' + cast ( @i %20  as varchar(20))
       set @username = 'user' + cast(@i % 100000 as varchar(20))
       insert into t values (@domainname + '\' + @username)
      
       set @i = @i + 1
end
commit tran
go

if object_id ('p_test1') is not null drop procedure p_test1
go

create procedure p_test1 @user varchar(20)
as
select * from t where UserName like '%' + @user
go

create index ix on t (UserName)

go
set statistics profile on
set statistics time on
set statistics io on
go
--this query still ends up with many logical reads and fairly high CPU consumption
exec p_test1 'user'
go
set statistics profile off
set statistics time off
set statistics io off
go

--adding a computed column
alter table t
add UserAlias as (substring (UserName, case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end, len (UserName) - case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end +1) ) persisted
go

go
--creating an index on the computed column
create index ix2 on t(UserAlias)
go

if object_id ('p_test2') is not null drop procedure p_test2
go
--new procedure to take advantage of the computed column index
create procedure p_test2 @user varchar(20)
as
select * from t where UserAlias = @user

go

set statistics profile on
set statistics time on
set statistics io on
go
exec p_test2 'user'
go
set statistics profile off
set statistics time off
set statistics io off
go

Comments

  • Anonymous
    January 22, 2013
    The comment has been removed

  • Anonymous
    January 22, 2013
    Or just use REVERSE().  That'll give you the same index utilization.  Actually for your particular (unrealistic) sample set, that's the ideal index to use :). You don't get the utility of having the user name in a separate column, but in other examples where you're wanting to search just the end of the string that's going to be easier to write and maintain.  Really it's probably less code.... REVERSE(SUBSTRING(REVERSE(@userid), 1, CHARINDEX('', REVERSE(@UserID))-1)) to do your name splitting that way anyway.  You'd need to add in a to the start of the string if not present though, so it'd get messier.

  • Anonymous
    January 28, 2013
    Nice post with nice solution. Thanks for the share. <a href="www.venturerepublic.net/">Venture Web Development</a>

  • Anonymous
    April 30, 2013
    Or define the computed column as REVERSE(UserName) and define the search predicate as UserAlias = REVERSE(@user)+'%'.  Would that work?