Database Programming: Why I, Like, May Never Write Another LIKE, Like, Ever Again
[UPDATE 30 October 2007: There are significant ambiguities in this post which are addressed in the comments and in this follow-up, which I recommend reading after you've read this post and its comments. -wp]
I've been doing some performance tuning work over the last couple of days, and I've found a new use for a technique first shown here.
One of the primary functions of our product is to examine program code for certain patterns and practices. This is accomplished through pattern-matching queries against system tables, which take the general form:
SELECT data
FROM mytable
WHERE column LIKE N'%value%'
This is among the most common performance issues we face. Full-Text Search is a wonderful technology for addressing this issue; it unleashes the CONTAINS syntax, which is a powerful and wonderful thing. Sometimes, though, Full-Text Search isn't an option. In those scenarios, here's some equivalent syntax which provides a 2.5x performance improvement for every LIKE it replaces:
SELECT data
FROM mytable
WHERE patindex('%value%',column COLLATE Latin1_General_BIN) > 0
If you just use the PATINDEX without the binary collation, you get performance equivalent to the LIKE. Adding the binary collation turns the trick and unlocks the performance boost.
So, when would I ever code a LIKE in light of this discovery? Well, the only thing I haven't figured out how to do with PATINDEX is escape a reserved character, which, as you'l recall from this post, is among the LIKE command's capabilities.
I hope this tip comes in handy for you..
-wp
Comments
Anonymous
January 01, 2003
.. but it's got to have a COLLATE with it. A couple of updates are necessary to my recent post on LIKE,Anonymous
January 01, 2003
The comment has been removedAnonymous
October 21, 2007
The comment has been removedAnonymous
October 22, 2007
Hi Ward, I just realized that I mixed up the terminology in my post a bit. In my comment, replace all instances of "case-sensitive" with "case-insensitive", and vice-versa. Your comment that "it will be moot in case-insensitive databases," seems to be reversed as well, since the binary collation will actually make the query case-sensitive. Most of the databases I see use default options, i.e. case-insensitive, although I have talked to a few people recently who seem to favor case-sensitivity for one reason or another. Definitely not the norm from my point of view, though.Anonymous
October 22, 2007
By the way, perhaps one of these will help you: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/counting-occurrences-of-a-substring-within-a-string.aspx http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/pattern-based-split-string.aspx