A case of seeks and scans
Hello all,
A couple weeks ago while I was onsite, I was looking at some queries that had performance issues. To the customer, one in particular had become a conundrum that could not be easily answered with the usual approach of good indexing, simply because there was already a good covering index for that highly used query. Yet, massive index scans were being issued, even with good solid statistics in place. That’s what was puzzling to them.
To cut it short, the issue was neither in the table nor the index(es), but rather on the query itself. And yes, the query had a WHERE clause that was fairly narrow so nothing quite that obvious. The query was using a string function as the SARG and by design, in SQL Server, an index cannot be used to seek on some more complex expressions, expressions using functions or strings with a leading wildcard character.
Given that, the following expressions will always result in scans:
(…) WHERE ABS(ProductID) = 771
(…) WHERE UnitPrice + 1 < 3.975
(…) WHERE LastName LIKE ‘%Lopes’
(…) WHERE UPPER(LastName) = ‘LOPES’
Let’s see this in more detail using the AdventureWorks2008 database. For this exercise I created an index in the Sales.SalesOrderHeader table. Remember that a statistics object was automatically created for the 1st key column of the index, in this case “OrderDate”.
IF EXISTS (SELECT name FROM sys.indexes
WHERE name = N'NCOrderDate_AccountNr'
AND object_id = OBJECT_ID (N'Sales.SalesOrderHeader'))
DROP INDEX NCOrderDate_AccountNr
ON Sales.SalesOrderHeader
GO
CREATE NONCLUSTERED INDEX NCOrderDate_AccountNr
ON Sales.SalesOrderHeader (OrderDate, AccountNumber);
GO
Take these these two queries, where the output is the exact same 184 rows each:
DECLARE @start DATETIME = '07/01/2005', @end DATETIME = '07/31/2005'
SELECT SalesOrderID, OrderDate FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN @start AND @end
GO
SELECT SalesOrderID, OrderDate FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2005 AND MONTH(OrderDate) = 7
GO
Yet, the 1st will leverage a seek (5 logical and 2 physical reads) while the 2nd will perform an index scan (200 logical and 4 physical reads). That’s because of the date functions in the WHERE clause that are restricting the optimization choices of the database engine. Here is the execution plan:
So refactoring a query in a less obvious way will yield better results. Let’s try another example. Take the two following queries, where the output is the exact same 73 rows each:
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE OrderDate = CONVERT(VARCHAR(10), DATEADD(d, -1500, GETDATE()), 101)
GO
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE DATEDIFF(d, OrderDate, GETDATE()) = 1500
GO
Notice how they both use date functions in the WHERE clause. Yet, the 1st will leverage a seek (4 logical and 2 physical reads) while the 2nd will perform an index scan (200 logical and 4 physical reads). Although they both use date functions in the WHERE clause, the optimization restriction will occur only when the function is the search argument, not the search condition itself, which is the case of the 2nd query. Here is the execution plan:
In the last example, take these two queries, where the output is the exact same 1859 rows each:
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE LEFT(AccountNumber, 11) = '10-4030-014'
GO
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE AccountNumber LIKE '10-4030-014%'
GO
At a glance, given what I’ve shown here, I would say the 1st query will leverage a scan, while the 2nd will leverage a seek. Let’s check the execution plan:
My guess was wrong here.
Both queries leveraged scans, with 200 logical and 4 physical reads each. In the 1st query there is a string function as the search argument so the scan is expected. Yet, that is not the case in the 2nd query, and there is no wildcard in the beginning of the search condition. So why did I have a scan?
Well, the catch is AccountNumber cannot be seeked like in the previous examples. Ranges are found in the date, on the index statistics used here, so a scan is the only choice. Remember what I referenced earlier about statistics?
That was an interesting talk I though I should share.
Until next time!
Disclaimer: I hope that the information on these pages is valuable to you. Your use of the information contained in these pages, however, is at your sole risk. All information on these pages is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Ezequiel. Further, Ezequiel shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.
Comments
- Anonymous
February 20, 2012
I have seen this same mistakes most developers (including myself earlier) do most of the time. Best practices are very important particularly in Databases like SQL Server, as a small mistake causes so much IO, Mem, CPU, blocking and so on....Very good post. - Anonymous
September 20, 2012
Very helpful post