Previously committed rows might be missed if NOLOCK hint is used
I received a question from one of our customers about using the NOLOCK hint: can it cause missing rows in scans even if the rows were committed well before my SELECT with NOLOCK starts?
The NOLOCK hint is employed by many users to avoid contention on tables where updates are performed concurrently with selects. The popular belief is that the only side effect of using the NOLOCK hint is that it may cause reading uncommitted rows. These would be the rows that may be either rolled back or are part of larger concurrent transaction from which some rows may not be retrieved because they had not been written to the pages at the time the NOLOCK SELECT has been scanning them. It is much less obvious that even rows committed a long time before my NOLOCK transaction started might be skipped in the scan. (After I asked several SQL Server experts around me, I found that no one except for the development lead who owns the component in question knew this.) Additionally, a scenario theoretically exists (I don’t have a repro yet; read below) where the same record might be accounted for twice if SELECT reads a table with a NOLOCK hint.
This blog contains a simple repro – two scripts that you have to run on two connections concurrently showing double-counting rows. The third script is used to identify cases when the count of rows in the table decreased in the subsequent scan despite the fact we are only inserting rows into the table.
I will then show how you can realize the same concurrency benefit in SQL Server 2005 that is provided by NOLOCK while maintaining transactional consistency, and without using the NOLOCK hint. Later, I will explain which table scans might have this problem and which will not (even with NOLOCK), and how you can distinguish the scans by investigating their showplans. Then, I put up a challenge – create a repro that shows double-counting previously committed rows with the NOLOCK hint.
First, you should understand that SQL Server is using two kinds of scans for reading data from tables and indexes: allocation scans and range scans. An allocation scan is scanning the pages in physical (allocation) order as they are laid down on the disk. In this situation, no particular logical order of rows is guaranteed. If the optimizer requires rows to be scanned in a certain order, then a range scan is employed. This is usually when there is a predicate on the leading column of an index or if the optimizer requires visiting the rows in the logical order (for example, because of aggregation or ORDER BY in the query).
Next, you should know that if a new row is inserted into an index (clustered or not), it might cause page split. This happens if the existing page is full and the new row belongs between two existing rows (because of the key value of the new row is between the key values of the two rows) and it cannot fit physically into the same page. The page is split into two (approximately ½ empty each), and the new row is inserted where it belongs.
Here is a simple scenario that shows that the NOLOCK hint might miss some rows when concurrently executed updates (INSERTs in our case) are causing page splits. The INSERTs in script #2 will cause page splits because the table has a clustered key. First, you insert two rows, one with key value 1, and another with key value 10000. Then, you insert rows with key values 2, 9999, 3, 9998, etc. When the first page is full, it is split as described above. However, if the concurrently performed SELECT is using the NOLOCK hint on the same table, and the optimizer had chosen the allocation scan, the scan is moving only forward. Therefore, it might miss the ½ page of rows if the newly allocated page falls behind the scan.
Here are the scripts:
Script #1 – This script is trying to count the rows in the table t1. You can run three versions; see the “activate only one…” comment in the script. The first version has no hint, the second has the NOLOCK hint, and the third has both the NOLOCK hint and a BETWEEN predicate. The predicate causes the optimizer to select the range scan instead of the unordered scan in the previous two cases. You start this script and let it loop on the while statement until 10000 rows are inserted into the table t1 by the second script.
use test;
go
drop table tcnt;
create table tcnt(num int, cnt int);
drop table t1;
create table t1 (a int primary key, b char(500));
declare @i int, @tcnt1 int, @tcnt2 int;
set @i=1;
set @tcnt1=0;
set @tcnt2=0;
insert into tcnt values (0,0);
while (@tcnt2<10000)
begin
-- Activate only one of the following three SELECT statements and compare the behaviors
-- select @tcnt2=count(*) from t1; --- this will perform the correct count
-- select @tcnt2=count(*) from t1 with (NOLOCK); -- this one might skip some previously committed records
--select @tcnt2=count(*) from t1 with (NOLOCK) where a between 0 and 10001; -- this one will work correctly because the predicate is causes the optimizer to select a range scan
if (@tcnt2<>@tcnt1)
begin
insert into tcnt values (@i,@tcnt2);
set @i=@i+1;
set @tcnt1=@tcnt2;
end;
end;
Script #2 – You activate this script from another connection against the same database where you ran the first one.
use test;
go
declare @imin int, @imax int;
set @imin=1;
set @imax=10000;
while (@imax>@imin)
begin
insert into t1 values (@imin,'x');
insert into t1 values (@imax,'y');
set @imin=@imin+1;
set @imax=@imax-1;
end;
Script #3 – You run this one to investigate the result after the two previous scripts are finished. It will show all cases when the count(*) value decreased after additional committed inserts were performed. You can observe this only in the second scenario in script #1, when you run the query with NOLOCK hint and without the BETWEEN predicate. If NOLOCK is not used or the NOLOCK hint is used in conjunction with the BETWEEN predicate, the following query returns an empty result:
select * from tcnt t1, tcnt t2 where t1.num<t2.num and t1.cnt>t2.cnt;
If the statement select @tcnt2=count(*) from t1 with (NOLOCK); is executed in script #1, the result of the above select * is non-empty. The following shows some of many rows I see in my output:
num cnt num cnt
----------- ----------- ----------- ----
622 1189 624 1187
623 1191 624 1187
687 1310 690 1309
688 1312 690 1309
689 1313 690 1309
1044 2004 1051 2003
1045 2006 1051 2003
1046 2008 1051 2003
1047 2010 1051 2003
1048 2012 1051 2003
1049 2014 1051 2003
1050 2015 1051 2003
1460 2817 1463 2815
……..
The “num” column is the sequence number of the count(*) against the table where we inserted rows. The “cnt” column is the result of the same count(*). Therefore, one can expect that if my count(*) #622 counted 1189 rows, count #624 must logically see more rows because we are only inserting. However, we see only 1187 rows. This shows that scan #624 must have missed some rows we have encountered in scan #622. An important detail is that the insert transaction of each single row was committed, which is the default behavior.
In SQL Server 2005, no-blocking scans can be performed by using alter database <dbname>set READ_COMMITTED_SNAPSHOT ON (for more information, see (https://msdn2.microsoft.com/en-us/library/ms189050.aspx). This is the best way to get rid of the NOLOCK statements, achieve greater concurrency, and return consistent data. But be aware that there can be negative performance implications – for example, the READ_COMMITTED_SNAPSHOT will force using range scans while the NOLOCK hint is using allocation scan. Paul Randal’s blog at https://blogs.msdn.com/sqlserverstorageengine/archive/2006/11/09/when-can-allocation-order-scans-be-used.aspx explains many cases when SQL Server might use allocation scans. However, the blog does not mention cases when the allocation scan is replaced by range scan even if NOLOCK (or TABLOCK or READ UNCOMMITED SNAPSHOT isolation) is used. You can find out if range scan is used with NOLOCK hint by investigating the graphical showplan. There you will see the following Seek Predicate property (at least one of the Start or End Range must be present for the scan to be “range scan”):
Start Range: [test].[dbo].[t1].a >= CONVERT_IMPLICIT(int,[@1],0), End Range: [test].[dbo].[t1].a <= CONVERT_IMPLICIT(int,[@2],0)
Similarly, in the XML showplan, you will find either something similar to StartRange ScanType="GE” or EndRange ScanType="LE" (note the GE is for the >= , and LE is for the <= predicate type, respectively).
I’m challenging readers to find similar reproductions showing twice-scanned rows. Please submit your solution here, or submit feedback to https://blogs.msdn.com/sqlcat/contact.aspx. I will reward the first correct solution by sending the author one of the books from the “Inside Microsoft® SQL Server™ 2005” series (https://www.microsoft.com/mspress/findabook/list/series_SH.aspx).
After I finished the above investigation I searched the Web and found Tony Rogerson’s blog with the same theme – skipping and double-counting previously committed rows when NOLOCK hint is used. The blog is located at https://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx. Maybe I should have searched first; but this way, I enjoyed the discovery, and creating the repro. I decided to post my own blog for several reasons - the repro is a bit different, I explain the discrepancy between the range and allocation scans, and I show how you can find out if your NOLOCK is returning unexpected row sets. Most importantly, I think there are still many users using the NOLOCK hint without realizing the potentially serious consequences that both blogs describe.
Lubor Kollar
SQL Server Development
Comments
Anonymous
February 07, 2007
PingBack from http://sqlserverforum.org/archives/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used/Anonymous
February 12, 2007
Hi lubor,we think we have managed to answer your challange and repro a solution that demonstrates twice scanned rows : -- first create tables,index Create table DemoPageSplits (a int ,b varchar(6000)) Create unique clustered index ix_clust on DemoPageSplits(a) Create table SaveInsertResults (Counter int,KeyValue int,InsTime datetime) -- fill the table TRUNCATE TABLE DemoPageSplits GO set nocount on declare @i int ; set @i=1; while (@I<=10000) begin INSERT INTO DemoPageSplits VALUES (@I,'X') set @i=@i+1; end; -- RUN SCRIPT 2 and SCRIPT 3 in a sperate spid -- make sure you feed inside SCRIPT 3 the coorect spid of SCRIPT 2 -- SCRIPT 2 -- update the table and create page splits declare @imin int, @imax int; set @imin=1; set @imax=9999; While (@imax>@imin) begin update DemoPageSplits set b= replicate('x',6000) Where a=@imin update DemoPageSplits set b= replicate('y',6000) where a=@imAX set @imin=@imin+1; set @imax=@imax-1; end; --SCRIPT 3 --replace SPID= with the correct SPID of SCRIPT 2 -- we make sure each insert has a specific id (counter) -- so we can determine if we get duplicate values from -- the same INSERT ... SELECT statment TRUNCATE TABLE SaveInsertResults GO DECLARE @cnt INT SET @cnt=1 WHILE EXISTS(SELECT STATUS FROM sys.sysprocesses WHERE SPID=(spid of script2 )AND STATUS<>'SLEEPING') BEGIN INSERT INTO SaveInsertResults SELECT @cnt,A,getdate() FROM DemoPageSplits (NOLOCK) SET @cnt=@cnt+1 END --SCRIPT 4 --- check the results -- we just get the first occurance of having count more then 10000 -- create index just to make query run faster create clustered index inx_clust on SaveInsertResults(Counter,keyvalue,InsTime)
SELECT Counter,KeyValue,InsTime,COUNT() CountDuplicates FROM SaveInsertResults WHERE Counter in (SELECT top 1 Counter FROM SaveInsertResults GROUP BY Counter HAVING COUNT()>10000 ) GROUP BY Counter,KeyValue,InsTime HAVING COUNT(*)>1 partial results : Counter KeyValue InsTime CountDuplicates
24 451 2007-02-12 14:01:23.140 2 24 562 2007-02-12 14:01:23.140 2 24 563 2007-02-12 14:01:23.140 2 24 564 2007-02-12 14:01:23.140 2 24 565 2007-02-12 14:01:23.140 2 24 566 2007-02-12 14:01:23.140 2 24 567 2007-02-12 14:01:23.140 2 24 568 2007-02-12 14:01:23.140 2 24 569 2007-02-12 14:01:23.140 2 24 570 2007-02-12 14:01:23.140 2 24 571 2007-02-12 14:01:23.140 2 24 572 2007-02-12 14:01:23.140 2 24 573 2007-02-12 14:01:23.140 2 24 574 2007-02-12 14:01:23.140 2 24 575 2007-02-12 14:01:23.140 2 24 576 2007-02-12 14:01:23.140 2 24 577 2007-02-12 14:01:23.140 2 24 578 2007-02-12 14:01:23.140 2 24 579 2007-02-12 14:01:23.140 2 24 580 2007-02-12 14:01:23.140 2 24 581 2007-02-12 14:01:23.140 2 24 582 2007-02-12 14:01:23.140 2 24 583 2007-02-12 14:01:23.140 2 24 584 2007-02-12 14:01:23.140 2 24 585 2007-02-12 14:01:23.140 2 24 586 2007-02-12 14:01:23.140 2 24 587 2007-02-12 14:01:23.140 2 ...... this show duplicate keys created in the destination table becasue of twice scanned rows. Uri Munitz & Danny Ravid Clalit Health Services
Anonymous
March 05, 2007
Uri and Danny - your solution is correct and you will receive a book of your choice (handled separately in private mail). After posting my blog I was contactet by Itzik Ben-Gan who pointed me to his article in SQL Server magazine http://www.sqlmag.com/Articles/Index.cfm?ArticleID=92888&DisplayTab=Article where he provided solution to my problem even before I have posted my blog... I would like thank to Itzik for not spoiling this competition and let other try to come up with their own scripts to show the behavior!Anonymous
June 12, 2007
Over the past month or so, I've looked at pretty much every isolation level except for read uncommittedAnonymous
August 22, 2007
Isolation level read commited - linii duplicateAnonymous
November 03, 2008
This post talks about talks about ways to Store Spatial Reference data in SQL in order to improve performanceAnonymous
February 23, 2009
I often get asked about the implications of using nolock query hints or read uncommitted and my responseAnonymous
April 05, 2009
Frequently I see production code, created by professional development teams, peppered with NOLOCK &Anonymous
May 11, 2010
Just to clarify: you state "might miss some rows if the new page allocated comes after the scan". How can this be without doing any deletes? Are the new pages/extents not allocated in file order? Or do you mean maybe with multiple files in a filegroup? Or do you mean: a) the page split occurs b) half the rows are moved off the page c) the concurrent read scan reads the half page d) the concurrent read scan reads the rest of the pages, completing its scan e) the new page is allocated from an extent f) the other half of rows are put on this new page Is this the "typical" interleaving of events that leads to this behavior? Or am I missing something? If the above is the case, then the "read duplicate rows" case is where the order of the above events is: c (but this time reads the full page) -> a -> b -> e -> f -> d right? Thanks! Steve Ash