Udostępnij za pośrednictwem


SQL Server NOLOCK Hint & other poor ideas.

Frequently I see production code, created by professional development teams, peppered with NOLOCK & other TSQL Hints.

While totally understandable, as it is a common recommendation by many internet posts & often found in their sample code, this is a really bad practice.
It often results in very obscure, hard to reproduce bugs and can cause data to get corrupted.

 

Thought 1: TSQL HINTs in General

As a general rule TSQL HINTs should only be used as a last resort. Both those responsible for the ANSI SQL standard & the Microsoft SQL Development Team have given a lot of thought about what is the safest, most desirable default values for transactions & query execution. It would seem logical that your default Coding Standard should be to follow those defaults and not some code snippet you found on the internet.

Recommendation 1: Do not use any HINTs until your testing proves that you have an issue that can’t be solved any other way than by using a HINT.  

  • Be aware that any testing you do will be unique to that Specific Edition, Specific Version & Service Pack. The optimiser is constantly being enhanced, in a future release it might change to better handle whatever it is that you are hinting.
  • On more than one occasion I’ve had customers request a switch to turn off the Optimiser HINTs generated by some s/w package they’ve purchased. They’ve discovered that the ISV’s queries actually run much faster without the hints, perhaps the hints were useful 10 years ago is say, SQL 6.5, but are now a hindrance in a later release of SQL.
  • Often you can rewrite your query &/or modify schema to get a much better result.

Recommendation 2: If you use a HINT, prove it via testing & document it.

If you use a HINT, document why. I’d expect at least :-

  1. The issue or performance problem you encountered, How it worked without HINTs & how it worked with the Hint.
  2. The Version, Edition, Patch Level you tested it on. (Enterprise Edition runs many more operations in parallel than Standard, this can make a difference)
  3. Also nice if your app provides a configuration option to remove it your hints OR maybe give the customer the ability to edit your stored proc to remove it.

More than once I’ve looked at a Schema or TSQL Query & thought “Either this designer was brilliant & had such foresight to anticipate some obscure issue I’ve not even considered OR they have no clue about databases”. Unfortunately it is almost impossible to know with total certainty. So please tell the poor mongrels who maintain your code, what you were thinking. Preferably put the comments into the TSQL Code, as nearly any other place the documentation will become separated from the code.

 

Thought 2: TSQL NOLOCK / READ UNCOMMITTED HINT

This Hint is much more dangerous than its name suggests. And that it why most people who don’t understand the problem, recommend it. It creates “incredibly hard to reproduce” bugs. The type that often destroy your end-users confidence in your product & your company.

But it does make nasty warnings/errors go away without the need to really fix the problem. Similar to short sighted “tips” from other disciplines are :-

  1. Turning off the Compiler Warnings about Implicit Type conversion can speed development. NB: Comparing an INT to a SHORT is classic cause of an infinite loop.
  2. Turning up your car stereo can drown out the grinding noise of an engine with no oil.
  3. Turning up the volume of your MP3 player can save you from the terror & desperate leap to safety when you walk onto the road with your back to the oncoming traffic.
  4. Folding your arms on the “seatback in front of you” will make a big difference when you fly into the ground upside down at 900 kph. OK there may be some merit to this one, stops you annoying the person next to you.

What many people think NOLOCK is doing

Most people think the NOLOCK hint just reads rows & doesn’t have to wait till others have committed their updates or selects. If someone is updating, that is OK. If they’ve changed a value then 99.999% of the time they will commit, so it’s OK to read it before they commit. If they haven’t changed the record yet then it saves me waiting, its like my transaction happened before theirs did.

The Problem

The issue is that transactions do more than just update the row. Often they require an index to be updated OR they run out of space on the data page. This may require new pages to be allocated & existing rows on that page to be moved, called a PageSplit. It is possible for your select to completely miss a number of rows &/or count other rows twice. This has been well documented by a number of highly reputable sources, including the SQL Server Development team. So I wont repeat it here. For the details visit the links below.

SQL CAT: Previously committed rows might be missed if NOLOCK hint is used

Tony Rogerson's post, Timebomb - The Consistency problem with NOLOCK / READ UNCOMMITTED

Itzik Ben-Gan's SQL Server Mag Article on Allocation Order scans

NOLOCK Optimizer Hint May Cause Transient Corruption Errors in the SQL Server Error Log

Excuses

  • We always do that / It was just in the sample code I read. => Now you know better.
  • But I keep getting these Deadlock messages. => Solve the problem (see below)  don’t just ignore the warnings.
  • I only use it on small tables which heaps of people read & it rarely changes. => Shared Locks don’t block other shared locks. Maybe you could have your Updates change 1 row per transaction.

The solution

Unfortunately there is no 1 line recommendation I can give you that will make your Deadlock &/or other perf issues go away. People have written entire books on the subject. But a few things to consider are:-

  • Use Stored Procedures for everything. They provide a level of encapsulation or code isolation that will allow someone to change your schema & fix perf issues without needing to understand your code. Note: Creating 3 stored procs for every table; p_Insert, p_Update & p_Delete is OK but not really what I’m talking about. I prefer procs that do a unit of work that may involve multiple tables, eg: p_CreateNewCustomer() or p_NewOrder()
  • Improve your schema
    Keep your rows short – avoid adding additional audit columns (ie: LastUpdatedBy) if there is no business plan to read them.
    Ensure your Many to Many tables use both foreign keys as a the unique composite key (Primary key), ie: an Identity Column as a Primary Key is not appropriate for these tables.
  • Keep your transactions short
    Avoid having a huge Selects sandwiched between two updates in the same transaction. (this is like Loop optimisation by taking the invariant statements out of the loop).
    Avoid SELECTing a row & then changing it, eg: Instead of SELECT VALUE, UPDATE VALUE = OLD VALUE+1. Just Update the “Original value +1” as a single statement.
  • Avoid using cursors.
    Cursors are not bad but often you can find a much more efficient way to complete a task. The optimiser is forced to do “Row by Row” changes which prevents most forms of optimisation, parallelism & multi-buffered operations.
  • Try to acquire locks in the same sequence for all your transactions. eg: have all your stored procedures Lock the InvoiceHeader before the InvoiceDetails. Don’t write half one way & the other half the opposite.
  • Use Snapshot Isolation.
    But test it, your TEMPDB config might need attention.
  • Think about using a READPAST hint. If a row or page is locked you just don’t read it. Often that is OK as those “Rows” were being used anyway. eg: in any ticketing system; Airline seats, Theatre, Hotel, where you have many customers competing for finite resources, you often know the total resources anyway. In one system we drew all the seats in a concert hall & coloured them orange. Then Selected all the seats on that night with a READPAST hint. Those that were RESERVED or AVAILABLE were returned. We coloured them appropriately. Any seat not returned was possibly being locked by another reservations clerk, so remained in Orange. If they hit refresh again, these seats typically turned Red (booked)
  • SQL Broker
    Use SQL Service Broker to break your transactions up into an async component. These smaller transactions might not block for as long OR you may be able to block the queue in heavy load periods & have that part of the transaction processed in the evening.

Update to this Post

1. If NOLOCK is so bad why have it at all?

    Because sometimes accuracy is not so important. If you are plotting something on a line chart & the line is out by 1-2% you possibly wouldn’t notice, what most people would look at is the trend. Similarly, if you are putting a number on a screen & refreshing it periodically (ie: %CPU or “Orders received in the past 10 mins”, most people will only look at the 1st 2-3 significant digits. eg: If a traffic counter measured 154,218 cars went thru an intersection between 6-7am & 572,621 from 8-9am. Your brain possibly rounds it to ~150K & ~575K respectively. Unless you are charging a toll for each car, the thing most interesting is “how does that compare to other time periods”

2. How does it cause data corruption?

   Mostly when you use the SELECT .. WITH NOLOCK as the basis of some other Update or Insert. Some rows will miss out on whatever changes you thought should be applied to them.

 

I hope this assists you to improve you design Or at least avoid creating issues that will be impossible to debug in production.

All feedback & comments welcome.

Dave.

 

Thought for the day, (which sums up my feelings about Cursors & Hints) : There is no Right or Wrong, only outcome.

Technorati Tags: TSQL,SQL Server,Microsoft SQL Server

Comments

  • Anonymous
    April 05, 2009
    PingBack from http://books.linkablez.info/2009/04/05/sql-server-nolock-hint-other-poor-ideas/

  • Anonymous
    April 05, 2009
    Great post David :) NOLOCKS (as a solution) have always been an embarassing way to 'solve' a deadlock problem .. which as you said, is just hiding the real problem. Time to read those links :) >> Ensure your Many to Many tables use both foreign keys as a the unique composite key (Primary key), ie: an Identity Column as a Primary Key is not appropriate for these tables. Can u please elaborate on this? maybe make a really quick blog post on this? I've always handled many-to-many tables like this. Id INT PK Identity NOT NULL TableA_Id INT FK NOT NULL TableB_Id INT FK NOT NULL I'm very curious to the reasons to making the two FK's the composite PK. Cheers!

  • Anonymous
    April 06, 2009
    A very good post Dave. Having said this, I have a scenario as mentioned below: Consider an order processing system, which has processing of an order as a transaction. The transaction is implemented at DAL layer. Since every transaction will have a status, we need to fetch the status id from status master table and associate it with the transaction records while processing the order. In such case, if we dont place nolock, then the status table gets locked until the transaction is committed and subsequently might lead to deadlock scenario when there is more load on the system. I am curious to know if there is any alternative for this scenario with out using nolock hint. Thanks in advance Phani

  • Anonymous
    April 06, 2009
    The comment has been removed

  • Anonymous
    May 04, 2009
    Great post Dave. However, I have to comment on a few points:

  1. Starting SQL Server 2005 with the introduction the snapshot isolation, there is no justification whatsoever to use NOLOCK. However, before that, NOLOCK was the only viable solution. A SELECT without NOLOCK can be blocked forever if the table being read is constantly being updated. READPAST is worse than NOLOCK because it skips the locked data.
  2. Blindly adding NOLOCK to all SELECT statements is a very bad practice. If a result of the SELECT is being used in a sebsequent update, UPDLOCK should be used.
  3. My experience with using SQL Server without query hint has been very discouraging. I can hardly imagine a multi-user real world application that can run without ROWLOCK! Cheers!
  • Anonymous
    September 22, 2010
    The comment has been removed

  • Anonymous
    January 26, 2011
    Nobody mentioned the secnarios where you have historical data in a table that won't be changing but you might have hundreds of thousands of rows. Wouldn't using nolock be a good use?We haver our primary and  historical database and after a while,data is moved from one server to another  or from one table to a historical table.

  • Anonymous
    January 26, 2011
    Nobody mentioned the secnarios where you have historical data in a table that won't be changing but you might have hundreds of thousands of rows. Wouldn't using nolock be a good use?We have our primary and  historical database and after a while,data is moved from one server to another  or from one table to a historical table.

  • Anonymous
    March 30, 2011
    Exactly, Starforce. Dave has omitted the most common legitimate use of NOLOCK -- report solutions.  There are two scenarios.  The first is the historical copy of a production database not subject to updating where using NOLOCK can slightly reduce the overhead associated with locking.  The more interesting scenario is the reporting solution that is forced (usually by client budget) to live on the same database as production.  In that case, reports like "Total YTD Revenue by Month" could deadlock the production system without NOLOCK in place.  Using NOLOCK in reports certainly does risk having the uncommitted read or dup-PK due to page split issues, but it's very unlikely to substantially affect the historical transactions that is typically the target of reporting solutions.

  • Anonymous
    March 31, 2011
    There are reasons to use NOLOCK.  For me the main thing is for generating reports, that take a long time to run.  They don't need to lock anything, because they are READ ONLY, and never insert, update, delete EVER.  So reading dirty data is OK.  The main reason for me to use this is because I have tables with millions of rows.  To generate the report takes 2 mintues, maybe longer.  It is a very big table, and indexes are proper.  But I would NEVER want my report query to block actual users that are modifying the data - I would NEVER block them for 2 minutes.  The reports don't need to be 100% accurate.  And since about 99.9999% of my transactions commit (since my validation is very detailed) it's really not an issue. I would suggest that rather than blocking the DB provides a better way to run reports that more importaintly don't block regular transactions.  Until then we have NOLOCK.

  • Anonymous
    July 13, 2011
    Maybe one day SQL Server will evolve past locking and move into versioning which IMO is a much better architecture.

  • Anonymous
    December 11, 2011
    Hi, Thanks a lot for such a great information you have shared through this blog. Really it helped me. Now i have started analysisng the situation first and then applying table hints or other hints (snapshot) accordingly.

  • Anonymous
    February 23, 2012
    Nice article

  • Anonymous
    April 03, 2012
    How could you say "Data Corruption" and "Dirty Read" are SAME ? You saying "NOLOCK" table hint corrupt the data and saying about "Missing Data"....How could you corelate these two ? Kindly explain...

  • Anonymous
    July 12, 2012
    The comment has been removed

  • Anonymous
    August 01, 2012
    The comment has been removed

  • Anonymous
    October 15, 2012
    Hmmm... then why is Microsoft RLS massively peppered with NOLOCK?  Were the developers of RLS aware of this problem?  Why hasn't Microsoft fixed RLS?  It seems we are having huge issues because of this exact problem.

  • Anonymous
    September 24, 2013
    Are you in contact with Pinal Dave? If you are able to send him a message, can you ask him if he knows a Hitesh Joshi from Gandhinagar, Gujarat, India, who lived in Sector 8? Thank you, Hitesh Joshi (PMP, MBA) | Product Manager and Architect | O: 305-503-3121 | hjoshi@denariiglobal.com

  • Anonymous
    January 16, 2014
    Is the NOLOCK hint super evil for reporting?

  • Anonymous
    February 24, 2014
    The comment has been removed

  • Anonymous
    September 25, 2014
    The comment has been removed

  • Anonymous
    September 25, 2014
    The comment has been removed

  • Anonymous
    October 02, 2014
    So nice to see someone out the WITH (NOLOCK) foolishness. I see this being done all over the place for supposed performance gains. All they're doing is opening up a channel for dirty reads. It's worst in a staging source pull, and there is likely no contention happening on a dim load, so it's especially annoying to see people continuing to use this in data warehousing.

  • Anonymous
    November 01, 2014
    The comment has been removed

  • Anonymous
    December 03, 2014
    i suppose the question is if ur running a select and something is in the process of updating the same table do u have to have the updated records returned in ur select at that instance in time. what does it matter? if u run the select after the update the u get the records that just changed. to me its a timing thing.

  • Anonymous
    May 06, 2015
    one word. if performance is the primary constraint use nolock. if accuracy is the primary constraint do not use it.

  • Anonymous
    December 23, 2015
    Because NOLOCK and READ UNCOMMITTED are the same thing putting SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the top of a piece of code and putting NOLOCK on every table reference in that piece of code is redundant see here: www.sqlserverlogexplorer.com/difference-between-nolock-and-with-nolock