SetFileIoOverlappedRange Can Lead to Unexpected Behavior for SQL Server 2008 R2 or SQL Server 2012 (Denali)
You should be aware the Windows bug with API = SetFileIoOverlappedRange . This is used by SQL Server (2008 R2 and SQL 2012) only when Locked Pages has been enabled and is in use by the SQL Server.
SQL Server Support has NOT encountered this in a production environment. The issue was found during Denali (SQL 2012) testing. This is for information purposes only. WORKAROUND: Disable Locked Pages for the SQL Server instance.
Problem |
Calls to the SetFileIoOverlappedRange can corrupt the OVERLAPPED structures of SQL Server. This can lead to incorrect offsets, wrong timings, early completion, etc… This is called every time we open an FCB (SQL Server's user mode File Control Block Structure). For example: · Open of current database files (MDF, NDF, LDF) - standard recovery · Open of new database files (MDF, NDF, LDF) - create database (Testing shows for a create database this is called 4 times to get everything going) · Auto close opens (MDF, NDF, LDF) - use on closed database · DBCC Snapshot opens (MDF, NDF) · Snapshot database file opens (MDF, NDF) · Restore/Log Shipping – When it opens the FCB to do the restore to *** Backup to disk does NOT use this code path. ***
|
Symptoms |
Wide ranging in SQL from invalid write location, lost read or write, early access to a page that is not yet fully in memory, I/O list damage such as AVs, incorrect timing reports, and many others. You may not even see the situation until days later.
|
Bug Id #818757 |
This is a race condition trigged when using SQL 2008 R2 and (Denali (SQL 2012) before RC1). If you are running with Locked Pages In Memory there is a possibility of corrupting outstanding I/O requests when reads/writes are outstanding AND concurrently another file is being opened. Unfortunately memory corruption (scribbler behavior) may happen in the different parts of the data structure. In the majority of the cases we’ve seen so far we corrupt I/O queue pointers, thus causing SQL Server instance to produce infinite amount of dumps (looping). We have also seen in-memory corruption, calls to I/O completion when the I/O is not finished. Write/read at incorrect offset(s) within the file is also possible but unlikely. Windows 8 contains the API fix that is being back ported to previous version(s) of Windows.
|
The SQL Server 2012 and the SQL Server 2008 R2 CU based fixes introduce trace flag 8903. These builds will not use the API unless the trace flag is enabled.
Windows 2008 / Vista |
Windows 2008 R2 / Windows 7 |
(Windows 2008 R2 / Windows 7) + QFE* |
Windows 8 |
|
SQL 2008 R2 |
Exposed |
Exposed |
Fixed |
Fixed |
SQL 2008 R2 – PCU2** |
Fixed |
Fixed |
Fixed |
Fixed |
SQL 2008 R2 – PCU2 –T8903 enabled |
Exposed |
Exposed |
Fixed |
Fixed |
SQL 2012 |
Fixed |
Fixed |
Fixed |
Fixed |
SQL 2012 – T8903 enabled |
Exposed |
Exposed |
Fixed |
Fixed |
*The Windows fixes are slated to be included in a Windows Update package. Reference Bug #s 395823 (KB 2679255 with SP2 Target), 395948 (KB 2679255 with SP3 Target),
** The SQL Server issue is fixed in SQL 2012 RTM and with the bug # 870529 for SQL Server 2008 R2 PCU2 release.
API Reference: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365540(v=vs.85).aspx
When the fixes are available I will be updating the matrix above and Microsoft will be creating the proper System Center Advisor rules to detect exposure.
Bob Dorr - Principal SQL Server Escalation Engineer
Comments
Anonymous
April 10, 2012
Looks like a fix is now available - support.microsoft.com/.../2679255Anonymous
November 13, 2012
The comment has been removedAnonymous
November 13, 2012
As always Bob, you are awesome. Thanks for looking into this and answering the questions!Anonymous
June 23, 2014
I encountered this condition in a DEV platform earlier today, when attaching two databases of nominal size (20GB). 2014-06-23 15:08:48.620 spid60 SetFileIoOverlappedRange failed, GetLastError is 1314 2014-06-23 15:08:48.630 spid60 SetFileIoOverlappedRange failed, GetLastError is 1314 2014-06-23 15:08:48.640 spid60 SetFileIoOverlappedRange failed, GetLastError is 1314 2014-06-23 15:08:48.650 spid60 Starting up database '<blah>'. WS 2012 R2 (6.2 3900) (Hyper-V) SQL Server 2012 (SP1) (11.0.3128.0) LPIM is configured. We're already troubleshooting storage performance, which is why these databases are being relocated. There have been infrequent, but regular messages similar to the following: SQL Server has encountered 20 occurrence(s) of I/O requests taking longer than 15 seconds to complete I am left with the impression this particular error may be posted, but can be ignored. Is this accurate? Thank you, in advance!