Finding Corruption - Could not continue scan with NOLOCK due to data movement.
Through this blog post, I just wanted to share a troubleshooting scenario that I had with one of my customers.
Issue: Having some sporadic issues with reports coming from our SharePoint 2010/Reporting server.
They were seeing the following error and this was happening for only a few reports.
So we had a meeting convened to discuss with the Customer team about the issue and to further troubleshoot it.
The Q& A started:
Q When did this issue start?
It has been around 2 days.
Q Were any Changes made to any of the components involved in generating the Reports?
No changes were made, but the server on which the ReportServerDB is hosted, had a Disk failure and had to recover all the databases from previous backups.
Q Are the reports which are failing getting back a lot of data?
No, there are reports that get a lot of data, but even the smaller ones are failing.
Then we asked for the Error logs on the SQL Server instance. In the mean time I asked them to Query the Executionlog2 View in the Report Server Database to check the retrieval time, processing time & rendering time of the Reports that ran successfully.
So we ran the following query
select * from ExecutionLog2 order by TimeStart DESC
The query did not complete and gave us the below error.
Msg 601, Level 12, State 3, Line 2
Could not continue scan with NOLOCK due to data movement.
Then I did a quick search on the error and found the following about the above error.
“When there is corruption in database (or) When scanning the data with the NOLOCK locking hint (or) with the transaction isolation level set to READ UNCOMMITTED, it is possible for the page at the current position of the scan would have deleted or moved by page splits caused by Inserts/Updates/Deletes making SQL Server not able to scan further and cause Error 601: Could not continue scan with NOLOCK due to data movement.”
Since we did not run under READ UNCOMMITTED and No locking hint was provided, the suspicion was that the database could be corrupted. This was also strengthened by the fact that they had disk failure earlier.
To confirm my suspicions, we executed DBCC CHECKDB on the database. But the CHECKDB was progressing very slowly. When we checked the Estimated Completion Time , it was giving us 25 hours. So we killed the CHECKDB session.
By this time, we got the error logs from the SQL Server instance that was hosting this Database and parsing through the error logs found multiple occurrences of the following error.
Error: 605, Severity: 21, State: 3. ... Attempt to fetch logical page (1:3360) in database 6 failed. It belongs to allocation unit 0 not to 72057594040811520.
Database id 6 happens to be the Report Server database that is being used for reporting purposes in the SharePoint-Integrated mode.
Error 605 occurs when SQL Server detects database corruption.
When the severity is Level 21, the user session is disconnected and the error is written into the SQL ERRORLOG and Windows Application Event Log as Event ID=605
Though this confirms our original suspicion, the DBCC CHECDB was not completing and we wanted further proof through the DBCC Checks.
Since we know the allocation unit id from the error log, (72057594040811520) the plan was to find the corresponding object and run DBCC CHECKTABLE on that object.
So we ran this query in the context of the report server database.
SELECT au.allocation_unit_id, OBJECT_NAME(p.object_id) AS table_name, fg.name AS filegroup_name,
au.type_desc AS allocation_type, au.data_pages, partition_number
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.partition_id
JOIN sys.filegroups AS fg ON fg.data_space_id = au.data_space_id
WHERE au.allocation_unit_id = '72057594040811520'
ORDER BY au.allocation_unit_id
The result was the CATALOG table.
We executed DBCC CHECKTABLE on the table and this completed giving the following results
There are 224 rows in 31 pages for object "Catalog".
CHECKTABLE found 0 allocation errors and 695 consistency errors in table 'Catalog'
The issue was that the backup from which this database was restored also got corrupted. The issue was resolved by restoring from a previous non corrupted backup.
Once they did that, the reports also started to function normally.
Reference: https://support.microsoft.com/kb/2015739