Troubleshoot queries that seem to never end in SQL Server
This article describes the troubleshooting steps for the issue where you have a query that seems to never complete, or getting it to complete may take many hours or days.
What is a never-ending query?
This document focuses on queries that continue to execute or compile, that is, their CPU continues to increase. It doesn't apply to queries that are blocked or waiting on some resource that is never released (the CPU remains constant or changes very little).
Important
If a query is left to finish its execution, it will eventually complete. It could take just a few seconds, or it could take several days.
The term never-ending is used to describe the perception of a query not completing when in fact, the query will eventually complete.
Identify a never-ending query
To identify whether a query is continuously executing or stuck on a bottleneck, follow these steps:
Run the following query:
DECLARE @cntr int = 0 WHILE (@cntr < 3) BEGIN SELECT TOP 10 s.session_id, r.status, r.wait_time, r.wait_type, r.wait_resource, r.cpu_time, r.logical_reads, r.reads, r.writes, r.total_elapsed_time / (1000 * 60) 'Elaps M', SUBSTRING(st.TEXT, (r.statement_start_offset / 2) + 1, ((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(st.TEXT) ELSE r.statement_end_offset END - r.statement_start_offset) / 2) + 1) AS statement_text, COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) + N'.' + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text, r.command, s.login_name, s.host_name, s.program_name, s.last_request_end_time, s.login_time, r.open_transaction_count, atrn.name as transaction_name, atrn.transaction_id, atrn.transaction_state FROM sys.dm_exec_sessions AS s JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id CROSS APPLY sys.Dm_exec_sql_text(r.sql_handle) AS st LEFT JOIN (sys.dm_tran_session_transactions AS stran JOIN sys.dm_tran_active_transactions AS atrn ON stran.transaction_id = atrn.transaction_id) ON stran.session_id =s.session_id WHERE r.session_id != @@SPID ORDER BY r.cpu_time DESC SET @cntr = @cntr + 1 WAITFOR DELAY '00:00:05' END
Check the sample output.
The troubleshooting steps in this article are specifically applicable when you notice an output similar to the following one where the CPU is increasing proportionately with the elapsed time, without significant wait times. It's important to note that changes in
logical_reads
aren't relevant in this case as some CPU-bound T-SQL requests might not do any logical reads at all (for example performing computations or aWHILE
loop).session_id status cpu_time logical_reads wait_time wait_type 56 running 7038 101000 0 NULL 56 runnable 12040 301000 0 NULL 56 running 17020 523000 0 NULL This article isn't applicable if you observe a wait scenario similar to the following one where the CPU doesn't change or changes very slightly, and the session is waiting on a resource.
session_id status cpu_time logical_reads wait_time wait_type 56 suspended 0 3 8312 LCK_M_U 56 suspended 0 3 13318 LCK_M_U 56 suspended 0 5 18331 LCK_M_U
For more information, see Diagnose waits or bottlenecks.
Long compilation time
On rare occasions, you might observe that the CPU is increasing continuously over time but that's not driven by query execution. Instead, it could be driven by an excessively long compilation (the parsing and compiling of a query). In those cases, check the transaction_name output column and look for a value of sqlsource_transform
. This transaction name indicates a compilation.
Collect diagnostic data
- SQL Server 2008 - SQL Server 2014 (prior to SP2)
- SQL Server 2014 (after SP2) and SQL Server 2016 (prior to SP1)
- SQL Server 2016 (after SP1) and SQL Server 2017
- SQL Server 2019 and later versions
To collect diagnostic data by using SQL Server Management Studio (SSMS), follow these steps:
Capture the estimated query execution plan XML.
Review the query plan to see if there are any obvious indications of where the slowness can come from. Typical examples include:
- Table or index scans (look at estimated rows).
- Nested loops driven by a huge outer table data set.
- Nested loops with a large branch in the inner side of the loop.
- Table spools.
- Functions in the
SELECT
list that take a long time to process each row.
If the query runs fast at any time, you can capture the "fast" executions Actual XML Execution Plan to compare.
Method to review the collected plans
This section will illustrate how to review the collected data. It will use the multiple XML query plans (using extension *.sqlplan) collected in SQL Server 2016 SP1 and later builds and versions.
Follow these steps to compare execution plans:
Open a previously saved query execution plan file (.sqlplan).
Right-click in a blank area of the execution plan and select Compare Showplan.
Choose the second query plan file that you would like to compare.
Look for thick arrows that indicate a large number of rows flowing between operators. Then select the operator before or after the arrow, and compare the number of actual rows across two plans.
Compare the second and third plans to see if the largest flow of rows happens in the same operators.
Here's an example:
Resolution
Ensure that statistics are updated for the tables used in the query.
Look for a missing index recommendation in the query plan and apply any.
Rewrite the query with the goal to simplify it:
- Use more selective
WHERE
predicates to reduce the data processed up-front. - Break it apart.
- Select some parts into temp tables, and join them later.
- Remove
TOP
,EXISTS
, andFAST
(T-SQL) in the queries that run for a very long time due to optimizer row goal. Alternatively, you can use theDISABLE_OPTIMIZER_ROWGOAL
hint. For more information, see Row Goals Gone Rogue. - Avoid using Common Table Expressions (CTEs) in such cases as they combine statements into a single big query.
- Use more selective
Try using query hints to produce a better plan:
HASH JOIN
orMERGE JOIN
hintFORCE ORDER
hintFORCESEEK
hintRECOMPILE
- USE
PLAN N'<xml_plan>'
if you have a fast query plan that you can force
Use Query Store (QDS) to force a good known plan if such a plan exists and if your SQL Server version supports Query Store.
Diagnose waits or bottlenecks
This section is included here as a reference in case your issue isn't a long-running CPU driving query. You can use it to troubleshoot queries that are long due to waits.
To optimize a query that's waiting on bottlenecks, identify how long the wait is and where the bottleneck is (the wait type). Once the wait type is confirmed, reduce the wait time or eliminate the wait completely.
To calculate the approximate wait time, subtract the CPU time (worker time) from the elapsed time of a query. Typically, the CPU time is the actual execution time, and the remaining part of the lifetime of the query is waiting.
Examples of how to calculate approximate wait duration:
Elapsed Time (ms) | CPU Time (ms) | Wait time (ms) |
---|---|---|
3200 | 3000 | 200 |
7080 | 1000 | 6080 |
Identify the bottleneck or wait
To identify historical long-waiting queries (for example, >20% of the overall elapsed time is wait time), run the following query. This query uses performance statistics for cached query plans since the start of SQL Server.
SELECT t.text, qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time, qs.total_worker_time / qs.execution_count AS avg_cpu_time, (qs.total_elapsed_time - qs.total_worker_time) / qs.execution_count AS avg_wait_time, qs.total_logical_reads / qs.execution_count AS avg_logical_reads, qs.total_logical_writes / qs.execution_count AS avg_writes, qs.total_elapsed_time AS cumulative_elapsed_time FROM sys.dm_exec_query_stats qs CROSS apply sys.Dm_exec_sql_text (sql_handle) t WHERE (qs.total_elapsed_time - qs.total_worker_time) / qs.total_elapsed_time > 0.2 ORDER BY qs.total_elapsed_time / qs.execution_count DESC
To identify currently executing queries with waits longer than 500 ms, run the following query:
SELECT r.session_id, r.wait_type, r.wait_time AS wait_time_ms FROM sys.dm_exec_requests r JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id WHERE wait_time > 500 AND is_user_process = 1
If you can collect a query plan, check the WaitStats from the execution plan properties in SSMS:
- Run the query with Include Actual Execution Plan on.
- Right-click the left-most operator in the Execution plan tab
- Select Properties and then WaitStats property.
- Check the WaitTimeMs and WaitType.
If you're familiar with PSSDiag/SQLdiag or SQL LogScout LightPerf/GeneralPerf scenarios, consider using either of them to collect performance statistics and identify waiting queries on your SQL Server instance. You can import the collected data files and analyze the performance data with SQL Nexus.
References to help eliminate or reduce waits
The causes and resolutions for each wait type vary. There's no one general method to resolve all wait types. Here are articles to troubleshoot and resolve common wait type issues:
- Understand and resolve blocking issues (LCK_M_*)
- Understand and resolve Azure SQL Database blocking problems
- Troubleshoot slow SQL Server performance caused by I/O issues (PAGEIOLATCH_*, WRITELOG, IO_COMPLETION, BACKUPIO)
- Resolve last-page insert PAGELATCH_EX contention in SQL Server
- Memory grants explanations and solutions (RESOURCE_SEMAPHORE)
- Troubleshoot slow queries that result from ASYNC_NETWORK_IO wait type
- Troubleshooting High HADR_SYNC_COMMIT wait type with Always On Availability Groups
- How It Works: CMEMTHREAD and Debugging Them
- Making parallelism waits actionable (CXPACKET and CXCONSUMER)
- THREADPOOL wait
For descriptions of many Wait types and what they indicate, see the table in Types of Waits.