Sdílet prostřednictvím


X-ray of SQL Server timeout and user cancel

Hi Everyone,

We're going to take a close look at how SSAS and SQL Server implement execution timeout and user cancellation, and we'll discover a few noticeable differences.

Today we'll check SQL Server.

SQL Server Timeout

For this test I've executed a simple "waitfor delay '00:00:15'" that will ... wait for 15 s.

WIth an execution timeout set to 5 in SSMS properties, this is a sure and easy way to see a timeout.

Here's how it appears in Profiler :

Notice we get a BatchCompleted first and then the Attention event. Which here has a duration of 0.

Now let's take a look at Network level using a Netmon trace :


Frame 96 is the batch sent by the client, and based on frame 110, it would seem that the activity that triggers after 5s is originating from the client : so the time-out is managed client-side and implemented by a ‘TDS:Attention’ message.

Frame 111 is the end of batch with error, and frame 112 is the response to the attention from server to client.

Because the time-out is client side, if the client station is in a ‘bad shape’ somehow, the time-out may not fire when expected. I simulated this by freezing Management Studio using a debugger : in that case we see the query finishing successfully on the server even though the time-out has expired (note that because it is a waitfor delay there’s no resultset to fetch). This also means that the network latency takes place before the server can know about the time-out.

SQL Server Cancel

Let’s now look at the ‘cancel’ implementation. By cancel I mean the red square button in Sql Server Management Studio (SSMS). Same test, this time using a longer waitfor 45s, manually cancelled after 20s :

In profiler :


Let’s take a look at network level :

 

 Frame 159 is my user batch.

Frame 211 is my manual cancel.

Again, two responses from server, first the response to batch with status=error and the attention response.

So it seems we have exactly the same implementation for time-out and manual cancel. This is another confirmation that this is fully managed client-side as the server had no idea I was about to cancel the query after 20s (user mind reading is not planned before a few releases of SQL Server :) ).

Cancel or Time-out of a statement with large transaction history.

So far we have played with a simple waitfor query. Canceling such a query is a breeze. But now let’s play with a statement that will have a massive transactional history.

To simulate this I’ve done the following :

  • Prepare what’s needed for a single massive insert statement :
    • Create table mytable (i int identity, n nchar(4000))
    • Insert into mytable values (‘foo’)
    • Go 40000
    • Create table mytable2(i int, n nchar(4000))
  • And the statement (takes 10-20s)
    • Insert mytable2 select * from mytable

Let’s interrupt that one by time-out or cancel (we know now it’s the same thing).

Profiler :

Ok same as before… but no, look at the Duration for attention, it’s not 0 anymore, and Attention Starttime and EndTime have different values. And where does the attention starttime come from ? And why is the endtime of batch completed the same as attention endtime ?

Let’s take a look at netmon level to better understand :

The trace started a millisecond too late and missed the initial statement which happens at timeoffset 0.

Frame 37 is the TDS:Attention, in other words, the client cancels the query after roughly 10s.

Hey frame37's time matches the starttime of the Attention event ! So that’s the place where SQL profiler reflects the information.

But look at the time before the server responds to the cancel : 45s later ! And we see our usual twin responses for the batch and the attention.

What took place should be no surprise : SQL Server needs to rollback the query upon cancellation, and because the query has actually done a fair bit of insert work, undoing it takes time (I encourage you to compare the execution time and rollback time, and remember it for the day you’re about to kill a long running ‘write’ query).

So we’ll complete the documentation about the Attention even as follows :

Event

Place where information is recorded

Query start

SQL:batch starting (or RPC)

Client initiates cancel

Attention Start time

End of Rollback/Clean-up

Attention End Time and SQL Batch Completed

So the SQL:Batch Complete end time and duration reflect the query execution duration and the rollback duration. To know the actual duration of query execution only, you need to substract the attention duration from it. In that case, 55s minus 45.5s  equals about 9.5s. This is important if identifying the time-out value matters as it won’t appear directly in profiler trace information !

Time-outs for select are much easier to spot as their execution time is often 30xxx ms for the ubiquitous 30s timeout (you'll never get a round 30000 ms )

Now why isn’t the SQL Batch completed event fired when cancellation happens ? Well even if the batch itself is not executing anymore, its rollback is taking place, and SQL Server is busy and not ready to accept a new batch against the current connection (in non-MARS context). This reflects in the fact that in SSMS the ‘green arrow’ taskbar button won’t be activated until the client receives the batch response from the server. Which may take some time…

 Here I hope this was useful. Next post will be the same analysis but with SSAS this time. Stay tuned !

Guillaume Fourrat