Finding which queries were executing from a SQL Memory Dump – revisited
A message about the previous post on finding queries in a SQL Memory Dump file:
Back in October I did a post on finding queries in SQL Server Memory Dumps. It has been a popular topic, but some have had difficulties following it. After reviewing this with a reader, I realized there was an issue with this post for 64 bit dumps. Rather than make corrections, I decided I’d do another example here so we can learn more from the issue with that post and use this opportunity to view another dump.
The dump analysis detailed in that post worked as described in the post. However, in my attempt to convert the necessary steps taken from using internal symbols to something that can be applied using public symbols (dumping the right addresses with the right offsets), I mistakenly used the ‘dd’ command. The ‘dd’ command dumps what we call double words. Double words are 32 bits – or 32 bit addresses. The architecture of the system used for that post is 64 bit. Ironically this worked fine and went undetected in my previous example because the higher order digits were all zero – likely because my test box’s memory had not been stressed enough to use any higher range memory addresses. Therefore the lower 8 digits in the hex number (the right half) representing our memory addresses worked with ‘dd’ because the bottom half is 32 bits (0xFFFFFFFF) and the leading zeros in the higher 32 bits were not significant. Then when applying this method to a system that was under greater memory pressure, we begin to see those higher ordered digits on a 64 bit machine being used. The ‘dd’ doesn’t work for this. So, we’ll now use ‘dq’ to dump quad words. This will accurately display our 64 bit memory addresses and work more reliably on 64 architecture. The previous post would work fine for 32 bit architecture. Remember, we only need to use ‘dq’ with 64 bit addresses. Otherwise, ‘dd’ is appropriate for 32 bit addresses.
Finding your query in a dump file:
I’d like to thank the read mentioned above, Dylan, for allowing me to write this blog post using his SQL Memory dump as a sample (so we’ll have the higher addresses ranges in use).
But first the basics, we first setup our symbols. Once this is done, I see this dump is a “symptom” or “SPID” mini-dump in that it contains only the thread that encountered the problem:
0:000> ~
. 0 Id: 12a8.1870 Suspend: 0 Teb: 000007ff`fff0a000 Unfrozen
If we view the stack trace of that thread to see what was going on:
0:000> kc
Call Site
ntdll!ZwWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
sqlservr!CDmpDump::DumpInternal
sqlservr!CDmpDump::DumpFilter
sqlservr!CDmpDump::Dump
msvcr80!__C_specific_handler
ntdll!RtlpExecuteHandlerForException
ntdll!RtlDispatchException
ntdll!RtlRaiseException
KERNELBASE!RaiseException
sqlservr!CDmpDump::Dump
sqlservr!SQLDumperLibraryInvoke
sqlservr!CImageHelper::DoMiniDump
sqlservr!stackTrace
sqlservr!utassert_fail
sqlservr!FullXactImp::Rollback
sqlservr!CMsqlXactInternalBase::RollbackInternal
sqlservr!CMsqlXactInternalReadWrite::Rollback
sqlservr!CMsqlXactImp::RollbackInternal
sqlservr!CMsqlXactImp::Rollback
sqlservr!CMsqlXact::UnInstallUserXact
sqlservr!process_request
sqlservr!process_commands
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!_callthreadstartex
msvcr80!_threadstartex
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
We can see that we hit an assertion while SQL Server was attempting to roll back a transaction. Assertions are safety checks in code to verify that a condition that shouldn’t happen - isn’t happening. If that condition occurs (or the safety check fails), an error is generated like we see above. So we want to get two things out of this – what is the assertion and what is the query that was running? The latter is mostly for academic purposes here. In general, you cannot write TSQL statements that cause assertions or corruption in databases. TSQL just isn’t low-level enough to do that. However, the TSQL you run may lead to the scenario which causes us to come across an issue like this. Therefore, getting the query text is good to see if we can reproduce the issue. If we use some different parameters for the ‘k’ (stack trace) command, we get more information returned. When we run ‘kvnf’, we notice that we get the memory taken up by each frame on the stack (a frame is one line on the stack). If we look at the frame that called sqlservr!utassert_fail, we see that it’s memory usage is rather large compared to other frames. This is a good indication something is stored on the stack as opposed to the heap or in memory allocated via VirtualAlloc() calls. Here is that frame from the ‘kvnf’ command:
05 1550 00000000`55addca0 00000000`00fb23fd : 00000000`00000000 00000000`ffffffff 00000003`00000000 0000b688`75b7282f : sqlservr!utassert_fail+0x844
You’ll notice it uses 1550 bytes of memory – much more than the other frames. To investigate, we start dumping at the child base stack pointer – the Child-SP. This is where this frame starts on the stack in memory. I am going to dump Unicode data, so I’ll use the ‘du’ command:
0:000> du 00000000`55addca0
00000000`55addca0 "¸"
0:000> du
00000000`55addca4 ""
Type ‘du’ twice and then you can simply hit the ENTER key and it will increment automatically. Keep hitting ENTER and you’ll eventually find the assertion stored on the stack:
0:000>
00000000`55addcf0 "Location: "xact.cpp":4362.Expre"
00000000`55addd30 "ssion: !m_parNestedXactCnt.SPID"
00000000`55addd70 ": 84.Process ID: 4776.Descrip"
00000000`55adddb0 "tion: Trying to use the transac"
00000000`55adddf0 "tion while there are 1 parallel "
00000000`55adde30 "nested xacts outstanding"
Now that we have our assertion, we can search the online support knowledge base to see if there are any known issues (https://support.microsoft.com/).
However, we still have one thing left to get from the dump file – our query. Using the ‘dq’ command, we can follow the same general process as the previous post did. The first thing we notice is that this thread doesn’t show a call to sqlservr!CMSqlExecContext::ExecuteStmts like the previous post. That’s ok. What we are looking for is the address of a particular object that is passed to a few calls on the stack. In this dump, we’ll start with the call to sqlservr!process_request. From this call, we need to start with the first parameter – the address to the object we are interested in:
0:000> dq 00000007`5ae70fa0+0x20 l1
00000007`5ae70fc0 00000007`5ae70ea0
We dump a quad-word (‘dq’) with an offset of 0x20. The ‘l1’ tells the debugger we only want 1 quad word. Notice the higher order digits (the digits to the left of the apostrophe are not all zero – the issue with the previous post).
Now, we need to take the address returned (the address of another object – which our first object points to) and dump that with an offset of 0x28 using ‘dq’.
0:000> dq 00000007`5ae70ea0+0x28 l1
00000007`5ae70ec8 00000009`91862160
We aren’t there yet. This address (another object in memory), has what we need at an offset of 0x200. So let’s get that:
0:000> dq 00000009`91862160+0x200 l1
00000009`91862360 00000003`408fdd30
In our chain of objects, this one has what we need at an offset of 0x20:
0:000> dq 00000003`408fdd30+0x20 l1
00000003`408fdd50 00000003`408fdd90
Finally, this address holds the pointer we want to get to:
0:000> dq 00000003`408fdd90 l1
00000003`408fdd90 00000003`408fde20
Which gives us the location of the buffer we’ve been after. So now we dump Unicode with ‘du’ to get the statement:
0:000> du 00000003`408fde20
00000003`408fde20 "(@P0 int)select count(*) as y0_ "
00000003`408fde60 "from dbo.V_PDM_CP this_ where th"
00000003`408fdea0 "is_.ResItNum= @P0 ??????????????"
00000003`408fdee0 "????????????????????????????????"
00000003`408fdf20 "????????????????????????????????"
From here we can see we have a parameterized SQL statement. This statement either went through auto-parameterization or was submitted as a parameterized statement via a call to sp_executesql. Now, we can move forward with researching and seeing if this is reproducible.
The approach is the same, but use ‘dq’ when dumping addresses in 64 bit SQL Memory Dumps.
-Jay
Comments
- Anonymous
September 29, 2011
Firstly, thank you very much for this post. I was able to view the sql stmt from dump.Question: Because the sequence of calls is not going to be same in all the cases. Like here you traced from process_request to the sql text. Is there any structure or ruleset one can follow to do those offsets to get to the sql text.What i mean is that, how do you know what offset to use and how many hops to go till you dump the buffer ? - Anonymous
September 29, 2011
Unfortunately this method can change with an update to SQL Server. I would say it is unlikely to change with hotfixes, cumulative updates, and service packs - but that isn't guaranteed. It is much more likely to change across versions. The reason I did it this way was so that users would be able to get the statement. There are very likely other paths to get to it - and if I find a shorter way I'll be sure to post it. The reason for all the offsets is that I got it by following what we call private symbols - symbol files that aren't available outside of Microsoft. When I view the dump, I see the name of the internal object. Following those is much easier. Externally you only have visibility to public symbols - usually just the function names. However, given that you are on the same version these objects will be the same - meaning the offsets are the same. We just need a starting point (address) and we can follow a list of relative directions from that point. - Anonymous
October 03, 2011
In extended events, i can create a memory dump on a deadlock-ed transactions situation. Can one possibly figure out deadlocked transactions? What those queries were? from the memory dump.Of course one can see the involved queries in the deadlock graph from the extended events or from sql profiler. But i am interested in finding the queries whichI am after finding the queries executed in a transaction before the ones which got deadlocked. - Anonymous
October 05, 2011
The comment has been removed - Anonymous
October 01, 2013
Hi Jay,Thanks for the post. I have encountered a SQL dump and used your instructions to determine the cause, however the output produced from the kc doesn't really give any clear indication as to what caused the problem. Could you please advise if you have any information on the code. I'm running SQL Server 2008 SP1 on Windows Server 2008.0:000> kc *** Stack trace for last set context - .thread/.cxr resets itCall Sitekernel32!RaiseExceptionsqlservr!CDmpDump::Dumpsqlservr!SQLDumperLibraryInvokesqlservr!CImageHelper::DoMiniDumpsqlservr!stackTracesqlservr!stackTraceCallBacksqlservr!ex_terminatorsqlservr!SOS_SEHTranslatormsvcr80!CallSETranslator0x00x0I have used the kvnf and looked through most of the output produced using dq as describe, but couldn't find any valuable information.Thanks - Anonymous
October 02, 2013
The frames you pasted here just show SQL Server handling the exception. You need what is below this. It looks like symbols are not lined up correctly from the 0x0. There is some info on symbols at the bottom of this post:blogs.msdn.com/.../basic-debugging-concepts-and-setup.aspx - Anonymous
October 29, 2013
Hi James,Thanks for the awesome post. But can you please elaborate or explain more on this taking the offset. I have tried different dumps from both SQL2005/2008 but I am unable to get the resultant query. Always i end up with questions marks during the steps. Please help me out.May be according to your post, I spent around 15-20 hours for getting a code from dump but couldnt get it..may be due to offset issues...please help...Or please mail me at hemanthtarra@hotmail.com..ThanksHemanth - Anonymous
October 29, 2013
The offsets are necessary to do this using public symbols. When debugging internally, we can use private symbols that make this a bit easier. The offsets are relative movements in memory to the correct structure or object that you need to dump. If you are getting ?????, this is often an indication that the SQL Query was not captured in the dump. The ????? tell us that this address range is not available in the dump file. This is common with mini-dumps. I looked at one just last night that had the same issue.Also, please be aware that the above steps were performed on SQL Server 2008. Over time, the code changes, which changes the size and location of objects in memory - changing the offsets. Therefore, this could be an issue for you as well.Jay