SQL Server: Stack dump
A stack dump is a file that is written to disk with an “.mdmp” extension when SQL Server encounters a running condition that it does not have a built-in error to handle. It contains the call stacks and memory information of the threads running when this situation happened and it’s done so that it can be used for troubleshooting after the fact.
The reason for this type of error condition to happen varies quite a bit, and it can range from CPU scheduling problems, issues with RAM chips, very heavy IO latency, linked server drivers, etc. In the worst case, the stack dumps can cause serious system instability and can trigger cluster failovers or SQL Server service restarts. In this blog post, we’re going to focus on analyzing the somewhat common condition of the Non-Yielding scheduler.
Dumps type:
Mini Dump:- This is a kind of dump in which we get very basic information about SQL server thread stack and very limited variables. It doesn’t include any indepth information or any data/index pages. The size of these dumps is very small and don’t even take much of resources while creation. These are the dumps enabled by default in SQL server.
Filtered Dump : – This a kind of dump in which we get more in depth information. It contains all the details of structures,functions including the details of all the stolen buffers. The size of these dumps is moderate and in 64 bit systems the size of these dumps depend on max server memory(It can reach in multiple GBs). To get the SQL server filtered dump we need to enable the trace flag 2551 using command dbcc traceon(2551,-1)
Full Dump : – This is a kind of dump in which we get the dump of entire SQL server process memory. It contains both the hashed and stolen pages(pages except index or data pages). The size of the dump file is equal to the total memory consumed by SQL server. On 64 bit systems, it reaches in multiple GBs in size. To get the full dumps we need to enable the trace flag using command dbcc traceon(2544,-1)
Stack Dump contains:
Each SQL Dump file has three kinds of files.
- SQLDump<nnnn>.txt
- SQLDump<nnnn>.log
- SQLDump<nnnn>.mdmp
Where nnnn is a number.
First two files can be opened using notepad (or any other text editor). Here is little information about them.
- TXT file will have information like below.
This file is generated by Microsoft SQL Server
version 13.0.4206.0
upon detection of fatal unexpected error. Please return this file,
the query or program that produced the bugcheck, the database and
the error log, and any other pertinent information with a Service Request.
Computer type is Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz.
Bios Version is VRTUAL – 5001223
BIOS Date: 05/23/12 17:15:53 Ver: 09.00.06
4 X64 level 8664, 10 Mhz processor (s).
Windows NT 6.2 Build 9200 CSD .
It would also have details about what caused it.
- File with “LOG” extension is a snippet of ERRORLOG file when the dump was generated. This would help someone to look at exactly Errorlog part when the dump was generated.
- File with “MDMP” extension (called as minidump file) is a small memory footprint of the SQL process when the issue happened. There are blogs on the internet which can tell how to read them and I am not going to explain that in this blog. The Microsoft support team has ways to read that file and get more details using private symbols.
Time I/O latency cause for stack dump
Memory pressure generate stack dump
Latch time out for stack dump:
When you encounter a latch timeout (buffer or non-buffer latch), the first occurrence of it’s type generates a mini-dump. If there are further occurrences of the same latch timeout, then that is reported as an error message in the SQL Errorlog.
Buffer latch timeouts are typically reported using Error: 844 and 845. The common reasons for such errors are documented in a KB Article. For a non-buffer latch timeout, you will get the an 847 error.
Error # | Error message template (from sys.messages) |
844 | Time out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait. |
845 | Time-out occurred while waiting for buffer latch type %d for page %S_PGID, database ID %d. |
846 | A time-out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit Id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Not continuing to wait. |
847 | Timeout occurred while waiting for latch: class ‘%ls’, id %p, type %d, Task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait. |
Non-Yielding Scheduler issue:
Explanation :
Non-Yielding scheduler lead to SQL crash dump. This occurs due to CPU pressure.
Recommendation:
a. Create multiple data files for Tempdb (one per one scheduler) and move files to data files storage drive.
b. Enable Trace 1118 to reduce contention in temporary database
Tips:
If it is anything like below, then make sure you have applied the latest service pack and a cumulative update of SQL Server.
- Non-yielding Scheduler
- Non-yielding IOCP Listener
- Non-yielding Resource Monitor
- Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION
- A system assertion check has failed. Check the SQL Server error log for details.
If you have taken care of patching and CHECKDB is clean but still get dump files, then my advice would be to contact Microsoft SQL Support team. There are high chances that it might be an issue with SQL Server product and Microsoft team might fix it.
Trace flag:
You can use trace flags to change the flags that SQL Server passes to the utility in the context of an exception or in the context an assertion. These trace flags are in the range from 2540 to 2559. You can use these trace flags to generate certain types of dump files. For example:
Trace Flag 2551:
Produces a filtered memory dump.
Trace Flag 2544:
Produces a full memory dump.
Trace Flag 8026
: SQL Server will clear a dump trigger after generating the dump once