High memory utilization in dbghost.exe when using Debugdiag
I have seen lately many issues where Administrators and developers alike are claiming that Debugdiag is causing the whole server to become unresponsive.
In some situation, the debugger host (dbghost.exe) grows uncontrollably in memory consumption causing it to shutdown which will bring the target process the debugger was attached to to terminate as well.
In other situations, the debugger host (dbghost.exe) along with the target process seem to just be consuming all CPU cycles rendering the whole server unresponsive.
I looked into these issues to try to determine the root cause of these problems. And what I have noticed was something very interesting.
The first issue in which dbghost.exe seems to be leaking memory was actually an issue with .net 2.0 debugger extension (sos.dll). Here is a description of the problem and how to resolve it.
As you may have already known, Debugdiag relies mainly on sos.dll for .net debugging. So when you create a crash rule to catch a .net exception and run actions, dbghost.exe loads sos.dll and runs the "!dumpobj" command to determine the type of exception that was throws every time a .net exception is thrown. Below how the script for a crash rule determines the type of .net exception thrown.
1: Function GetCLRExceptionType(ByVal ExceptionObjHexAddr, ByVal bInnerException)
2: Dim Output, Lines, i
3:
4: If Debugger.IsClrExtensionMissing Then
5: WriteToLog "Unable to determine CLR exception type - extension dll could not be loaded."
6: Else
7: Output = Debugger.Execute("!DumpObj " & ExceptionObjHexAddr)
8: Lines = Split(Output, Chr(10))
9: For i = 0 To UBound(Lines)
10: If bInnerException Then
11: If InStr(Lines(i), "_innerException") <> 0 Then
12: Tokens = Split(Lines(i), " ")
13: For j = 0 To UBound(Tokens)
14: If Len(Tokens(j)) = 8 Then
15: GetCLRExceptionType = GetCLRExceptionType(Tokens(j), False)
16: Exit For
17: End If
18: Next
19: End If
20: ElseIf Len(Lines(i)) >= 7 Then
21: If InStr(Lines(i), "Name: ") = 1 Then
22: GetCLRExceptionType = Mid(Lines(i), 7)
23: Exit For
24: End If
25: End If
26: Next
27: End If
28: End Function
Line 7 is where the "dumpobj" command is ran against the exception object to determine the type.
There is a known issue with the "dumpobj" command that it leaks 4k every time it is ran. This leads to a situation where dbghost.exe runs out of memory and terminates.
There is an updated sos.dll that fixes a different issue but contains a fix for this problem. (https://support.microsoft.com/kb/967812). So all what needs to be done is to replace the existing sos.dll, and reactivate the crash rule.
The second issue has to do basically with the high rate of handled exceptions thrown by the target process (1st chance exceptions. It does not matter of the exceptions are managed or native.
For instance, if you look at the below log file generated for a crash rule, you see that there is a high number of exceptions thrown per sec.
1: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 2200
2: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.Threading.ThreadAbortException'
3: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 2200
4: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.Web.HttpContext'
5: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
6: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
7: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
8: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
9: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
10: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
11: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
12: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
13: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
14: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
15: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
16: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
17: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
18: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
19: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
20: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
21: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
22: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
23: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
24: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
25: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
26: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
27: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
28: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
29: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
30: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
31: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
32: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
33: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
34: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
35: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
36: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
37: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
38: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
39: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 2656
40: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
41: [1/27/2010 5:48:08 AM] First chance exception - 0xe0434f4d caused by thread with system id 6780
42: [1/27/2010 5:48:08 AM] CLR Exception Type - 'System.InvalidOperationException'
An average of around 22 exceptions/sec. Let look closely at what happens when an exception is raised:
From https://support.microsoft.com/default.aspx?scid=kb;EN-US;105675...
.... if the application is being debugged, the debugger sees all exceptions before the program does. This is the distinction between the first and second chance exception: the debugger gets the first chance to see the exception (hence the name). If the debugger allows the program execution to continue and does not handle the exception, the program will see the exception as usual. If the program does not handle the exception, the debugger gets a second chance to see the exception. In this latter case, the program normally would crash if the debugger were not present.
...
In other words, when an exception is raised, control is passed to the debugger, the debuggers acts on that exception and passes control back to the program to handle the exception. This is an expensive operation and involve a lot of context switching.
So, in this particular example, things are worse because for every CLR exception raised, debugdiag will run the command (!dumpObj ExceptionAddress ) to determine the type "GetCLRExceptionType(GetCLRExceptionPtr(CausingThread), False)". If sos.dll being used is the one with the leak issue, you will notice a high CPU utilization in both the debugger (dbghost.exe) and the target process along with a quick increase in memory utilization in dbghost.exe. This will eventually lead to an unexpected termination of the debugger and the target process.
In the case of native exceptions, the issue will manifest itself as a high CPU that may lead to an unresponsive server as a whole! To be able to debug such environment, you would need to know what are the exceptions that are thrown in a high rate and avoid them.
Comments
- Anonymous
July 16, 2010
The comment has been removed