Figure out variable lifetime using SOS
There is a bug in this program below, try to see if you could catch it.
Test.cs (compiled to DelegateExample.exe):
using System;
using System.Threading;
using System.Runtime.InteropServices;class Test
{
delegate uint ThreadProc (IntPtr arg);private uint m;
public Test (uint n)
{
m = n;
}uint Reflect (IntPtr arg)
{
Console.WriteLine (m);
return m;
}static void Main ()
{
Test t = new Test (1);
ThreadProc tp = new ThreadProc (t.Reflect);
NewThread (tp);
Thread.Sleep (1000);
}[DllImport("UsingCallback")]
static extern void NewThread (ThreadProc proc);
}
UsingCallback.cpp (compiled to UsingCallback.dll):
_stdcall void NewThread (LPTHREAD_START_ROUTINE cb)
{
DWORD id = 0;
CreateThread (NULL, 0, cb, NULL, 0,&id);
}
Yes, here is the problem: in the cs file, the managed code passes a Delegate object to unmanaged code which will create a new thread to call the delegate. Since unmanaged code has no way to tell CLR how it plans to use the object, from CLR's point of view, there are no live roots for the Delegate object after the line “NewThread (tp); ”. Thus the object is eligible to be garbage collected (GC) after the call, even if the new thread might not start yet. So it's possible for the Delegate to become trash before unmanaged code invokes it and cause unspecified failure. A fix is to add GC.KeepAlive before Main returns:
Test t = new Test (1);
ThreadProc tp = new ThreadProc (t.Reflect);
NewThread (tp);
Thread.Sleep (1000);
GC.KeepAlive (tp);
One thing annoying about this kind of bug is that GC is nondeterministic, the program could work just fine 99% of time, but only crashes under stress situation. Another thing make the problem hard to find is that it's not very intuitive by looking at the source. You might know the theory that when a variable is not used anymore, it will not be reported as a live root; but it would take quite some time to figure out at which point which variable is dead, plus there's no way to verify if CLR agrees with your analysis.
I'll show you how to use SOS.dll to check the internal data structure used by CLR to determine variable lifetime. When JIT compiles code, it generates such variable aliveness information (GC info) for each method and saves it along with the machine code of the method. When GC happens, it will check GC info for every method in the stack to find out which variable is alive and will use the live variables as object roots. GC info is highly compacted, but SOS.dll has “!GCInfo” command to crack it and show it in a human-readable way. This approach only works in assembly level, so stop reading if you are not interested in assembly language. :)
I compiled test.cs above using Visual studio to a "Debug" build, and launched the program under Windbg. After the Main method is JITted, I could disassemly the generated native code of the method using "!SOS.u" command:
0:000> !u 02f00058
Will print '>>> ' at address: 02f00058
Normal JIT generated code
[DEFAULT] Void Test.Main()
Begin 02f00058, size 60
>>> 02f00058 55 push ebp
02f00059 8bec mov ebp,esp
02f0005b 83ec08 sub esp,0x8
02f0005e 57 push edi
02f0005f 56 push esi
02f00060 53 push ebx
02f00061 33ff xor edi,edi
02f00063 33db xor ebx,ebx
02f00065 b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02f0006a e8a91fbcfd call 00ac2018 (JitHelp: nc)
02f0006f 8bf0 mov esi,eax
02f00071 8bce mov ecx,esi
02f00073 ba01000000 mov edx,0x1
02f00078 ff152051ad00 call dword ptr [00ad5120] (Test..ctor)
02f0007e 8bfe mov edi,esi
02f00080 b9c451ad00 mov ecx,0xad51c4 (MT: Test/ThreadProc)
02f00085 e88e1fbcfd call 00ac2018 (JitHelp: nc)
02f0008a 8bf0 mov esi,eax
02f0008c 689350ad00 push 0xad5093
02f00091 8bd7 mov edx,edi
02f00093 8bce mov ecx,esi
02f00095 ff152452ad00 call dword ptr [00ad5224] (Test/ThreadProc..ctor)
02f0009b 8bde mov ebx,esi
02f0009d 8bcb mov ecx,ebx
02f0009f ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)
02f000a5 b9e8030000 mov ecx,0x3e8
02f000aa ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02f000b0 90 nop
02f000b1 5b pop ebx
02f000b2 5e pop esi
02f000b3 5f pop edi
02f000b4 8be5 mov esp,ebp
02f000b6 5d pop ebp
02f000b7 c3 ret
SOS's "!u" is similar to Windbg's "u", but it shows more data because managed code is self-describable. For example, for those indirect calls, if the target is a managed function, "!u" could tell us the function name.
To check its GC info, we need to get the method's MethodDesc first (Method descriptor, CLR's data structure to keep all information about one method). We could use "!ip2md" to find out the method desc from any instruction pointer in the method:
0:000> !ip2md 02f00058
MethodDesc: 0x00ad50a8
Jitted by normal JIT
Method Name : [DEFAULT] Void Test.Main()
MethodTable ad50e8
Module: 151ad0
mdToken: 06000003 (D:\projects\DelegateExample\bin\Debug\DelegateExample.exe)
Flags : 10
Method VA : 02f00058
This command shows some important information about the method. To check GC info, we only need to pass the MethodDesc pointer itself to “!GCInfo”:
0:000> !gcinfo 0x00ad50a8
Normal JIT generated code
Method info block:
method size = 0060
prolog size = 9
epilog size = 7
epilog count = 1
epilog end = yes
saved reg. mask = 000F
ebp frame = yes
fully interruptible=yes
double align = no
security check = no
exception handlers = no
local alloc = no
edit & continue = yes
varargs = no
argument count = 0
stack frame size = 2
untracked count = 0
var ptr tab count = 0
epilog at 0059
60 E5 C0 45 |
Pointer table:
F0 7B | 000B reg EDI becoming live
5A | 000D reg EBX becoming live
F0 42 | 0017 reg EAX becoming live
72 | 0019 reg ESI becoming live
4A | 001B reg ECX becoming live
F0 03 | 0026 reg EAX becoming dead
08 | 0026 reg ECX becoming dead
F0 44 | 0032 reg EAX becoming live
30 | 0032 reg ESI becoming dead
72 | 0034 reg ESI becoming live
57 | 003B reg EDX becoming live
4A | 003D reg ECX becoming live
06 | 0043 reg EAX becoming dead
08 | 0043 reg ECX becoming dead
10 | 0043 reg EDX becoming dead
4C | 0047 reg ECX becoming live
0E | 004D reg ECX becoming dead
30 | 004D reg ESI becoming dead
F1 1B | 0060 reg EBX becoming dead
38 | 0060 reg EDI becoming dead
FF |
Output of the command has 2 sections, the first part is method info block, which contains some basic information about the JITted code, like size of the method, size of the prolog, and etc. The 2nd part is pointer table, on which I'll spend most of time. Pointer table describes lifetime of every GC reference inside the method. It has 3 columns, the first one is byte encodings, which we don't need to care about; the second column is the offset in the JITted code. E.g, this method's code starts from 02f00058, so 000B means the instruction at 02f00058+B = 2F00063, “xor ebx,ebx”; the third column tells us change of lifetime for a GC pointer at that instruction. E,g, “000B reg EDI becoming live” means starts from 2F00063, register EDI is a live root; similiarly, we can see EDI becomes dead at offset 60, end of the method (0060 reg EDI becoming dead). So whatever variable EDI is used to store, its lifetime is from beginning to end of the method. To understand the pointer table, it's better to interweave the table with the JITted code. In Whidbey, SOS has "!u -gcinfo" to do the job; but for Everett, I have to manually put them together and show it along with the source code:
02f00058 55 push ebp
02f00059 8bec mov ebp,esp
02f0005b 83ec08 sub esp,0x8
02f0005e 57 push edi
02f0005f 56 push esi
02f00060 53 push ebx
02f00061 33ff xor edi,edi
GCInfo: 000B reg EDI becoming live
02f00063 33db xor ebx,ebx
GCInfo: 000D reg EBX becoming liveTest t = new Test (1);
02f00065 b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02f0006a e8a91fbcfd call 00ac2018 (JitHelp: nc)
GCInfo: 0017 reg EAX becoming live
02f0006f 8bf0 mov esi,eax
GCInfo: 0019 reg ESI becoming live
02f00071 8bce mov ecx,esi
GCInfo: 001B reg ECX becoming live
02f00073 ba01000000 mov edx,0x1
02f00078 ff152051ad00 call dword ptr [00ad5120] (Test..ctor)
GCInfo: 0026 reg EAX becoming dead
GCInfo: 0026 reg ECX becoming dead
02f0007e 8bfe mov edi,esiThreadProc tp = new ThreadProc (t.Reflect);
02f00080 b9c451ad00 mov ecx,0xad51c4 (MT: Test/ThreadProc)
02f00085 e88e1fbcfd call 00ac2018 (JitHelp: nc)
GCInfo: 0032 reg EAX becoming live
GCInfo: 0032 reg ESI becoming dead
02f0008a 8bf0 mov esi,eax
GCInfo: 0034 reg ESI becoming live
02f0008c 689350ad00 push 0xad5093
02f00091 8bd7 mov edx,edi
GCInfo: 003B reg EDX becoming live
02f00093 8bce mov ecx,esi
GCInfo: 003D reg ECX becoming live
02f00095 ff152452ad00 call dword ptr [00ad5224] (Test/ThreadProc..ctor)
GCInfo: 0043 reg EAX becoming dead
GCInfo: 0043 reg ECX becoming dead
GCInfo: 0043 reg EDX becoming deadNewThread (tp);
02f0009b 8bde mov ebx,esi
02f0009d 8bcb mov ecx,ebx
GCInfo: 0047 reg ECX becoming live
02f0009f ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)
GCInfo: 004D reg ECX becoming dead
GCInfo: 004D reg ESI becoming deadThread.Sleep (1000);
02f000a5 b9e8030000 mov ecx,0x3e8
02f000aa ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02f000b0 90 nop
02f000b1 5b pop ebx
02f000b2 5e pop esi
02f000b3 5f pop edi
02f000b4 8be5 mov esp,ebp
02f000b6 5d pop ebp
02f000b7 c3 retGCInfo: 0060 reg EBX becoming dead
GCInfo: 0060 reg EDI becoming dead
Let's assume a thread triggers GC when another thread is excuting code 02f0009d. From the table, we know at this time, register ESI, EBX, and EDI will be reported as live roots to GC. With some disassembly, we could see at this moment both ESI and EBX contain reference to Delegate tp, and EDI is variable t. It might surprise you that both EBX and EDI keep alive until the function ends. That means variable t and tp are actually alive for the whole function, so the code has no bug?!
The tricky point is that a variable is eligible to be dead once it's not used anymore. However, it's up to JIT to determine whether it really wants to report the variable to be dead. In fact, for debuggable code, JIT extends lifetime for every variable to end of the function.
To prove the premature GC bug really exists in the sample, I compiled it to "Release" build and repeated all the steps above:
0:000> !u 02df0058
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorwks.dll"
Will print '>>> ' at address: 02df0058
Normal JIT generated code
[DEFAULT] Void Test.Main()
Begin 02df0058, size 46
>>> 02df0058 57 push edi
02df0059 56 push esi
02df005a b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02df005f e8b41fcdfd call 00ac2018 (JitHelp: nc)
02df0064 8bf0 mov esi,eax
02df0066 c7460401000000 mov dword ptr [esi+0x4],0x1
02df006d b9bc51ad00 mov ecx,0xad51bc (MT: Test/ThreadProc)
02df0072 e8a11fcdfd call 00ac2018 (JitHelp: nc)
02df0077 8bf8 mov edi,eax
02df0079 689350ad00 push 0xad5093
02df007e 8bd6 mov edx,esi
02df0080 8bcf mov ecx,edi
02df0082 ff151c52ad00 call dword ptr [00ad521c] (Test/ThreadProc..ctor)
02df0088 8bcf mov ecx,edi
02df008a ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)
02df0090 b9e8030000 mov ecx,0x3e8
02df0095 ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02df009b 5e pop esi
02df009c 5f pop edi
02df009d c3 ret0:000> !ip2md 02df0058
MethodDesc: 0x00ad50a8
Jitted by normal JIT
Method Name : [DEFAULT] Void Test.Main()
MethodTable ad50e8
Module: 151ad0
mdToken: 06000003 (D:\projects\DelegateExample\bin\Release\DelegateExample.exe)
Flags : 10
Method VA : 02df00580:000> !gcinfo 0x00ad50a8
Normal JIT generated code
Method info block:
method size = 0046
prolog size = 2
epilog size = 3
epilog count = 1
epilog end = yes
saved reg. mask = 0003
ebp frame = no
fully interruptible=no
double align = no
security check = no
exception handlers = no
local alloc = no
edit & continue = no
varargs = no
argument count = 0
stack frame size = 0
untracked count = 0
var ptr tab count = 0
epilog at 0043
46 21 |Pointer table:
A9 | 001F call 0 [ ESI ]
07 | 0026 push
CB | 0030 call 1 [ EDI ]
FF |
Here is mixed source code, native code and pointer table:
02df0058 57 push edi
02df0059 56 push esiTest t = new Test (1);
02df005a b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02df005f e8b41fcdfd call 00ac2018 (JitHelp: nc)
02df0064 8bf0 mov esi,eax
02df0066 c7460401000000 mov dword ptr [esi+0x4],0x1ThreadProc tp = new ThreadProc (t.Reflect);
02df006d b9bc51ad00 mov ecx,0xad51bc (MT: Test/ThreadProc)
02df0072 e8a11fcdfd call 00ac2018 (JitHelp: nc)
GCInfo: 001F call 0 [ ESI ]
02df0077 8bf8 mov edi,eax
02df0079 689350ad00 push 0xad5093
GCInfo: 0026 push
02df007e 8bd6 mov edx,esi
02df0080 8bcf mov ecx,edi
02df0082 ff151c52ad00 call dword ptr [00ad521c] (Test/ThreadProc..ctor)
GCInfo: 0030 call 1 [ EDI ]EM>NewThread (tp);
02df0088 8bcf mov ecx,edi
02df008a ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)Thread.Sleep (1000);
02df0090 b9e8030000 mov ecx,0x3e8
02df0095 ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02df009b 5e pop esi
02df009c 5f pop edi
02df009d c3 ret
In release build, the JIIted code is smaller, and the pointer table in GC info is much smaller. I need to explain some new syntax in this pointer table: “call 0 [ESI]” means the method calls a function with 0 argument, and ESI is a live variable at this point; "push" just indicates change of the stack, which is important information for GC to unwind this frame, but doesn't affect pointer aliveness.
One thing interesting about this new pointer table is that it only reports GC references when the method calls into other methods. What if a GC happens somewhere else? For example, at line 02df0079, EDI contains object tp and ESI contains object t, if a GC happens while a thread is excuting 02df0079 but GCInfo doesn't report those two variables, will they be collected? The answer is GC can't happen at that place. There is an important field in method info block called "fully interruptable". In debug version, this field for the method is true but in release version the field is false. A method is fully interruptable means GC could stop a thread (and perform collection) at any point if the thread is executing this method. If a method is not fully interruptable, GC can't start at arbitrary point if a thread is executing this method. It has to wait until a point when the thread returns from a call (e.g, return from ThreadProc's constructor back to Main) or when the method calls into another method which allows GC to happen (calls into unmanaged code via PInvoke always allow GC to happen). That's why we only need to report varaibles at calls. For performance reason, such trivial methods like Test.Main is usually non-fully interruptable in release build.
With all the knowledge, now we know that if a GC happens when a thread is just returning from the call "new ThreadProc" at instruction 02df0072, GC knows ESI (variable t) is a live root; if a GC happens when a thread is just returning from the call to ThreadProc's constructor at 02df0082, GC knows EDI (variable tp) is a live root. Variable t is not reported this time(but it's kept alive by object tp). However if a GC happens when a thread returns from the call to NewThread at 02df008a, neither t nor tp will be reported so they could be collected by GC. If the new thread hasn't start then, it will have trouble when using the objects.
So next time if you suspect there might be some premature GC bug, you could try "!SOS.GCInfo" to see how exactly CLR thinks about the GC object's lifetime.
This posting is provided "AS IS" with no warranties, and confers no rights.
Comments
Anonymous
July 27, 2005
In questo
caso SOS sta per Son Of
Strike.
Alcuni
debugger (tra gli altri CDB e WinDBG)...Anonymous
January 26, 2007
In questo caso SOS sta per S on O f S trike. Alcuni debugger (tra gli altri CDB e WinDBG) espongono unAnonymous
October 12, 2008
PingBack from http://test24a.wordpress.com/2007/12/03/where-did-my-object-go-part-2/