.NET Just in Time Compilation and Warming up Your System

IMG_7823.jpg

One of the primary obstacle we face while scaling our system is Just In Time (JIT) compilation of the .NET stack. We run a .NET managed application server at a huge scale with many thousands of managed assemblies on each server (and many many thousands of those servers distributed globally).We deploy code daily and since those are managed code they are JITed at each deployment. Our system is very sensitive to latency and we do not want those servers getting new code to cost us execution latency. So we use various mechanisms to warm the servers before they start serving queries. Common techniques we use are

  1. NGEN
  2. Force JITing (a specialized multi-core JIT technique see bottom of post)
  3. Sending warmer queries that warm up the system

In this effort I frequently handle questions regarding how the system JITs managed methods. Even after so many years of the CLR JIT existing there seems to be confusion around when JIT happens, what is the unit of compilation. So I thought I’d make a quick post on this topic.

Consider the following simple code I have in One.cs

 using System;

namespace Foo
{
    class MyClass
    {
        public void a()
        {
            int i = 0;
            while(true)
            {
                b(i++); 
            }
        }

        public void b(int i) 
        {
            Console.WriteLine(i);
        }
    }

    class Program
    {
        static void Main()
        {
            MyClass mc = new MyClass();
            mc.a();
        }
    }
}

Of interest is the function Foo.MyClass.a and Foo.MyClass.b. We will debug to find out exactly when the later is JITed.

First I compile and then launch the debugger. I will use the windbg debugger and the sos extensions extensively in this post. Also read https://support2.microsoft.com/kb/319037?wa=wsignin1.0 to see how to setup symbol servers to debug into the CLR.

 csc /debug+ One.cs
windbg one.exe

After that in windbg I run the following command to setup

 .sympath+ d:\MySymbols          ;$$ Use the Microsoft symbol server (see link above)
sxe ld:clr                      ;$$ break on CLR loaded
g                               ;$$ continue the program until you break on CLR.dll being loaded
.loadby sos clr                 ;$$ load the sos debugger extension!bpmd One.exe Foo.MyClass.a     ;$$ Set a managed break point in a()
g                               ;$$ continue until break point in a() is hit

When this break point is hit, we have obviously already JITed MyClass.a() and executing it. The question we now have is that whether all the functions a() calls like MyClass.b() already JITed. If not when/how will that be JITed. Lets debug it!!

**Color coding indicates how I take output of one command to give inputs to the next one.

First lets find the this pointer for the MyClass instance. This can be obtained from the current managed call stack

 0:000> !clrstack -a
PARAMETERS:
        this (0x0000000000d9eea0) = 0x0000000002922c58

The details of the this object shows the MethodTable for it. The MethodTable has pointer to EEClass (cold data).

 0:000> !do 0x0000000002922c58
Name:        Foo.MyClass
MethodTable: 00007ffab2f640d8
EEClass:     00007ffab3072340
Size:        24(0x18) bytes
File:        d:\Skydrive\Code\C#\_JITPresentation\One.exe
Fields:

Now we can see more details of the MethodTable, which will show the individual methods descriptors.

 0:000> !dumpmt -md 00007ffab2f640d8
EEClass:         00007ffab3072340
Module:          00007ffab2f62fc8
Name:            Foo.MyClass
mdToken:         0000000002000002
File:            d:\Skydrive\Code\C#\_JITPresentation\One.exe
BaseSize:        0x18
ComponentSize:   0x0
Slots in VTable: 7
Number of IFaces in IFaceMap: 0
--------------------------------------
MethodDesc Table
Entry       MethodDesc    JIT Name
00007ffb07c16300 00007ffb077c80e8 PreJIT System.Object.ToString()
00007ffb07c5e760 00007ffb077c80f0 PreJIT System.Object.Equals(System.Object)
00007ffb07c61ad0 00007ffb077c8118 PreJIT System.Object.GetHashCode()
00007ffb07c5eb50 00007ffb077c8130 PreJIT System.Object.Finalize()
00007ffab3080120 00007ffab2f640d0    JIT Foo.MyClass..ctor()
00007ffab3080170 00007ffab2f640b0    JIT Foo.MyClass.a()
00007ffab2f6c050 00007ffab2f640c0   NONE Foo.MyClass.b(Int32)

The type has 7 methods. Also the out clearly indicates that Foo.MyClass.a() is JITed and Foo.MyClass.b() is NONE (or not JITed). We can get more details about these methods

 0:000> !dumpmd 00007ffab2f640b0
Method Name:  Foo.MyClass.a()
Class:        00007ffab3072340
MethodTable:  00007ffab2f640d8
mdToken:      0000000006000001
Module:       00007ffab2f62fc8
IsJitted:     yes
CodeAddr:     00007ffab3080170 <----- JITed
Transparency: Critical
i. 0:000> !dumpmd 00007ffab2f640c0
Method Name:  Foo.MyClass.b(Int32)
Class:        00007ffab3072340
MethodTable:  00007ffab2f640d8
mdToken:      0000000006000002
Module:       00007ffab2f62fc8
IsJitted:     no
CodeAddr:     ffffffffffffffff <----- Not yet JITed

So at this point we know that a() is JITed but the method b() it calls is not. In that the question arises that if it is not what is the content of the native instructions for a() and what does that code call into for b(). The disassembly will clearly show that the entire method a() is JITed and that for outward managed calls there are calls to stubs

 0:000> u 00007ffab3080170 L24
One!Foo.MyClass.a() [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 8]:
00007ffa`b3080170 48894c2408      mov     qword ptr [rsp+8],rcx
00007ffa`b3080175 4883ec38        sub     rsp,38h
00007ffa`b3080179 c744242000000000 mov     dword ptr [rsp+20h],0
00007ffa`b3080181 c644242400      mov     byte ptr [rsp+24h],0
00007ffa`b3080186 48b83834f6b2fa7f0000 mov rax,7FFAB2F63438h
00007ffa`b3080190 8b00            mov     eax,dword ptr [rax]
00007ffa`b3080192 85c0            test    eax,eax
00007ffa`b3080194 7405            je      One!Foo.MyClass.a()+0x2b (00007ffa`b308019b)
00007ffa`b3080196 e82574b25f      call    clr!JIT_DbgIsJustMyCode (00007ffb`12ba75c0)
00007ffa`b308019b 90              nop
00007ffa`b308019c c744242000000000 mov     dword ptr [rsp+20h],0
00007ffa`b30801a4 eb23            jmp     One!Foo.MyClass.a()+0x59 (00007ffa`b30801c9)
00007ffa`b30801a6 90              nop
00007ffa`b30801a7 8b4c2420        mov     ecx,dword ptr [rsp+20h]
00007ffa`b30801ab ffc1            inc     ecx
00007ffa`b30801ad 8b442420        mov     eax,dword ptr [rsp+20h]
00007ffa`b30801b1 89442428        mov     dword ptr [rsp+28h],eax
00007ffa`b30801b5 894c2420        mov     dword ptr [rsp+20h],ecx
00007ffa`b30801b9 8b542428        mov     edx,dword ptr [rsp+28h]
00007ffa`b30801bd 488b4c2440      mov     rcx,qword ptr [rsp+40h]
00007ffa`b30801c2 e889beeeff      call    Foo.MyClass.b(Int32) (00007ffa`b2f6c050) 
00007ffa`b30801c7 90              nop<br>00007ffa`b30801c8 90              nop
00007ffa`b30801c9 c644242401      mov     byte ptr [rsp+24h],1
00007ffa`b30801ce ebd6            jmp     One!Foo.MyClass.a()+0x36 (00007ffa`b30801a6)
00007ffa`b30801d0 90              nop
00007ffa`b30801d1 4883c438        add     rsp,38h
00007ffa`b30801d5 c3              ret
00007ffa`b30801d6 0000            add     byte ptr [rax],al
00007ffa`b30801d8 1909            sbb     dword ptr [rcx],ecx
00007ffa`b30801da 0100            add     dword ptr [rax],eax
00007ffa`b30801dc 096200          or      dword ptr [rdx],esp
 

So we see that for b a call is made to the memory location 00007ffa`b2f6c050. We can see what is there now by disassembling that address.

 0:000> u 00007ffa`b2f6c050
Foo.MyClass.b(Int32):
00007ffa`b2f6c050 e87b5e755f      call    clr!PrecodeFixupThunk (00007ffb`126c1ed0)
00007ffa`b2f6c055 5e              pop     rsi
00007ffa`b2f6c056 0201            add     al,byte ptr [rcx]

So basically instead of real native JITed code existing for b() there is actually a stub or thunk in it’s place. So we clearly establish that when a function is called it’s entire code is JITed and other method it calls is not yet JITed (however, there are caveats like inline methods etc). Now we can now go and set a breakpoint inside JIT to break when it tries it JIT the b() method. This is what we do

 0:000> bp clr!UnsafeJitFunction ;$$ entry point for JITing a method
0:000> g                        ;$$ continue executing until we hit the UnsafeJITFunction
0:000> k                        ;$$ dump the stack for JITing
clr!UnsafeJitFunction
clr!MethodDesc::MakeJitWorker
clr!MethodDesc::DoPrestub
clr!PreStubWorker+0x3d6
clr!ThePreStub+0x5a [f:\dd\ndp\clr\src\vm\amd64\ThePreStubAMD64.asm @ 92]
One!Foo.MyClass.a()+ 0x57 [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 12]

As we can see that the JITing actually happened in the same call thread that is executing a() and exactly when b was called. ThePreStub finally calls the JITer. The JITer will actually JIT the method b() and backtrack the stack and patch up the call, so that it will actually now be a call straight to the JITed copy of b(). We hit g couple of times and now see what happens for the MethodDescriptor for b()

 0:000> !dumpmd 00007ffab2f640c0
Method Name:  Foo.MyClass.b(Int32)
Class:        00007ffab3072340
MethodTable:  00007ffab2f640d8
mdToken:      0000000006000002
Module:       00007ffab2f62fc8
IsJitted:     yes
CodeAddr:     00007ffab30801f0  <-- Now it is JITed
Transparency: Critical

As we see b() is now JITed and we can see it’s disassembly as well. However, more interesting, lets go back and see what the disassembly of a() now contains

 0:000> u 00007ffab3080170 L24
One!Foo.MyClass.a() [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 8]:
00007ffa`b3080170 48894c2408      mov     qword ptr [rsp+8],rcx
00007ffa`b3080175 4883ec38        sub     rsp,38h
00007ffa`b3080179 c744242000000000 mov     dword ptr [rsp+20h],0
00007ffa`b3080181 c644242400      mov     byte ptr [rsp+24h],0
00007ffa`b3080186 48b83834f6b2fa7f0000 mov rax,7FFAB2F63438h
00007ffa`b3080190 8b00            mov     eax,dword ptr [rax]
00007ffa`b3080192 85c0            test    eax,eax
00007ffa`b3080194 7405            je      One!Foo.MyClass.a()+0x2b (00007ffa`b308019b)
00007ffa`b3080196 e82574b25f      call    clr!JIT_DbgIsJustMyCode (00007ffb`12ba75c0)
00007ffa`b308019b 90              nop
00007ffa`b308019c c744242000000000 mov     dword ptr [rsp+20h],0
00007ffa`b30801a4 eb23            jmp     One!Foo.MyClass.a()+0x59 (00007ffa`b30801c9)
00007ffa`b30801a6 90              nop
00007ffa`b30801a7 8b4c2420        mov     ecx,dword ptr [rsp+20h]
00007ffa`b30801ab ffc1            inc     ecx
00007ffa`b30801ad 8b442420        mov     eax,dword ptr [rsp+20h]
00007ffa`b30801b1 89442428        mov     dword ptr [rsp+28h],eax
00007ffa`b30801b5 894c2420        mov     dword ptr [rsp+20h],ecx
00007ffa`b30801b9 8b542428        mov     edx,dword ptr [rsp+28h]
00007ffa`b30801bd 488b4c2440      mov     rcx,qword ptr [rsp+40h]
00007ffa`b30801c2 e889beeeff      call    Foo.MyClass.b(Int32) (00007ffa`b2f6c050)
00007ffa`b30801c7 90              nop
00007ffa`b30801c8 90              nop
00007ffa`b30801c9 c644242401      mov     byte ptr [rsp+24h],1
00007ffa`b30801ce ebd6            jmp     One!Foo.MyClass.a()+0x36 (00007ffa`b30801a6)
00007ffa`b30801d0 90              nop
00007ffa`b30801d1 4883c438        add     rsp,38h
00007ffa`b30801d5 c3              ret
00007ffa`b30801d6 0000            add     byte ptr [rax],al
00007ffa`b30801d8 1909            sbb     dword ptr [rcx],ecx
00007ffa`b30801da 0100            add     dword ptr [rax],eax
00007ffa`b30801dc 096200          or      dword ptr [rdx],esp
00007ffa`b30801df 005600          add     byte ptr [rsi],dl

Now if we re-disassemble the target of this call at 00007ffa`b2f6c050

 0:000> u 00007ffa`b2f6c050
Foo.MyClass.b(Int32):
00007ffa`b2f6c050 e99b411100      jmp     One!Foo.MyClass.b(Int32) (00007ffa`b30801f0)
00007ffa`b2f6c055 5f              pop     rdi
00007ffa`b2f6c056 0201            add     al,byte ptr [rcx]

As you can see the address has been patched up and now b() is JITed and a() calls into b() without going through any stubs.

Obviously in this example I took a bunch of assumptions, but hopefully you now have the understanding to go debug your own scenarios and see what is at play. Some of the takeaways if you have JIT issues at startup

 

JIT happens at method granularity

 

Use modular code. Especially if you have error handling and other code which is rarely or almost never used, instead of having them in the main function, move them out. This will ensure that they are never JITed or at best not JITed at startup

 void Foo()
{
    try{
       //...
    }
    catch(Exception ex)
    {
         ComplexErrorHandling(ex);
    }
}

is better than

 void Foo()
{
    try{
       //...
    }
    catch(Exception ex)
    {
         LogLocal();
         UploadToSomeServer();
         // MoreCode;
         // EvenMoreCode;
    }
}

For a given function running it once, JITs the whole function. However, do note that if it has difference code branches and each calls other functions you will need to execute all branches. In the case below Foo has to be called with both true and false to ensure downstream methods are JITed

 void Foo(bool flag)
{
    if(flag)
        YesFlag();
    else
        NoFlag();
}

JITing happens in the same thread as the calls. The JIT engine does takes lock to ensure there is no races while JITing the same method from multiple threads.


Consider using NGEN, MultiCoreJIT or ForceJIT all methods you care about or even build your own mechanism based on the following code

Here’s some PseudoCode to force JIT which accomplishes that using RuntimeHelpers.PrepareMethod API (note this code does no error handling whatsoever). You can craft code around this to ForceJIT only assemblies and/or types in them that is causing JIT bottlenecks. Also this can be parallelized across cores. The .NET Multicore JIT is based on similar principle but automatically does it by generating a profile of what executes at your application startup and then JITing it for you in the next go.

 using System;
using System.Reflection;
namespace ConsoleApplication5
{
    class Program
    {
        static private void ForceJit(Assembly assembly)
        {
            var types = assembly.GetTypes();

            foreach (Type type in types)
            {
                var ctors = type.GetConstructors(BindingFlags.NonPublic
                                            | BindingFlags.Public
                                            | BindingFlags.Instance
                                            | BindingFlags.Static);

                foreach (var ctor in ctors)
                {
                    JitMethod(assembly, ctor);
                }

                var methods = type.GetMethods(BindingFlags.DeclaredOnly
                                       | BindingFlags.NonPublic
                                       | BindingFlags.Public
                                       | BindingFlags.Instance
                                       | BindingFlags.Static);
                
                foreach (var method in methods)
                {
                    JitMethod(assembly, method);
                }
            }
        }

        static private void JitMethod(Assembly assembly, MethodBase method)
        {
            if (method.IsAbstract || method.ContainsGenericParameters)
            {
                return;
            }
            
            System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod(method.MethodHandle); 
        }

        static void Main(string[] args)
        {
            ForceJit(Assembly.LoadFile(@"d:\Scratch\asm.dll"));
        }
    }
}

Comments

  • Anonymous
    September 29, 2014
    Excellent articel, really interesting that the stack is backtracked and patched when JIT:ing occurs.

  • Anonymous
    October 01, 2014
    "Use modular code. Especially if you have error handling and other code which is rarely or almost never used, instead of having them in the main function, move them out. This will ensure that they are never JITed or at best not JITed at startup" Why does it matter? Isn't the method JITed when it is first called? What good will it do if it is just hidden in a different method besides wasting CPU and memory for JITing the intermediate method as well?

  • Anonymous
    October 01, 2014
    RRR I think the whole post elaborate why you should do it.

  • Anonymous
    October 01, 2014
    Do you feel that the Roslyn compiler could help eliminate these concerns?  Or, do you feel that if you could use Roslyn that you would choose not to, because you wouldn't have as much "warm-up" control over these concerns?

  • Anonymous
    October 02, 2014
    @RRR "Especially if you have error handling and other code which is rarely or almost never used, instead of having them in the main function, move them out." What's the use of JITing code when it's hardly ever used? Hence, JIT stands for Just-In-Time.

  • Anonymous
    October 02, 2014
    Eric Roslyn is a compiler that produces MSIL. So that is irrelevant to JIT which converts IL to native. The other option is also .NET Native which directly produces native code. That like NGEN does not have JIT issues. However, .NET Native is yet to hit server/desktop scenarios

  • Anonymous
    October 02, 2014
    The comment has been removed

  • Anonymous
    October 02, 2014
    Or rather, call 00007ffa`b2f6c050 initially and then call 00007ffab30801f0 after JIT compiling?

  • Anonymous
    October 02, 2014
    The comment has been removed

  • Anonymous
    October 03, 2014
    Sorry Tony I updated the text. Basically the target location is the thunk 00007ffab2f6c050 Pre-JITing of b it is 0:000&gt; u 00007ffab2f6c050 Foo.MyClass.b(Int32): 00007ffab2f6c050 e87b5e755f &nbsp; &nbsp; &nbsp;call &nbsp; &nbsp;clr!PrecodeFixupThunk (00007ffb126c1ed0) 00007ffab2f6c055 5e &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;pop &nbsp; &nbsp; rsi 00007ffab2f6c056 0201            add     al,byte ptr [rcx] After JITing it is 0:000> u 00007ffab2f6c050 Foo.MyClass.b(Int32): 000007ffab2f6c050 99b411100      jmp     One!Foo.MyClass.b(Int32) (00007ffab30801f0) 00007ffab2f6c055 5f              pop     rdi 00007ffab2f6c056 0201 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;add &nbsp; &nbsp; al,byte ptr [rcx] So back-patching happened at &quot;00007ffab2f6c050"