Jit Optimizations: Inlining (II)

In a JIT compiler, inlining can be an expensive optimization in terms of compile time: it can involve loading other classes or assemblies, doing security checks, and so on. Worse, after doing all this expensive work, you may find that the candidate wasn't worth inlining after all, so you throw that work away. That wastes not only time but possibly other resources, such as working set (you loaded a class you didn't really need).

Another reason the JIT has to be smart about what it inlines has to do with how the JIT compiler itself works. To give you an idea, consider a function f() that generates an optimal solution to a problem in O(N^2) steps, and a function g() that solves the same problem, non-optimally, in O(N) steps. If you have limited time to find the solution, a good approach is to use the optimal solver for small Ns and fall back to the fast, non-optimal solver for larger Ns. In our case, the problem is generating good code, and N is a measure of the complexity of our input (code size, complexity of the flowgraph, number of variables, etc.). What does this have to do with inlining? Inlining makes N bigger, which can push us across the line (in practice not as well defined as in my example) beyond which we generate less optimal code. The VC team ran into exactly this with their IL generation: they do many optimizations at the IL level, inlining among them, and they found that very aggressive inlining was hurting the quality of the code our JIT generates.
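The trade-off can be sketched with a toy model. Everything in it is hypothetical: `CompileBudget`, `StepBudget` and the "sizes add up" rule are illustrative assumptions, not how the JIT actually measures N or picks its strategy. The only point is that inlining grows N and can push a method over the threshold.

```csharp
using System;

// Toy model (hypothetical, not the real JIT): use the O(N^2) optimal solver
// while it fits the budget, otherwise fall back to the O(N) fast solver.
public static class CompileBudget
{
    // Hypothetical budget, in "steps", that we can afford per method.
    public const long StepBudget = 10_000;

    public static string ChooseStrategy(long n)
    {
        // The optimal solver costs about N^2 steps.
        return n * n <= StepBudget ? "optimal" : "fast";
    }

    // Simplifying assumption: inlining a callee adds its size to the caller's N.
    public static long SizeAfterInlining(long callerN, long calleeN)
    {
        return callerN + calleeN;
    }
}
```

With these numbers, a caller with N = 90 costs 8,100 steps and gets the optimal solver; inlining a callee with N = 20 raises N to 110 (12,100 steps), so the whole method drops to the fast solver.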

From an engineering point of view, it makes sense to approach this optimization with a 'Best Bang for your Buck' attitude: spend compile-time resources (time and space) and developer resources on the most common cases where inlining pays off without a big risk of making things worse. A typical example of a really good inlining candidate is a property getter or setter. These are usually tiny methods that just do a memory fetch or store, so inlining them is usually a win in both size and speed.
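For instance, in the hypothetical `Counter` class below, the `Value` getter and setter compile to tiny methods (`get_Value`/`set_Value`) whose bodies are a single field load or store. If the JIT inlines them, a statement such as `c.Value = c.Value + 1` ends up as plain memory accesses to the `_value` field, with no call overhead. Whether inlining actually happens still depends on the rules listed below.

```csharp
using System;

public class Counter
{
    private int _value;

    // Getter body: load _value. Setter body: store into _value.
    // Both are far below the 32-byte IL limit, so they are typical inlining candidates.
    public int Value
    {
        get { return _value; }
        set { _value = value; }
    }
}

public static class CounterDemo
{
    public static int IncrementTwice(Counter c)
    {
        // With the accessors inlined, each statement is roughly
        // "c._value = c._value + 1" in the generated code, with no calls.
        c.Value = c.Value + 1;
        c.Value = c.Value + 1;
        return c.Value;
    }
}
```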

These are some of the reasons we won't inline a method:

- The method is marked as not inlinable with the System.Runtime.CompilerServices.MethodImpl attribute (MethodImplOptions.NoInlining).

- Size of the inlinee is limited to 32 bytes of IL: This is a heuristic. The rationale is that for methods bigger than that, the call overhead is usually not significant compared to the work the method does. Of course, being a heuristic, it fails in some situations. There have been suggestions that we add an attribute to control these thresholds. For Whidbey, that attribute has not been added (it has some very bad properties: it's specific to the x86 JIT, and its long-term value, as compilers get smarter, is dubious).

- Virtual calls: We don't inline across virtual calls, because we don't know the final target of the call. We could potentially do better here (for example, if 99% of calls end up at the same target, you can generate code that checks the method table of the object the virtual call is going to execute on: if it's the 99% case, you just execute the inlined code, otherwise you make the call), but unlike the J language, most calls in the primary languages we support are not virtual, so we're not forced to be as aggressive about optimizing this case.

- Valuetypes: We have several limitations regarding value types and inlining. We take the blame here: this is a limitation of our JIT, we could do better, and we know it. Unfortunately, after stack ranking this against other Whidbey features, gathering statistics on how frequently methods cannot be inlined for this reason, and considering the cost of making this area of the JIT significantly better, we decided it made more sense for our customers that we spend our time on other optimizations or CLR features. Whidbey is better than previous versions in one case: value types whose only member is a pointer-sized int. This was (relatively) inexpensive to improve and helped a lot with common value types such as pointer wrappers (IntPtr, etc.).

- MarshalByRef: Call targets in MarshalByRef classes won't be inlined (the call has to be intercepted and dispatched). We've gotten better at this scenario in Whidbey.

- VM restrictions: These are mostly security checks. The JIT must ask the VM for permission to inline a method (see CEEInfo::canInline in the Rotor source to get an idea of what the VM checks for).

- Complicated flowgraph: We don't inline methods with loops, exception handling regions, etc.

- If the basic block containing the call is deemed unlikely to execute frequently (for example, a basic block that has a throw, or a static class constructor), inlining is much less aggressive, since the only real win there is code size.

- Other: Exotic IL instructions, security checks that need a method frame, etc.
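The first rule and the rarely-executed-block rule meet in a common pattern: keep the hot method small and move the rarely executed throw into a separate helper marked with `[MethodImpl(MethodImplOptions.NoInlining)]`. `MethodImplAttribute` and `MethodImplOptions.NoInlining` are real framework types; `Guard`, `CheckedIndex` and `ThrowOutOfRange` are hypothetical names for this sketch, and whether `CheckedIndex` itself is inlined into its callers still depends on the other rules above.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class Guard
{
    // Hot path: a compare, a branch and a return. Keeping the throw out of
    // this method keeps its IL small, which helps it stay under the size limit.
    public static int CheckedIndex(int index, int length)
    {
        // Casting to uint folds the "index < 0" check into the same comparison.
        if ((uint)index >= (uint)length)
        {
            ThrowOutOfRange(index); // rarely executed block
        }
        return index;
    }

    // Cold path: explicitly excluded from inlining, so its code is never
    // copied into callers.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void ThrowOutOfRange(int index)
    {
        throw new ArgumentOutOfRangeException(nameof(index), index, "Index is out of range.");
    }
}
```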
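The possible virtual-call improvement described in the list (check the object's type, run the inlined body on a match, otherwise make the virtual call) can be approximated by hand in C#. This is only a source-level analogy: a JIT would compare method table pointers directly, and `Shape`, `Square`, `Circle` and `Devirt.AreaGuarded` are hypothetical types invented for the example. Because `Square` is sealed, `s is Square` can only succeed when the object's exact type is `Square`, so `Square.Area` is guaranteed to be the target.

```csharp
using System;

public abstract class Shape
{
    public abstract double Area();
}

// Sealed: no subclass can override Area again.
public sealed class Square : Shape
{
    public double Side;
    public Square(double side) { Side = side; }
    public override double Area() { return Side * Side; }
}

public sealed class Circle : Shape
{
    public double Radius;
    public Circle(double radius) { Radius = radius; }
    public override double Area() { return Math.PI * Radius * Radius; }
}

public static class Devirt
{
    // Hand-written analogue of a guarded, inlined virtual call.
    public static double AreaGuarded(Shape s)
    {
        // Expected common case: the object is a Square, so use its Area body directly.
        if (s is Square sq)
        {
            return sq.Side * sq.Side;
        }
        // Any other type: fall back to the ordinary virtual call.
        return s.Area();
    }
}
```

The guard costs one type comparison per call, so it only pays off when the common case really dominates, as in the 99% example.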

Comments

  • Anonymous
    October 31, 2004
    Thanks again for your insightful post!

    Quote: "We don't inline across virtual calls."

    I see the reason for this for general virtual calls, but does it also apply to 'virtual' calls in sealed classes? Or virtual calls that are 'sealed' in another way, e.g. sealing the method itself, if the high-level language supports that (C# supports sealing a method)? After all, because those methods are sealed, they can be resolved at compile/JIT time in 100% of the cases, so they are not truly 'virtual' anymore.

  • Anonymous
    October 31, 2004
    Totally agreed on the MarshalByRef issue for WinForms; I still haven't seen a good explanation of why this decision was made. Hopefully David can fill us in here...

  • Anonymous
    November 01, 2004
    > There have been suggestions for us adding an attribute to control these thresholds.
    Why add something so specific? Why not provide a set of hints that compilers (or even people) can add to inform the JIT about priorities? The JIT will always have those limitations, and compilers can spare some time.
    So you would be able to apply an InlineMethod hint (just a hint) and maybe even mark regions of the code as seldom executed.
    Other hints or relaxations might say that arrays do not need to support covariance.

    As always, could you please also comment on how inlining is affected by NGEN and how it is affected by generics? Please correct my assumption: if a type parameter is constrained to IFoo with method f(), and let's say the method satisfies the requirements you mention above, it seems to me that for T : struct inlining will be done, but not for a class.

    Dmitriy

  • Anonymous
    November 01, 2004
    Luc: You are right, we could be smarter for sealed classes; we currently aren't using the sealed attribute to take advantage of this.

    Hallvard: In v1.0 and v1.1 we didn't inline any MBR call, so yes, that means we wouldn't inline Windows.Forms code. We are better in Whidbey in this respect.

    As for why Windows.Forms is MBR, here is something I dug up from Brad Abrams' blog:

    "Our preference would be for it to not be Marshal By Ref, but because it is hosted via com cross domain in IE, we needed to make it mbro in order for the com calls to marshal to the com apartment"

    Dmitry:

    I don't remember offhand any difference in inlining behavior between the JIT and NGEN (they use the same compiler). For generics, I would say it's the other way around: you have a better chance of getting the inlining right if T is a class rather than a struct.

  • Anonymous
    November 01, 2004
    David, could you explain?
    If T is a class, you share the code for all such Ts, so how would you then inline?
    If T is a struct, you generate different code for each T, so inlining seems possible.

  • Anonymous
    November 01, 2004
    David,
    I thought v2 of NGEN does more work than the JIT.

  • Anonymous
    November 02, 2004
    David,
    I tried to estimate how many calls actually get inlined, so I compiled the following code to see whether g gets inlined:

    using System;

    namespace Test
    {
        class C
        {
            public void g() {}
            public void f() { g(); }
        }

        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                C c = new C();
                c.f();
            }
        }
    }

    - g is much smaller than 32 bytes,
    - the call is not virtual,
    - no value types are involved,
    - it's not marshalled by ref,
    - no loops, no complex flow,
    - the code is in the same assembly, so no VM restrictions should apply.

    Yet the call is not inlined. I understand that your list does not cover all possible cases. I'm just curious: what is the reason in such a seemingly trivial case?

  • Anonymous
    November 03, 2004
    Kirk, I suspect you are looking at debug code (are you looking at the code under VS? It always tells the JIT not to optimize). This is what we generate with optimized code:

    mov ECX, 0x3c10f60
    call CORINFO_HELP_NEWSFAST
    ret

  • Anonymous
    November 03, 2004
    David, that was indeed the case. Using cordbg with JitOptimizations set to 1 helps. BTW in my case cordbg shows just a number instead of CORINFO_HELP_NEWSFAST. I could not figure out how to fix that.

  • Anonymous
    November 04, 2004
    Another comment: it appears that you guys inline virtual calls after all. You don't inline calls to virtual methods, but a callvirt to a non-virtual method (rather common in the C# case) may get inlined.

  • Anonymous
    November 27, 2004
    When will the struct inlining issue be resolved? This is very important in many situations, such as the various structs in the DirectX namespaces (Vectors!), System.Drawing, etc., and for scientific computing.

    I have filed a suggestion here:
    <http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=fb7b3c93-a9e9-418b-85b3-b67a195c7e1a>

    and as you can see I am not the only one worrying about this.
