Mid-life crisis

This particular problem (I call it mid-life-crisis) seems to come up fairly often, so I thought I'd write up some general advice on it.  The symptoms go something like this:  there is a server process (usually a web server) and that process spends a high percentage of its time in the garbage collector, say 30% or even more.  Simple enough, but why would the time be so high?  Why isn’t it more like the 1% or so that we’d like it to be?

Often (not always) the answer is mid-life-crisis.  By this I mean that something is causing objects of middle lifetime length (which we would very much like to die in generation 1) to live longer and end up dying in generation 2.

This is very bad.

Generation 2 garbage collections are the full ones.  That means every object on the heap must be visited and the process is largely stopped while this is going on.  If you are getting a lot of objects promoted all the way from generation 0 to generation 2 and then having them die shortly thereafter you are paying a huge price to clean up those objects. 

Why does this happen?

Well in the server case there’s one very common reason.  Let’s say it’s a web server, a request comes in, a bunch of setup work is done to get the result for that request, and at some point the code then accesses a database or a web-service to get the necessary data.  At that point the thread accessing the data is blocked, but all the objects that were pending for that request are still live.  Meanwhile, other threads on the server are still running, still doing allocations, and those might end up requiring a garbage collection.  When that collection happens, all the temporary objects on the blocked threads are still live, probably in local variables or objects that represent the transaction in flight.  They survive the collection and are promoted. 

Now, since transactions are often longish things and collections are going to happen at some point, it’s normal for some objects associated with the transaction in flight to survive the generation 0 collections that are hopefully happening every second or so.  Those objects are going to get promoted to generation 1 just like they should; in fact, the main purpose of generation 1 is to let transaction-related objects stick around long enough to finish their work and then die cheaply.

But here’s where things go wrong.  If there are fairly long delays waiting for, say, database results, and a fairly large number of objects representing the state of the transaction in flight, there will be enough buildup in generation 1 that it will become appropriate to try to collect those objects.  At that point the survivors will be promoted to generation 2.  If there are a lot of survivors we are now in trouble, because in order to clean them up we will have to do a full collection.  If full collections are happening regularly, the percent time spent in the collector will shoot up from a healthy 1% to something very bad, like 30%, 50%, sometimes even more.
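The aging described above can be observed directly.  What follows is only a minimal sketch, not something from a real server: `TransactionState` and `PromotionDemo` are hypothetical names, and the forced `GC.Collect` calls stand in for collections that other threads' allocations would normally trigger while a request is blocked.

```csharp
using System;

// Hypothetical stand-in for per-request data that stays reachable
// while the thread that owns it is blocked.
public sealed class TransactionState
{
    public byte[] Buffer = new byte[1024];
}

public static class PromotionDemo
{
    // Returns the generation of a still-reachable object before any
    // collection, after one collection, and after a second collection.
    public static int[] ObserveAging()
    {
        var state = new TransactionState();
        var generations = new int[3];

        generations[0] = GC.GetGeneration(state);

        // Stands in for a generation 0 collection caused by other threads.
        GC.Collect(0);
        generations[1] = GC.GetGeneration(state);

        // Stands in for a later generation 1 collection.
        GC.Collect(1);
        generations[2] = GC.GetGeneration(state);

        // Keep the object reachable through the last observation.
        GC.KeepAlive(state);
        return generations;
    }
}
```

On a typical runtime I would expect the three values to climb from generation 0 toward generation 2, but the exact numbers depend on the collector and its configuration, so treat them as something to observe rather than rely on.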

So what to do about this?

Well, the good news is there’s a fairly straightforward line of defense.  The trick is that you must clean up (i.e. set references to null) as much of your state as possible before you block on something like a database, or really before you block on anything that might take a long time.  It’s often the case that a lot of the temporary data won’t be needed after the database results come back, or could be cheaply recreated.  Before you call your web-service or database backend, get rid of as much as you can so that the objects that will survive collections while you are blocked are minimized.  This will let more things die in generation 0, minimize additions to generation 1, and avoid the crisis your mid-life generation 1 objects will cause should they start surviving into generation 2.
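Here is a minimal sketch of that pattern.  Everything in it is hypothetical: `RequestHandler`, its fields, and the `blockingCall` delegate (which stands in for the database or web-service call) are names invented for illustration, not part of any real API.

```csharp
using System;

// Hypothetical request handler; all names are illustrative only.
public sealed class RequestHandler
{
    private byte[] parsedInput; // bulky temporary state, only needed to build the query
    private string query;       // the small piece of state the blocking call needs

    public void Prepare()
    {
        parsedInput = new byte[16 * 1024];
        query = "query built from " + parsedInput.Length + " bytes of input";
    }

    // True while the bulky temporary state is still reachable from this object.
    public bool HoldsTemporaries
    {
        get { return parsedInput != null; }
    }

    public string Execute(Func<string, string> blockingCall)
    {
        // Drop what the blocking call does not need, so a collection that
        // happens while this thread is blocked can reclaim it.
        parsedInput = null;

        // Stand-in for the long wait on a database or web service.
        return blockingCall(query);
    }
}
```

The point of the design is simply that only `query` stays reachable across the long wait; the bulky input is released while it is still young.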

Remember, the “age” of objects is a relative thing. Collections cause things to age, and allocations are what cause collections, so reducing the total number of allocations causes things to age more slowly.  Having your objects die as quickly as possible again reduces the pressure to grow the generations and hence keeps things younger.

To see if this sort of thing is happening to you, you can look at the Relocated Types view in CLR Profiler to see what’s getting moved around (remember, things are normally moved when they are promoted, so moved objects are a good proxy for promoted objects).  To get overall promotion rates, use the GC performance counters; there are counters that will tell you how much stuff is getting promoted into generation 2.  You want that number to be as small as possible.  Zero is ideal and even achievable in steady state, but as long as the rate of generation 2 collections stays low, you’ll be fine.
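If you want a rough in-process check of the generation 2 collection rate, one option is to sample `GC.CollectionCount` before and after a stretch of work.  This is only a sketch: `GcRateProbe` is a hypothetical helper, and it measures how often full collections happen, not how many bytes are promoted, so it complements the profiler and the performance counters rather than replacing them.

```csharp
using System;

// Hypothetical helper for sampling collection counts around a workload.
public static class GcRateProbe
{
    // Snapshot of how many collections each generation has had so far.
    public static int[] Snapshot()
    {
        var counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            counts[gen] = GC.CollectionCount(gen);
        }
        return counts;
    }

    // Ratio of oldest-generation collections to generation 0 collections
    // between two snapshots; returns 0 if no generation 0 collection occurred.
    public static double FullCollectionShare(int[] before, int[] after)
    {
        int oldest = before.Length - 1;
        int gen0 = after[0] - before[0];
        int full = after[oldest] - before[oldest];
        return gen0 == 0 ? 0.0 : (double)full / gen0;
    }
}
```

Take one snapshot, let the server handle requests for a while, take another, and compare.  A share that stays close to zero is what you are hoping for; a share that climbs under load is a hint that objects are reaching generation 2 and dying there.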

Summary:  Don’t have a mid-life-crisis.  When there are many threads, be sure to release as many of your objects as possible before you block any thread.

Comments

  • Anonymous
    December 05, 2003
    The comment has been removed

  • Anonymous
    December 05, 2003
    You're absolutely right: if the JIT can statically determine that the variable is dead when the code is generated, then there's no need to null things out. However, what often happens in servers is that there are certain transaction state variables that are, for instance, on the "this" pointer and are still reachable. Those are the ones to null if you can.

    For instance, suppose you got your input in XML format and you had a series of functions to extract what you need for the query out of the XML, building up a SQL query string as you go. Just before you make the SQL query, it would be good to release all the stuff related to the XML that you can so that it can be collected. Objects like that are often fields accessible via your "this" pointer rather than local variables, so they are reachable until the object holding them goes away.

    Other times there are helper collection classes that assist in the parsing and validation of the inputs. These objects also need to survive across several function calls (they are often accumulating results as the process goes along). Again, these would seem live to the collector, but perhaps they can be nulled or emptied.

  • Anonymous
    December 05, 2003
    Let me alter your example just a tad. This is a stupid example using intermediate strings, just to illustrate:

    class MyObj
    {
        // Intermediate state kept in fields, so it stays reachable via "this"
        private String s1;
        private String s2;
        private String s3;

        public void DoSomethingFromInputs(String s)
        {
            // ...
            s1 = AnElegantOperation(s);
            // ...
        }

        public void ComputeInterimResults(String options)
        {
            // ...
            s2 = SomethingEvenMoreElegant(s1, options);
            // ...
        }

        public void ComputeFinalQuery(String database)
        {
            // ...
            s3 = SQLFormatting(s2, database);
            // ...
        }

        public String GetResults()
        {
            // ...
            s1 = null; // this is what I'm talking about
            s2 = null; // this is what I'm talking about
            // this blocks a long time
            String r = GetDataFromDatabase(s3);
            // ...
            return r;
        }
    }

    MyObj o = new MyObj();
    o.DoSomethingFromInputs(s);
    o.ComputeInterimResults(options);
    o.ComputeFinalQuery(databasename);
    return o.GetResults();

    (please forgive my syntax, I hope you can get the gist)

  • Anonymous
    December 05, 2003
    Rico, thanks for clearing that up - I also shared Ken's concerns when reading your original post.

  • Anonymous
    August 24, 2006
    Game development is one of those dark arts where the usual laws of scalability don't always apply. ...

  • Anonymous
    April 16, 2007
    It was really exciting to see that so many people answered the .NET GC PopQuiz , especially seeing that

  • Anonymous
    June 21, 2007
    Ah. Garbage Collection... how I love and hate thee. =P I think one sad thing about programming in .net

  • Anonymous
    March 14, 2008
    After going this week to the Microsoft performance open house , here are few things to consider: Create
