

Memory management in the .NET Framework

This is a subject that has been covered before, and I have no intention of writing the ultimate post on it. Still, I think this is something that every good developer should know.

Why do I need to know this?

My colleagues and I are quite often asked about the necessity of knowing how the Garbage Collector (GC) works. After all, it isn't really necessary to know what's going on behind the scenes, right..? A developer shouldn't have to bother about how the framework works..? Or should he..?

Well, you may be a pretty good bus driver even if you know nothing about engines. Basic stuff like not stepping on the brake and the gas at the same time is probably enough. If you want to do F1 racing, on the other hand, you will want to know as much as you can about your car. So, if you're content doing small-scale web applications or not-so-efficient WinForms applications, then you probably don't have to bother. If you want to do front-line work, then you've got to get your hands dirty.

 

The different generations

In order to be effective, the GC tries to figure out which objects it can disregard and only check up on occasionally. Some objects may span the entire lifecycle of the application and others may be very short-lived, so if an object stays alive for a longer period of time the GC will check up on it less frequently. This is accomplished by dividing the managed heap into three generations: Generations 0, 1 and 2. An object begins in Generation 0, and if it stays in use for a long period of time it travels from Generation 0 through to Generation 2. A possible metaphor for this scenario would be the following:

  • You're asked to call a customer. You're pretty good with numbers so you keep the phone number in your head. (Generation 0)
  • There is no reply and so you need to remember the phone number a little longer. Meanwhile you're given additional numbers to call.
  • You know from past experience that your limit for keeping numbers in your head is 10, so once you reach that critical point you decide to write down the numbers you still need on post-it notes (Generation 1)
  • As the day goes on you keep transferring numbers from memory to post-its.
  • Your desk has now reached maximum capacity. It is cluttered with post-it notes, so you sort through them, determining which notes you still need. You throw away the ones that are obsolete and copy the remaining ones into your address book. (Generation 2)

As I stated above, when an object is created it is put in Generation 0. The initial size limit of Generation 0 is determined by the size of the processor cache, and it is then adjusted dynamically depending on the allocation rate of the application. Once Generation 0 reaches its limit, the GC goes through all the items in Generation 0, tags the obsolete objects for collection and removes them. This is called the Mark and Sweep phase. Everything that survives this sweep is compacted and moved to Generation 1. The size limit of Generation 1 is also determined by the allocation rate of the application, and once it is reached a Generation 1 collection occurs. This first marks and sweeps all items in Generation 1, moving the surviving items to Generation 2, and then marks and sweeps the items in Generation 0.
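To watch an object travel through the generations, a minimal sketch along these lines can help. GC.GetGeneration, GC.MaxGeneration and GC.Collect are real framework calls; the class and method names are made up for this illustration, and the forced collections are there for demonstration only. The exact generation reported after each collection can vary between runtimes, so treat the output as typical rather than guaranteed.

```csharp
using System;

public static class GenerationDemo
{
    // Reports the generation of obj when it is new, after one forced
    // collection and after a second forced collection.
    public static int[] TrackPromotion(object obj)
    {
        int[] seen = new int[3];
        seen[0] = GC.GetGeneration(obj);
        GC.Collect();                      // demonstration only
        seen[1] = GC.GetGeneration(obj);
        GC.Collect();                      // demonstration only
        seen[2] = GC.GetGeneration(obj);
        return seen;
    }

    public static void Main()
    {
        object survivor = new object();
        int[] seen = TrackPromotion(survivor);

        Console.WriteLine("Highest generation:  " + GC.MaxGeneration);
        Console.WriteLine("New object:          generation " + seen[0]);
        Console.WriteLine("After one GC:        generation " + seen[1]);
        Console.WriteLine("After a second GC:   generation " + seen[2]);

        // Make sure the object is still considered in use up to this point.
        GC.KeepAlive(survivor);
    }
}
```

On the runtimes I have tried, this kind of test shows the object starting in Generation 0 and moving up one step per collection it survives.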

A healthy ratio between GCs in the different generations is approximately 100 - 10 - 1, so for every 100 Generation 0 GCs you normally have 10 GCs of Generation 1 and 1 of Generation 2. A normal Generation 0 GC usually takes a few milliseconds. Performing a GC of Generation 1 rarely takes more than 30 milliseconds, but a Generation 2 collection can take quite some time depending on the application.
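If you want a rough look at this ratio from inside your own process, GC.CollectionCount reports how many collections have happened for a given generation. The sketch below is only an illustration: the class name, the Churn helper and the allocation sizes are my own choices, and the numbers you get depend entirely on the machine and the runtime. As far as I know the counters are cumulative, so a collection of a higher generation also bumps the counters of the lower ones.

```csharp
using System;

public static class CollectionRatio
{
    // Storing the latest array in a static field makes sure every
    // allocation really ends up on the managed heap.
    private static byte[] latest;

    // Allocates many short-lived arrays so the runtime triggers
    // collections on its own. No GC.Collect() involved.
    public static void Churn(int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            latest = new byte[1024];
        }
    }

    // Collection counters for Generation 0, 1 and 2, in that order.
    public static int[] Counts()
    {
        return new int[]
        {
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2)
        };
    }

    public static void Main()
    {
        Churn(1000000);
        int[] counts = Counts();
        Console.WriteLine("Generation 0 collections: " + counts[0]);
        Console.WriteLine("Generation 1 collections: " + counts[1]);
        Console.WriteLine("Generation 2 collections: " + counts[2]);
    }
}
```

In a real application you would normally read the same figures from the .NET CLR Memory performance counters rather than from code.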

 

The Large Object Heap

To top it off we also have what is called "The Large Object Heap" (LOH). This is where any object of 85,000 bytes or more is stored. The LOH is collected each time a new segment needs to be reserved (see below), but it is not compacted like generations 0, 1 and 2. When you perform a collection on the LOH you will also perform a GC on the other generations.

Now, if the limit is at 85,000 bytes, won't most of my objects end up on the LOH? Well, not necessarily. For example: a dataset may contain lots and lots of data, but the dataset object itself only contains references to other objects. The values in the columns are each stored as individual strings, and as long as each of them stays below 85,000 bytes you're in the clear.
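The point about containers versus their contents can be checked with a small experiment. This is a sketch under the assumption that the runtime reports LOH objects as belonging to the highest generation, which is how GC.GetGeneration behaves on the runtimes I know of. The class and method names are hypothetical.

```csharp
using System;

public static class LargeObjectDemo
{
    // Allocates a byte array of the given length and reports which
    // generation the runtime places it in right after allocation.
    public static int GenerationOfNewArray(int length)
    {
        byte[] data = new byte[length];
        return GC.GetGeneration(data);
    }

    public static void Main()
    {
        // Well below the limit: an ordinary Generation 0 object.
        Console.WriteLine("10,000 byte array: generation " + GenerationOfNewArray(10000));

        // Above the limit: allocated on the LOH, reported as the highest generation.
        Console.WriteLine("90,000 byte array: generation " + GenerationOfNewArray(90000));

        // A container holding nine 10,000 byte arrays is itself tiny.
        // It only stores nine references, so neither the container nor
        // any of its elements ends up on the LOH.
        byte[][] container = new byte[9][];
        for (int i = 0; i < container.Length; i++)
        {
            container[i] = new byte[10000];
        }
        Console.WriteLine("Container:         generation " + GC.GetGeneration(container));
        Console.WriteLine("First element:     generation " + GC.GetGeneration(container[0]));
    }
}
```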

 

Memory segments

The managed heap reserves memory in segments. The size of the segments depends on the configuration. If <gcServer enabled="true"/> is set, memory is reserved in 64 MB segments; otherwise it is reserved in 32 MB segments. Segments for the LOH are reserved 16 MB at a time. Only Generation 2 and the LOH will span several segments.
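If you are unsure which mode a process actually runs in, GCSettings.IsServerGC will tell you. A minimal sketch follows, with hypothetical class and method names. The comment shows where the gcServer element normally lives in an application configuration file on the .NET Framework.

```csharp
using System;
using System.Runtime;

public static class GcModeDemo
{
    // The server GC is requested in the application configuration file:
    //
    //   <configuration>
    //     <runtime>
    //       <gcServer enabled="true"/>
    //     </runtime>
    //   </configuration>
    //
    // GCSettings.IsServerGC reports which mode the runtime ended up using.
    public static string DescribeMode()
    {
        return GCSettings.IsServerGC ? "server" : "workstation";
    }

    public static void Main()
    {
        Console.WriteLine("GC mode: " + DescribeMode());
    }
}
```

Keep in mind that asking for the server GC does not guarantee that you get it. On a single-processor machine, for instance, the runtime falls back to the workstation GC.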

 

What happens during GC?

Let's say we're performing a full GC, including the LOH. This is what happens:

  • The objects on the LOH are Marked. Each item in the LOH is checked for references. If none are found it's ready to be collected.
  • The LOH is Swept. All marked objects are released from memory.
  • The LOH is not compacted.
  • Generation 2 is Marked.
  • Generation 2 is Swept.
  • Generation 2 is compacted. (Imagine removing a few books from your bookcase, and then pushing the remaining books together, freeing up contiguous space at the end.)
  • Generation 1 is Marked.
  • Generation 1 is Swept.
  • Everything that survived the sweep is compacted.
  • The pointer for where Generation 2 ends is updated. Everything that survived the sweep is now Generation 2.
  • Generation 0 is Marked.
  • Generation 0 is Swept.
  • Everything that survived the sweep is compacted.
  • The pointer for where Generation 1 ends is updated. Everything that survived the sweep is now Generation 1.
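The mark, sweep and compact steps listed above can be sketched with a toy model. To be clear, this is not how the CLR is implemented. It is a simplified illustration in which the "heap" is a plain array, and every name in it (ToyCollector, Node and so on) is invented for the example. The mark step works out which slots can still be reached from the roots, and everything else is treated as obsolete.

```csharp
using System;
using System.Collections.Generic;

public static class ToyCollector
{
    public sealed class Node
    {
        public readonly string Name;
        public readonly List<Node> References = new List<Node>();
        public bool Reachable;
        public Node(string name) { Name = name; }
    }

    // Mark: follow references from the roots and flag everything we can reach.
    public static void Mark(IEnumerable<Node> roots)
    {
        Stack<Node> pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (current.Reachable) continue;      // already visited
            current.Reachable = true;
            foreach (Node child in current.References) pending.Push(child);
        }
    }

    // Sweep: release every slot that was not flagged during the mark step.
    public static void Sweep(Node[] heap)
    {
        for (int i = 0; i < heap.Length; i++)
        {
            if (heap[i] != null && !heap[i].Reachable) heap[i] = null;
        }
    }

    // Compact: push the survivors together, like the books in the bookcase.
    // Returns the index of the first free slot.
    public static int Compact(Node[] heap)
    {
        int next = 0;
        for (int i = 0; i < heap.Length; i++)
        {
            if (heap[i] == null) continue;
            Node survivor = heap[i];
            heap[i] = null;
            survivor.Reachable = false;           // reset for the next collection
            heap[next++] = survivor;
        }
        return next;
    }

    public static string Describe(Node[] heap)
    {
        string[] names = new string[heap.Length];
        for (int i = 0; i < heap.Length; i++)
        {
            names[i] = heap[i] == null ? "-" : heap[i].Name;
        }
        return string.Join(" ", names);
    }

    public static string RunExample()
    {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.References.Add(c);                      // A keeps C alive
        d.References.Add(b);                      // D points at B, but nothing points at D
        Node[] heap = { a, b, c, d };

        Mark(new Node[] { a });                   // A is the only root
        Sweep(heap);
        Compact(heap);
        return Describe(heap);
    }

    public static void Main()
    {
        Console.WriteLine(RunExample());          // prints "A C - -"
    }
}
```

Note that B is released even though D still refers to it. A reference only counts if it can be traced back to a root.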

 

A few quick tips

This topic could of course be a lot bigger, but here are some quick suggestions.

Try to stay out of Generation 1

Well, of course Generation 1 is better than Generation 2, but you should aim for keeping only a select few objects in Generation 2. Those should be variables that are defined at the beginning of the application lifecycle and released at the end. In an ideal world all other variables should be of the hit-and-run variety and never leave Generation 0.

Don't call GC.Collect()

This has been said before. It will be said again, and again, and again. You should almost never call GC.Collect() manually. And by almost I mean once in a lifetime, not once per application, and certainly not once per function call. I occasionally call GC.Collect() for testing purposes just to see if memory has been released. Normally you would never call it. The GC is self-balancing and by calling GC.Collect() you are disrupting that balance. Think of it as tampering with the eco-system, pouring sugar in the gasoline or whatever metaphor you prefer. :-)
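For the testing scenario mentioned above, a WeakReference is a handy way to see whether an object has actually been released. The following is a sketch for test code only, with hypothetical class and method names. Whether the first check reports a release can depend on the runtime and build settings, so I make no promise about its output.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ReleaseCheck
{
    // Allocate in a separate, non-inlined method so that no local
    // variable in the caller keeps the object alive by accident.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateAndDrop()
    {
        byte[] payload = new byte[1024];
        return new WeakReference(payload);
    }

    // Test code only: force a full collection and see if the object is gone.
    public static bool IsReleasedAfterCollect()
    {
        WeakReference tracker = AllocateAndDrop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return !tracker.IsAlive;
    }

    // For contrast: an object that is still referenced survives the collection.
    public static bool IsAliveWhileReferenced()
    {
        byte[] payload = new byte[1024];
        WeakReference tracker = new WeakReference(payload);
        GC.Collect();
        bool alive = tracker.IsAlive;
        GC.KeepAlive(payload);                    // strong reference held until here
        return alive;
    }

    public static void Main()
    {
        Console.WriteLine("Dropped object released:    " + IsReleasedAfterCollect());
        Console.WriteLine("Referenced object survived: " + IsAliveWhileReferenced());
    }
}
```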

Avoid large objects

If you can stay below the 85,000 byte limit, then do so. If not, consider reusing the object. When it comes to large objects it's better to use one for a long time than to use many for short periods of time.
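One way to reuse a large object is to allocate a single buffer up front and keep using it, instead of creating a new one for every operation. A minimal sketch, assuming a hypothetical BufferReuse class and a buffer size I picked for the example. It is not thread-safe as written.

```csharp
using System;
using System.IO;

public static class BufferReuse
{
    // One large buffer, allocated once and kept for the lifetime of the
    // application. At 128 KB it lives on the LOH, but there is only ever
    // one of it. Not thread-safe: concurrent callers would share the buffer.
    private static readonly byte[] SharedBuffer = new byte[128 * 1024];

    // Reads the stream to the end through the shared buffer and returns
    // the number of bytes seen. No new large object is allocated per call.
    public static long CountBytes(Stream input)
    {
        long total = 0;
        int read;
        while ((read = input.Read(SharedBuffer, 0, SharedBuffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }

    public static void Main()
    {
        for (int i = 0; i < 3; i++)
        {
            using (MemoryStream input = new MemoryStream(new byte[50000]))
            {
                Console.WriteLine("Read " + CountBytes(input) + " bytes");
            }
        }
    }
}
```

The alternative, allocating a fresh 128 KB array inside CountBytes, would create a new large object on every call.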

Don't use finalizers

When your object has a finalizer, the finalize method will be called once the object is no longer alive. So far so good. Unfortunately, your object will be passed into the next generation first, since it's not yet ready to be collected. This means that every object with a finalizer will end up in at least Generation 1, and most likely in Generation 2.
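If a class really does hold something that must be cleaned up, the usual alternative is deterministic cleanup through IDisposable, with the finalizer kept only as a safety net. Below is a sketch of the common dispose pattern. ResourceHolder is a made-up class and the actual resource handling is left out. The important call is GC.SuppressFinalize, which tells the runtime that the finalizer no longer needs to run for this object.

```csharp
using System;

public class ResourceHolder : IDisposable
{
    private bool disposed;

    public bool IsDisposed
    {
        get { return disposed; }
    }

    // Deterministic cleanup: called by the user, directly or via "using".
    public void Dispose()
    {
        Dispose(true);
        // Cleanup is done, so the finalizer does not need to run anymore.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;
        if (disposing)
        {
            // Release other managed IDisposable objects here.
        }
        // Release unmanaged resources here.
        disposed = true;
    }

    // Safety net: only runs if the caller forgot to call Dispose.
    ~ResourceHolder()
    {
        Dispose(false);
    }
}

public static class DisposeDemo
{
    public static void Main()
    {
        ResourceHolder holder = new ResourceHolder();
        using (holder)
        {
            // Work with the resource here.
        }
        Console.WriteLine("Disposed: " + holder.IsDisposed);
    }
}
```

This only helps if Dispose is actually called, so wrap the object in a using block wherever you can.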

Well, as I said: There is a lot more to cover on this. There are books to be written and songs to be sung, but you've got to draw the line somewhere. I will most certainly cover more of this in future posts.

/ Johan

Comments

  • Anonymous
    April 22, 2007
    I updated the post. I re-read it and thought that the "What happens during GC?" section could be a bit more elaborate. Mea culpa / Johan

  • Anonymous
    April 28, 2007
    Hi, You state to avoid 85k or bigger objects, but how can I measure the size of my object?

  • Anonymous
    May 02, 2007
    The immediate answer to this question is to do what I describe in http://blogs.msdn.com/johan/archive/2007/01/11/i-am-getting-outofmemoryexceptions-how-can-i-troubleshoot-this.aspx This might be a more low-level approach than you expected, but it really is a good way to look at what you're filling memory with. However, as I tried to describe in my post, you only need to pay attention to the individual elements. If your object contains nine 10 KB strings, then the strings are stored as individual elements and the object itself only contains references to the strings. So nothing would end up on the large object heap. If you had an object with one 90 KB string, though, the string would be on the large object heap, but the object itself would not. / Johan

  • Anonymous
    June 03, 2007
    In your last point you say "Don't use finalizers"... Am I correct in assuming that doesn't include classes that are written using the IDisposable pattern where the code uses deterministic finalization?

  • Anonymous
    June 03, 2007
    Hi Jeremy, You're absolutely correct in your assumption. Deterministic finalization is a whole other ballgame. The reason why you wouldn't use (and, more importantly, rely 100% on) finalizers is that they get invoked by the GC, which means the objects are promoted to the next generation. That doesn't apply if you use the IDisposable pattern correctly. There is a potential risk in the fact that people tend to forget to call .Dispose, but if implemented and executed correctly it is great.
