Performance Quiz #5: The performance cost of the garbage collector
It's been a while, so it's time for another exciting Performance Quiz!
Here are two implementations of a management algorithm that creates and destroys trees. The first implementation uses the garbage collector to do all of the memory management (even freeing is done via periodic collections, we just drop the nodes on the floor). The second implementation is more like what you would write if you had to do a malloc/free style solution that managed your own memory. However after the first iteration there will be no collections at all -- allocations come out a high-speed free list and the freelist is created via a fast recursive algorithm.
The Question: How much faster is Test2() than Test1()?
Warning: There's already spoilers in the feedback
------------------------------------------
using System;
namespace Test
{
public class Quiz5
{
public static void Main()
{
// try running the tests in the other order too, or just one per run
Test1();
Test2();
}
public static void Test1()
{
Random r = new Random();
Tree1.Reset();
for (int i = 0; i < 50; i++)
{
for (int j = 0; j < 100000; j++)
{
int k = (int)r.Next();
Tree1.Insert(k);
}
Tree1.Reset();
}
}
public static void Test2()
{
Random r = new Random();
Tree2.Reset();
for (int i = 0; i < 50; i++)
{
for (int j = 0; j < 100000; j++)
{
int k = (int)r.Next();
Tree2.Insert(k);
}
Tree2.Reset();
}
}
class Tree1
{
private int key;
private Tree1 left;
private Tree1 right;
private static Tree1 root;
private Tree1(int _key)
{
key = _key;
}
static public void Reset()
{
root = null; // much garbage is created
}
static public void Insert(int key)
{
Tree1 t = root;
if (t == null)
{
root = new Tree1(key); // asking for new memory
return;
}
for (;;)
{
if (t.key == key)
return;
if (key < t.key)
{
if (t.left == null)
{
t.left = new Tree1(key); // new memory again
return;
}
else
{
t = t.left;
}
}
else
{
if (t.right == null)
{
t.right = new Tree1(key); // new memory yet again
return;
}
else
{
t = t.right;
}
}
}
}
}
class Tree2
{
private int key;
private Tree2 left;
private Tree2 right;
private static Tree2 root;
private static Tree2 freelist;
static public Tree2 NewTree2(int _key)
{
Tree2 t = freelist;
if (t != null)
{
freelist = t.right;
t.right = null;
}
else
{
t = new Tree2(); // on the 2nd time through this never happens
}
t.key = _key;
return t;
}
static public void Reset()
{
FreeTree(root);
root = null;
}
static private void FreeTree(Tree2 t)
{
// put the tree on the free list... no collections
if (t == null)
return;
FreeTree(t.right);
FreeTree(t.left);
t.right = freelist;
t.left = null;
freelist = t;
}
static public void Insert(int key)
{
Tree2 t = root;
if (t == null)
{
root = NewTree2(key);
return;
}
for (;;)
{
if (t.key == key)
return;
if (key < t.key)
{
if (t.left == null)
{
t.left = NewTree2(key);
return;
}
else
{
t = t.left;
}
}
else
{
if (t.right == null)
{
t.right = NewTree2(key);
return;
}
else
{
t = t.right;
}
}
}
}
}
}
}
Comments
- Anonymous
January 05, 2005
The Test2 method will run double time than Test1. For Tree2 produce garbage one node by one node which spends bundle of time; Tree1 is very happy to kick the whole tree to GC which does the best to reclaim the memory.
At my test, the Test2 method uses 6.48 seconds and the Test 1 only 3 seconds.
Very impressive quiz! Thanks! :0) - Anonymous
January 05, 2005
> How much faster is Test2() than Test1()?
It isn't faster, Test1 is faster since allocation is so fast on the compacted heap.
Test2 executes more code while it is running, so it is slower.
But I don't think that this is a proper analogy for garbage collection to always be more performant than unmanaged malloc/free type mechanisms.
First of all, when you talk about the GC version's performance, are you asking about overall system performance, or just the immediate function call?
It seems like deferring some work to the garbage collector will eventually cause the garbage collector to do more work at some point in the future, while a malloc/free type implementation pays for it right now.
But just because it defers it doesn't mean that the deferred work doesn't exist at all...
Also, it's certainly possible to use a pool-based malloc/free type mechanism, where you just free the pool once instead of for each object. An approach like that would probably change the performance characteristics of your test pretty considerably. - Anonymous
January 05, 2005
If you use an array of structs instead of classes, and int indexes instead of pointers for the reference, the pool might actually serve some purpose (save memory and possibly improve locality of reference). As is it has no advantage. - Anonymous
January 05, 2005
With the exact code you posted, I got about the same results but I assume that's because test1 didn't actually do any garbage collects. So I modified it a bit and added a member to each class:
private byte[] val = new byte[1000];
Which makes each node 1000 bytes big. In that case I get 22s for method 2 and 3 minutes 20 seconds for method 1.
At the same time, however, you probably shouldn't just cache every node that's ever created. What happens if the first request requires 1 million nodes, but then you only ever need at most 10 at a time after that? Of course, in a server-based application you're probably getting a fixed number of requests per second and you can probably assume that it's OK to just naively cache everything indefinately... - Anonymous
January 05, 2005
The comment has been removed - Anonymous
January 05, 2005
The comment has been removed - Anonymous
January 05, 2005
First of all, I didn't play around with the code. But the phrasing of your question 'How much faster is Test2() than Test1()?' already set off alarm bells of the kind 'is that a trick question'?
But one thought that came up in my mind, and that hasn't been mention above, is about what would happen when some 'other' part of the application did induce garbage collects simultaneously with this code. Isn't it so that in that sooner or later you may see differences between the trees' behaviours related to which GC generation the nodes are placed? Obviously, Tree2 has a larger chance of getting its data in generation2, because it doesn't free anything. But then again, the same may be true for Tree1, if it doesn't free any memory during the build-up. Any thoughts about how GC generations affect this scenario? - Anonymous
January 05, 2005
What? Me ask a trick question? What makes you think I would ever do such a thing? evil grin :) :) - Anonymous
January 06, 2005
Don't >both< versions garbage collect, with the GC in the second one just wasting cycles? - Anonymous
January 06, 2005
The GC collects in response to memory pressure -- i.e. allocations. In the absence of allocations there are no collections. The second one does not collect. - Anonymous
January 06, 2005
The comment has been removed - Anonymous
January 06, 2005
I'd be foolish indeed to try to prove anything so profound as that with such a trivial benchmark. It's merely an interesting problem with surprising and educational properties. - Anonymous
January 06, 2005
The comment has been removed - Anonymous
January 06, 2005
The perf counter '% time in GC' had a somewhat surprising result on this code. Test1 had a normal erratic graph with a single digit average. But Test2 held a constant 30%. Is that counter always accurate, or is is possible the value got 'stuck'?
The result makes more sense if Joe is correct that assigning a GC pointer is not as simple as copying a number to a memory location. - Anonymous
January 07, 2005
It's more than just the cost of modifying the references. When GC goes to check which objects are alive it must trace through all of the modified references. Test2 has a lot more "live" modified references, and so much more of the heap needs to be traced. Modiying references in higher generations can result in a lot of the heap having to be traced. - Anonymous
January 07, 2005
>>The perf counter '% time in GC' had a somewhat surprising result on this code. Test1 had a normal erratic graph with a single digit average. But Test2 held a constant 30%. Is that counter always accurate, or is is possible the value got 'stuck'?
Those performance counters are updated at every collection (it would be too expensive to update them in real time). So if there are no collections then of course it shows the % as of the last collection and stays there. Test2 has no collections so it stays "stuck" as you say.
Maoni wrote an excellent treatment of this here: http://blogs.msdn.com/maoni/archive/2004/06/03.aspx
And btw her blog generally has great GC advice.
>>It's more than just the cost of modifying the references. When GC goes to check which objects are alive it must trace through all of the modified references.
In this particular benchmark Test2 has no collections at all. Therefore there is no cost associated with tracing. If Test2 were being used side-by-side with other code it would incur that cost so you would likely not want to keep such a large freelist around. I talk about this some in the solution (link above).
To use Test2 in a real system you'd want to be a lot more clever about your recycling policy. - Anonymous
January 07, 2005
I emailed you the pooled struct code. You did not provide a correctness verification function and I did not write one, so it is unverified. But it does compile and run.
Note that the memory consumption could be cut by 33% by using 2 shorts instead of int for right and left references; the maximum capacity would then be limited to the max size of a short. - Anonymous
April 03, 2005
Performance Quiz:Memory management