Jaa


Unmanaged Memory Fragmentation -- an old story

Before I worked on the CLR I spent several years in MSN. One of the big projects I worked on was "Sidewalk" which among other things offered a yellow pages service. A lot of this work was done under IIS with ISAPI extensions and you can imagine the fun involved in writing a lot of native code that needs to deliver great sustained performance. It was very very difficult. In my opinion one of the best features of ASP.NET is that performance doesn't tend to decay over time. I bet a lot this would never have happened if we had been running on say Windows Server 2003 as the system heap is much better but it's still an interesting story.

In Sidewalk 3.0 [Autumn 1998] we had a problem with sustaining our performance over long runs. After 10-12 hours of heavy stress we were much slower than our initial speed and we needed to go at least 72 hours, preferably much more. Heap size was growing over time, and at first we thought we were fighting a leak, but a lot of analysis revealed that the speed degrade was due to heap fragmentation -- actual allocated bytes were not growing very quickly at all. There were leaks but they were comparatively tiny -- we could have run for months at a time with leaks of that magnitude.

Luckily, we found that the fragmentation was on private heaps associated with one of our components. This component (an OLEDB provider) was free-threaded and as a result we had already done some interesting things to the heap to help performance (we were running on Windows NT4 sp3 I believe). The way things sat, was that we had one logical heap made up of 16 private heaps we could use for allocations. We would hash the incoming thread ID and then allocate from the appropriate private heap. As a result of this method, contention on the heaps was virtually nil (we usually ran with < 16 threads). Since the thread releasing the memory might not be the same as the thread allocating the memory, we had to save the heap handle secretly in every allocation (4 byte overhead) but since the allocations were pretty big that wasn't a problem. So we were in good shape contention wise and given the pointer to memory, we could tell what heap to free it out of.

To address the fragmentation problem, we changed this scheme just a little bit:

Instead of having 16 heaps, we doubled it. 32 heaps -- 2 sets of 16. The first 100,000 allocations happen from the first set of heaps, as usual. On the 100,001st allocation we cut over to the second set of heaps. These heaps are "pristine". Now there is a memory transition happening... allocations are happening on the 2nd set of heaps but frees are happening on the 1st set of heaps (remember memory is always freed where it was allocated by necessity). As a consequence, the older heaps are being "tidied up". By the time we reach allocation number 200,001 the original heaps are nice and tidy, virtually pristine in fact as very few objects have a lifetime long enough that they haven't already been freed. We can repeat this process indefinately as on the 300,001st allocation, just as the 1st set is getting bad again, we switch once more to the 2nd set and so on.

Using this scheme memory fragmentation was eliminated and our top speed became our sustained speed (i.e. virtually no overhead associated with the dual heap system). Our working set actually plunged compared to the original fragmented heap mechanism, and you can see that the actual physical memory required of the dual heap solution is not that much more than an ideal single heap... at any given moment really only one "heap's worth" of memory is in use in a working set sense.

I selected the number 100,000 based on our number of allocations/sec and the measured rate of slowdown.

It took me over a week to diagnose just this one bug, although it took me only about 10 minutes to fix it once I knew what was going on. Aren't you glad our latest offerings do this better?

Comments

  • Anonymous
    February 02, 2006
    PingBack from http://blog.stevex.net/index.php/2006/02/02/heap-fragmentation/

  • Anonymous
    February 02, 2006
    > It took me over a week to diagnose just this
    > one bug, although it took me only about 10
    > minutes to fix it once I knew what was going
    > on. Aren't you glad you don't have to deal
    > with that sort of mess?

    Um, I thought everyone had to deal with that kind of mess, not usually but often enough that it would happen to everyone.

    On the other hand, the solutions that you describe here sound like more than 10 minutes of work.

  • Anonymous
    February 02, 2006
    why would you think it would take that much more work? maybe figuring out, but with a good design, such a change would only have to be made in one place..


    I've never heard of this idea, but I must say, I like it! Did you get the idea by yourself? Or is it form a article/patern?

  • Anonymous
    February 02, 2006
    "Aren't you glad you don't have to deal with that sort of mess?"

    But I too have to deal with this sort of mess all the time, and I generally have to do it without access to the source code, developers or secret knowledge about a vast propertion of the stuff I'm dealing with. Incidentally, that's a fact which I would beat into Raymond Chen with a baseball bat if I ever had a quiet five minutes with him...

    But I must say, I do like the simple ping-ponging strategy for heap tidying.

    To the person who thought it would be hard, Rico's mod probably looked like a bit like this:

    if((++nAllocs % 100000) == 0)
    {
    heapIndexOffset ^= 0x10;
    }

    Actually, Rico's so old, it probably looked like this:

    if((++nAllocs % 100000) == 0) {
    heapIndexOffset ^= 0x10;
    }

    ;-)

    There. There's only Larry Osterman to go and I shall have upset all the old folks...

  • Anonymous
    February 02, 2006
    "It took me over a week to diagnose just this one bug, although it took me only about 10 minutes to fix it once I knew what was going on. Aren't you glad you don't have to deal with that sort of mess?"

    I think we all have these problems, Rico. Don't take this personally, because I don't mean this in any way disparagingly, it's just an observation: MS folks IMHO seem to think that what they do is more "heavyweight" than what anyone else does anywhere. In my opinion the biggest difference is the number of customers the rest of us have, not the depth or difficulty of the problems solved.

  • Anonymous
    February 03, 2006
    >> Aren't you glad you don't have to deal with that sort of mess?

    >> In my opinion the biggest difference is the number of customers the rest of us have, not the depth or difficulty of the problems solved.

    Sorry about that, that was actually an attempt at humor (because it's old technology). I was thinking most of my readers write managed code or else are using more modern operating systems that don't have that problem anymore. I didn't mean to make any implication about the depth of my customers' problems...

    That was very clumsy of me.

  • Anonymous
    February 03, 2006
    I changed the comment to be closer to what I intended. "Aren't you glad our latest offerings do this better?"

    I apologize that my attempt at a self-slam could be taken the other way. That was not intended.

  • Anonymous
    February 03, 2006
    What about managed heap fragmentation? Will you be able to write about this?

  • Anonymous
    February 05, 2006
    > What about managed heap fragmentation? Will you be able to write about this?

    The managed heap only gets fragmented when a pinned pointer refers to a block of memory in the middle of it and a garbage collection subsequently occurs. This fragmentation used to be pretty bad in .NET 1.0 and 1.1, but it's much better now. This one of the reasons that one should never pin an managed object for very long.

    Heap fragmentation is almost always not a problem because the .NET GC is a compacting collecter. In moving objects into appropriate generations, the free space in the gaps is eliminated. Having said that, to my knowledge it is possible, but unlikely, for the large object heap (objects > 86000 bytes) to suffer from minor fragmentation problems for certain odd patterns of allocation (but the allocations would have to keep on going, i.e. a lot of memory would still have to be explicitly allocated). There was a presentation on the GC heap at the PDC; the videos and presentations are available somewhere on microsoft.com.

    Rico has written on this:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/dotnetgcbasics.asp

  • Anonymous
    April 25, 2008
    PingBack from http://www.spiteful.com/2008/04/25/dev-diligence-dont-invest-in-the-wrong-code/

  • Anonymous
    February 01, 2009
    This post is Part 4 in the series of posts on Garbage Collection (GC). Please see the index here. Recap