Concurrency, part 6 - Reference counting is hard :)

The other day, I talked about using reference counting as a way of ensuring object when enumerating a list containing objects.
Reference counting is an incredibly useful, but incredibly dangerous technique.  If you use it right, it can be a huge help, if you get it wrong, it's a nightmare.
For example, what's wrong with this code:
class Thing1 : public RefCountedObject<br>{<br>    CriticalSection _Thing1Lock;<br>    class Thing2 *_Thing2;<br>public:<br>    void Lock() { EnterCriticalSection(&_Thing1Lock); }<br>    void Unlock() { LeaveCriticalSection(&_Thing1Lock); }<br>    void SetThing2(Thing2 thing2)<br>    {<br>        Lock();<br>        _Thing2 = thing2;<br>        _Thing2->AddRef();<br>        Unlock();<br>    }<br>};class Thing2 : public RefCountedObject<br>{<br>    CriticalSection _Thing2Lock;<br>    class Thing1 *_Thing1;<br>public:<br>    void Lock() { EnterCriticalSection(&_Thing2Lock); }<br>    void Unlock() { LeaveCriticalSection(&_Thing2Lock); }<br>    void SetThing1(Thing1 *thing1)<br>    {<br>        Lock();<br>        _Thing1 = thing1;<br>        _Thing1->AddRef();<br>        Unlock();<br>    }<br>};
If you call thing2->SetThing1 and thing1->SetThing2, you've just introduced a circular dependency - neither thing1 or thing2 will ever be released.  And, while this example is utterly silly, there are lots of examples where you can get circular references. 
I think a real world example fits in quite nicely here.  We've got a notification class for some of our stuff that had a deadlock in it.  The original class looked like:
class NotificationManager: public IUnknown<br>{<br>    struct NotificationBlock <br>    {<br>        CALLBACKFUNCTION _Function;<br>        LPVOID _Context;<br>        NotificationBlock(CALLBACKFUNCTION Function, LPVOID Context) :<br>            _Function(Function),<br>            _Context(Context);<br>        {<br>        }<br>    }<br>    CAtlList<NotificationBlock> _NotificationListList;<br>    CRITICAL_SECTION _NotificationListLock;<br>    RegisterForNotifications(CALLBACKFUNCTION Function, LPVOID Context)<br>    {<br>        NotificationBlock block;<br>        block._Function = Function; block._Context = Context;<br>        <br>        EnterCriticalSection(&_NotificationListLock);<br>        _NotificationList.Add(block);<br>        LeaveCriticalSection(&_NotificationListLock);<br>    }<br>    UnregisterNotifications(CALLBACKFUNCTION Function, LPVOID Context)<br>    {<br>        NotificationBlock block(Function, Context);<br>        EnterCriticalSection(&_NotificationListLock);<br>        pos = _NotificationList.Find(block);<br>        _NotificationList.RemoveAt(pos);<br>        LeaveCriticalSection(&_NotificationListLock);<br>    }<br>    GenerateNotification(LPVOID NotificationBlock)<br>    {<br>        NotificationBlock block;<br>        EnterCriticalSection(&_NotificiationListLock)<br>        pos = _NotificationList.GetHeadList(block);<br>        while (block = _NotificationList->GetNext(pos))<br>        {<br>            block._Function(block._Context, NotificationBlock);<br>        }<br>        LeaveCriticalSection(&_NotificationListLock);<br>    }<br>}
Unfortunately, because the callback function was called with the list locked, there was a huge potential for deadlock (and in fact, one was observed).
My first thought was to change the notification block using a temporary list..
    GenerateNotification(LPVOID NotificationBlock)<br>    {<br>        NotificationBlock block;<br>        CAtlList<NotificationBlock> temporaryList;<br>        EnterCriticalSection(&_NotificiationListLock)<br>        pos = _NotificationList.GetHeadList(block);<br>        while (block = _NotificationList->GetNext(pos))<br>        {<br>            NotificationBlock tempBlock(block._Function, block._Context);<br>            temporaryList.Add(tempBlock);<br>        }<br>        LeaveCriticalSection(&_NotificationListLock);<br>        while (block = temporaryList.RemoveHead())<br>        {<br>            block._Function(block._Context, NotificationBlock);<br>        }        <br>    }
The problem with this is that it's possible for someone to have removed the block from the list as soon as the lock was released.  So I needed to have a way of protecting the blocks.  Well, the fourth principle comes into play: I reference counted the blocks.
Now the function looks like:class Thing1 : public INotification<br>{<br>    CriticalSection _Thing1Lock;<br>    Thing1Container *_MyContainer;<br>public:<br>    void Lock() { EnterCriticalSection(&_Thing1Lock); }<br>    void Unlock() { LeaveCriticalSection(&_Thing1Lock); }<br>    void FinalConstruct(NotificationContainer *MyContainer)<br>    {<br>        _MyContainer = MyContainer;<br>        _MyContainer->AddRef();<br>        _MyContainer->RegisterForNotifications(CallbackFunction, this); <br>    }<br>    void FinalRelease()<br>    {<br>        _MyContainer->UnregisterNotifications(CallbackFunction, this); <br>        _MyContainer->Release();<br>    }<br>};class NotificationManager: public IUnknown<br>{<br>    CAtlList<INotification *> _NotificationListList;<br>    CRITICAL_SECTION _NotificationListLock;<br>    RegisterForNotifications(INotification* Notify)<br>    {<br>        EnterCriticalSection(&_NotificationListLock);<br>        _NotificationList.Add(Notify);<br>        LeaveCriticalSection(&_NotificationListLock);<br>    }<br>    UnregisterNotifications(INotification *Notify)<br>    {<br>        EnterCriticalSection(&_NotificationListLock);<br>        pos = _NotificationList.Find(Notify);<br>        _NotificationList.RemoveAt(pos);<br>        LeaveCriticalSection(&_NotificationListLock);<br>    }<br>    GenerateNotification(LPVOID NotificationBlock)<br>    {<br>        INotification *notify;<br>        CAtlList<INotification *> temporaryList;<br>        EnterCriticalSection(&_NotificiationListLock)<br>        pos = _NotificationList.GetHeadList(block);<br>        while (notify = _NotificationList->GetNext(pos))<br>        {<br>            notify->AddRef();<br>            temporaryList.Add(notify);<br>        }<br>        LeaveCriticalSection(&_NotificationListLock);<br>        while (notify = temporaryList.RemoveHead())<br>        {<br>            notify->OnNotification(NotificationBlock);<br>            notify->Release();<br>        }<br>    }}
Unfortunately, this introduces a circular reference.  Since thing1 is added to the list in its final construct, the call to UnregisterNotifications never happens - the notification block keeps a reference to thing1 and the reference count never goes away.  Unless there's an external mechanism for removing thing1 from the container, neither object will ever be freed.
The easiest way of solving this is to introduce a tiny "delegator" object between thing1 and the notification manager.  The notification manager references the delegator, not the Thing1 object, and thus the lifetime of Thing1 is unaffected by the notification manager.
The good news is that managed environments, like the CLR make most of this reference counting stuff pretty much unnecessary, which is a very good thing.
Oh, and the fifth principle: "Reference counting is hard if you're not really careful".
So much for framework involving concurrency.  Believe it or not, I'm almost done :)  Tomorrow, I want to talk about opportunities for concurrency.
 
Edit: One of these days, I'll remember that your principal is your friend.

Comments

  • Anonymous
    February 21, 2005
    s/principal/principle

    If you do ever write a book get a good proofreader ;-)
  • Anonymous
    February 21, 2005
    The comment has been removed
  • Anonymous
    February 21, 2005
    regarding the 1st example,
    in Thing1::SetThing2, don't you need to release whatever was in this->_Thing2 before overwriting it & addref'ing the passed value? (same thing in Thing2::SetThing1). This is so you don't leak refs when these methods are called repeatedly.
    Also the AddRef could probably be moved before the Lock call in these methods, thereby reducing the time spent with the lock held.
    But then again, maybe i'm wrong, since ref count is hard ;-)


  • Anonymous
    February 22, 2005
    Alexey, I didn't bother to try to ensure that the examples compiled, to be honest :)

    jbn,
    you're right, I should have released the _Thing1 variable (if it was not null). And the addref could be moved before the lock is held (after changing the AddRef target to thing1). If you were addref'ing thing1 (as opposed to _Thing1) you could even move it AFTER releasing the lock (because the caller has a reference to thing1).

    On the other hand, the AddRef of _Thing1 has to happen within the lock - otherwise _Thing1 could be set to NULL between the LeaveCriticalSection and the call to Release.
  • Anonymous
    February 22, 2005
    Could you explain a little more about solving the circular reference issues? I'm sure many people have run into these sort of problems.
  • Anonymous
    February 22, 2005
    I have had to debug many problems of circular references, and to me deadlock problems and circular references are a lot alike.
    If you look at http://lists.ximian.com/archives/public/mono-patches/2003-August/023295.html, they explain how they model their lock hierarchy to never violate their lock order; now if you graph how your data structures point to each other and which pointers are owning references (i.e. AddRef'ed pointers), then it's pretty natural to think that when two items are in a refcount cycle, they "deadlock" as in "they lock each other in memory". Whether for mutexes or refs, no deadlocks means no cycles in the associated graph.. In any case, it should be possible to demonstrate via formal methods that a given program cannot deadlock: does this make any sense to you or am i off-topic already? ;-)
  • Anonymous
    February 22, 2005
    The comment has been removed
  • Anonymous
    February 23, 2005
    Alexey,
    You're right that it's a design problem. But it's indescribably easy for circular references to come into existance. That's one of the reasons that garbage collection is harder than just reference counting objects - you need to actively sweep through all allocated objects looking for cycles.

    Having said that, your comments are spot on.
  • Anonymous
    February 24, 2005
    The boost shared_ptr / weak_ptr libraries (coming soon to a C++ standard library near you) are a great way to handle this sort of circular dependency. If A has a strong pointer to B but B only has a weak pointer to A, no circularity is introduced (although B has to be aware that when it tries to access the A it might have gone away).

    Works really well for callbacks, and is well-enough designed that it's hard to use it wrongly.