Concurrency, part 5 - What happens when you can't avoid lock misordering?

[Yesterday, I talked about simple deadlocks.  In general, deadlocks are caused by violating lock ordering.  But what do you do if you can't avoid violating lock ordering?
It does happen - in my experience, the most common place where this can occur is when you have a list of objects, where the list is protected by a different lock from the objects.
So you have (yeah, I know it's a stupid example, but...):class Thing1 <br>{<br>    CriticalSection _Thing1Lock;<br>    Thing1Container *_MyContainer;<br>public:<br>    void Lock() { EnterCriticalSection(&_Thing1Lock); }<br>    void Unlock() { LeaveCriticalSection(&_Thing1Lock); }<br>    void Thing1Method1()<br>    {<br>        Lock();<br>        ...<br>        Unlock();<br>    }<br>    ~Thing1()<br>    {<br>        Lock();<br>        _MyContainer->RemoveThing1FromList(this);<br>        Unlock();<br>    }<br>};class Thing1Container<br>{<br>    CAtlList<Thing1 *> _Thing1List;<br>    CRITICAL_SECTION _Thing1ListLock;<br>    void EnumerateThing1()<br>    {<br>        POSITION pos;<br>        EnterCriticalSection(&_Thing1ListLock);<br>        pos = _Thing1List.GetHeadPosition();<br>        while (!pos)<br>        {<br>            Thing1 *thing1 = _Thing1List.GetNext(pos);<br>            thing1->DoSomeWork();<br>        }<br>        LeaveCriticalSection(&_Thing1ListLock);<br>    }<br>    AddThing1ToList(Thing1 *Thing1)<br>    {<br>        EnterCriticalSection(&_Thing1ListLock);<br>        _Thing1List.Add(Thing1);<br>        LeaveCriticalSection(&_Thing1ListLock);<br>    }<br>    RemoveThing1FromList(Thing1 *Thing1)<br>    {<br>        EnterCriticalSection(&_Thing1ListLock);<br>        pos = _Thing1List.Find(Thing1);<br>        _Thing1List.RemoveAt(pos);<br>        LeaveCriticalSection(&_Thing1ListLock);<br>    }<br>}
If the Thing1 class also has its own lock, or if the call to thing1->DoSomeWork() may take a "long" time (for some value of long that depends on your application), then holding the lock across the call to DoSomeWork can be a major issue.
So the obvious change is to release the lock across the call to thing1->DoSomeWork().
    void EnumerateThing1()<br>    {<br>        POSITION pos;<br>        EnterCriticalSection(&_Thing1ListLock);<br>        pos = _Thing1List.GetHeadPosition();<br>        while (!pos)<br>        {<br>            Thing1 *thing1 = _Thing1List.GetNext(pos);<br>            LeaveCriticalSection(&_Thing1ListLock);<br>            thing1->DoSomeWork();<br>            EnterCriticalSection(&_Thing1ListLock);<br>        }<br>    }<br> 
Unfortunately, now you have an even bigger problem.  The instant you left the critical section, another thread could have come along and destroyed thing1.  Normally, the call to destroy thing1 would have blocked on the _Thing1ListLock, but once you exited the critical section, the other thread could run and destroy the object.
The immediate solution to this problem is to add reference counting to thing1. 
class Thing1 <br>{<br>    CriticalSection _Thing1Lock;<br>    LONG _ReferenceCount;<br>    Thing1Container *_MyContainer;<br>public:<br>    LONG AddRef() { return InterlockedIncrement(&_ReferenceCount); }<br>    LONG Release() <br>    { <br>        LONG lRet = InterlockedDecrement(&_ReferenceCount); <br>        if (lRet == 0)<br>        {<br>            delete this;<br>        }<br>        return lRet;    <br>    }<br>    void Lock() { EnterCriticalSection(&_Thing1Lock); }<br>    void Unlock() { LeaveCriticalSection(&_Thing1Lock); }<br>    void Thing1Method1()<br>    {<br>        Lock();<br>        ...<br>        Unlock();<br>    }<br>    ~Thing1()<br>    {<br>        Lock();<br>        _MyContainer->RemoveThing1FromList(this);<br>        Unlock();<br>    }<br>};
Next, you want to change EnumerateThing1 to:
    void EnumerateThing1()<br>    {<br>        POSITION pos;<br>        EnterCriticalSection(&_Thing1ListLock);<br>        pos = _Thing1List.GetHeadPosition();<br>        while (!pos)<br>        {<br>            Thing1 *thing1 = _Thing1List.GetNext(pos);<br>            thing1->AddRef();<br>            LeaveCriticalSection(&_Thing1ListLock);<br>            thing1->DoSomeWork();<br>            EnterCriticalSection(&_Thing1ListLock);<br>            thing1->Release();<br>        }<br>    }
You also need to change AddThing1ToList:
    AddThing1ToList(Thing1 *Thing1)<br>    {<br>        EnterCriticalSection(&_Thing1ListLock);<br>        Thing1->AddRef();<br>        _Thing1List.Add(Thing1);<br>        LeaveCriticalSection(&_Thing1ListLock);<br>    }<br>    RemoveThing1FromList(Thing1 *Thing1)<br>    {<br>        EnterCriticalSection(&_Thing1ListLock);<br>        pos = _Thing1List.Find(Thing1);<br>        _Thing1List.RemoveAt(pos);<br>        Thing1->Release();<br>        LeaveCriticalSection(&_Thing1ListLock);<br>    }
Note that I'm using InterlockedIncrement and InterlockedDecrement for the reference counting.  That's because InterlockedIncrement and InterlockedDecrement don't require an additional lock - they atomically increment or decrement the variable by using low level CPU instructions that avoid the need for a separate lock.  Also, when adding reference counting to an object, don't forget to make the destructor of that object private (or protected, if it's virtual) - that way you don't accidentally delete the object.
I was also careful to add a reference to Thing1 when it was put on the list.  This guarantees that thing1 won't be destroyed when we call thing1->Release() in EnumerateThing1 - otherwise we could have a lock inversion.
Now there's still a problem with the enumeration function above - it's related to the use of the "pos" variable.  If another thread called AddThing1ToList (or, even worse, RemoveThing1FromList) after the lock was released, then the "pos" variable is invalidated!  It doesn't even have to be a remove call for the current item in the enumeration,  the "pos" variable in EnumerateThing1 is only valid as long as no thread has modified the list.  So you've just introduced a data corruption into your application by releasing the lock.  So you need to be careful to ensure that your list doesn't get corrupted.
There are a several of ways of handling that - my current favorite is to have a temporary list and stage the thing1's in the temporary list:
    void EnumerateThing1()<br>    {<br>        POSITION pos;<br>        CAtlList<Thing1 *> callbackList;<br>        EnterCriticalSection(&_Thing1ListLock);<br>        pos = _Thing1List.GetHeadPosition();<br>        while (!pos)<br>        {<br>            Thing1 *thing1 = _Thing1List.GetNext(pos);<br>            thing1->AddRef();<br>            callbackList.Add(thing1);<br>        }<br>        LeaveCriticalSection(&_Thing1ListLock);<br>        pos = callbackList.GetHeadPosition();<br>        while (thing1 = callbackList.RemoveHead())<br>        {<br>            thing1->DoSomeWork();<br>            thing1->Release();<br>        }<br>    }
This solution applies my first principal: I ensured that the data was never accessed on more than one thread.  There are other solutions, this is just the easiest - it has its own issues, since it builds a temporary copy of the list (and if the list is long, that may involve a lot of memory).  Btw, this solution is essentially the one that the C# uses with the foreach construct - a temporary, immutable copy of the list is made and the caller iterates over that temporary copy of the list (CLR purists will disagree with me on this one, and they're right, but at a 20,000 foot level, that's essentially what happens).
By now, my fourth principal of concurrency should be pretty clear: "Don't call into other objects with locks being held, and make sure that you reference count all your objects" .
In general, callbacks can be VERY tricky, and usually require that the callback function clearly describe their semantics.  For example, the

](https://weblogs.asp.net/larryosterman/archive/2005/02/17/375449.aspx)waveOutProc function passed in to waveOutOpen has the following caveat:

Applications should not call any system-defined functions from inside a callback function, except for EnterCriticalSection, LeaveCriticalSection, midiOutLongMsg, midiOutShortMsg, OutputDebugString, PostMessage, PostThreadMessage, SetEvent, timeGetSystemTime, timeGetTime, timeKillEvent, and timeSetEvent. Calling other wave functions will cause deadlock.

This caveat exists because winmm doesn't release some internal locks across callback functions and thus can deadlock easily.

Tomorrow, I'll talk a bit about some of the other issues associated with reference counting - there are some a really subtle bugs that can happen if you're not careful when using reference counts.

Edit: Don't call DoSomeWork in the loop :)

Comments

  • Anonymous
    February 18, 2005
    The comment has been removed
  • Anonymous
    February 18, 2005
    Adi, it IS a stupid example :)

    But sometimes stuff like that happens (I've seen it more times than I want to). Usually, there's a local variable in Thing1 that flags that Thing1's been put into _MyContainer - you take the Thing1 lock to check the "I'm in _MyContainer" condition, then you remove Thing1 from _MyContainer.

    I did say I was winging this :) I'm sure that if I spent enough time, I could come up with a real world example, but...
  • Anonymous
    February 18, 2005
    Starting to get interesting. Thanks for taking the time to put this up.
  • Anonymous
    February 18, 2005
    Larry,

    In the last version of EnumerateThing1(), did you mean to call thing1->DoSomeWork(); in the first while loop as well? I think it should be only in the second while loop.

    (Also, you managed to confuse "principle" and "principal" again! ;)

    -K
  • Anonymous
    February 18, 2005
    Did you mean to call thing1->DoSomeWork() in the last listing while building callbackList?
  • Anonymous
    February 18, 2005
    Well... OK.

    My (indirect) point is that in a real world example, you should never have a situation when the lock hierarchy is not well defined. If you have a mixup, then you have a bug in the design of the original code.

    So, the right way to fix it is to examine the locking dependencies, and what introduced them in the first place.

    That said, you might have "strong" dependencies and "weak" dependencies, but you should always have a lock hierarchy...

    I would add one more point - when you design a lifecycle management for certain objects, usually the simplest approach is to define some sort of reference counters to simplify the memory management. It is a similar strategy as with COM objects - each thread should have its own reference. But there are subtleties there too - you again need to distinguish between strong and weak references when you have cross-reference between two objects...



  • Anonymous
    February 18, 2005
    Did you mean to call thing1->DoSomeWork() will building callbackList?
  • Anonymous
    February 18, 2005
    Did you mean to call thing1->DoSomeWork() will building callbackList?
  • Anonymous
    February 18, 2005
    The comment has been removed
  • Anonymous
    February 19, 2005
    The comment has been removed
  • Anonymous
    February 20, 2005
    James,
    It doesn't matter - the critical section is there to protect some data.

    Now, if the data in question was a kernel object, duplicating the handle doesn't change the issue - a handle is a pointer to some kernel data structure, which has its own protection structures.