Graceful Completion of Thread Pool Wait Callback

The Windows thread pool API relieves the developer of managing thread lifetimes, so the developer only has to write the work as a callback function and let a thread pool thread invoke it automatically. Books such as [CPW08] (chapter 7) and [WVCC07] (chapter 11), and MSDN Magazine articles such as [MSDN07/10] and [MSDN11/08] through [MSDN11/11], cover the various thread pool APIs in detail.

 

Resource Cleanup and Completion of Thread Pool Callbacks

To perform its work, a callback function often uses custom resources, and those resources normally require proper cleanup once the callback is no longer needed: memory to be released, handles to be closed, or DLLs to be unloaded. Because the thread pool API is concurrent, resource cleanup must be synchronized with the completion of the thread pool callbacks; otherwise there is a race condition between the thread that is cleaning up the resources and the callback threads that are still using them.

In particular, the SetThreadpoolWait function lets a thread pool worker thread call the wait object’s callback function after the handle becomes signaled (or the timeout expires). If the program wants to keep listening for the event notification, it must re-register the event with the wait object (by calling SetThreadpoolWait again) each time, before the event is signaled again to trigger the wait callback. Usually, the re-registration logic is placed inside the callback function itself.

VOID MyWaitCallback (…)
{
    // user code that performs the necessary work
    …;

    // use some resources
    …;

    // re-register another wait
    SetThreadpoolWait(Wait, myWaitHandle, …);    // (1.1)
}

To eliminate the race between any in-flight callbacks and the resource cleanup, the code must wait for all callbacks to complete before it starts freeing any resources. [CPW08] chapter 7 gives a three-step process to achieve this for wait callbacks: (1) cancel the wait, (2) wait for callbacks to finish, and finally (3) close the wait object.

VOID StopTheWait (…)
{
    // Step 1: cancel the wait.
    SetThreadpoolWait(myWait, NULL, NULL);            // (1.2)
    // Step 2: wait for outstanding callbacks to finish.
    WaitForThreadpoolWaitCallbacks(myWait, FALSE);    // (1.3)
    // Step 3: close the wait object.
    CloseThreadpoolWait(myWait);                      // (1.4)

    // now it’s safe to clean up and free the resources
    …;
}

However, this solution is not completely correct and does not eliminate every race condition. Here is why:

Imagine an in-flight callback that is about to execute line (1.1) when it gets preempted. At the same time, the program decides to stop the thread pool wait and start cleaning up, so line (1.2) executes. The OS then switches back to the callback thread, which executes line (1.1). The net effect of this interleaving is that line (1.1) happens after line (1.2), so the wait is in fact not canceled. Additional callbacks can therefore be queued after line (1.3) and start executing before line (1.4). In that case, the code goes on to free the resources while a callback is still executing.
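
The interleaving can be replayed deterministically with a toy, single-threaded model. This is only a sketch for following the argument: ModelWait and the model_* and replay_* functions are hypothetical names invented for illustration, they are not part of the Windows API, and the model assumes that a registered wait fires once per signal and must then be re-registered.

```c
#include <stdbool.h>

/* Hypothetical model of a wait object; NOT the Win32 API. */
typedef struct {
    bool armed;        /* a handle is registered, so a signal queues a callback */
    int  outstanding;  /* callbacks that are queued or running */
} ModelWait;

/* Models SetThreadpoolWait(wait, handle, ...) or SetThreadpoolWait(wait, NULL, NULL). */
static void model_set_wait(ModelWait *w, bool has_handle)
{
    w->armed = has_handle;
}

/* Models the handle becoming signaled: an armed wait fires once. */
static void model_signal(ModelWait *w)
{
    if (w->armed) {
        w->armed = false;
        w->outstanding++;
    }
}

/* Models a running callback returning to the thread pool. */
static void model_callback_finishes(ModelWait *w)
{
    w->outstanding--;
}

/* Replays the interleaving described above and returns the number of
 * callbacks still outstanding when the resources are freed. */
static int replay_three_step_stop(void)
{
    ModelWait w = { true, 0 };

    model_signal(&w);             /* callback A starts, preempted before (1.1) */
    model_set_wait(&w, false);    /* (1.2) the stopping thread cancels the wait */
    model_set_wait(&w, true);     /* (1.1) A resumes and re-registers the wait  */
    model_callback_finishes(&w);  /* (1.3) returns once A has completed         */
    model_signal(&w);             /* the handle is signaled again: B is queued  */
    /* (1.4) close the wait object, then free resources while B is outstanding */
    return w.outstanding;
}
```

In this model, replay_three_step_stop() returns 1: one callback is still outstanding at the moment the resources are freed, which is exactly the race described above.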

 

Graceful Completion of Thread Pool Wait Callbacks

Here are two approaches for stopping a thread pool wait in a truly graceful manner:

If the program has control over the source of the wait notification

In this case, the code can simply turn off the source of the notification first, before waiting for any in-flight or queued callbacks to finish. For example, it might call the matching UnregisterXXX function to stop the wait event from being signaled, or stop calling the SetEvent function if the code generates the signal itself. After that, it can call WaitForThreadpoolWaitCallbacks and CloseThreadpoolWait to stop the thread pool wait safely.

If the program does NOT have control over the source of the wait notification

Neeraj Singh proposed the following pattern:

VOID MyWaitCallback (…)
{
    …;

    // re-register only if the stop has not been requested
    if (dontReregister == FALSE)
        SetThreadpoolWait(Wait, myWaitHandle, …);    // (2.1)
}

VOID StopTheWait (…)
{
    // Step 1: set the “don’t re-register” flag.
    dontReregister = TRUE;                           // (2.2)
    // Step 2: wait for any in-flight callbacks which have NOT observed the flag to complete.
    WaitForThreadpoolWaitCallbacks(myWait, TRUE);    // (2.3)
    // Step 3: cancel the wait.
    SetThreadpoolWait(myWait, NULL, NULL);           // (2.4)
    // Step 4: wait for any additional callbacks, which should have observed the flag.
    WaitForThreadpoolWaitCallbacks(myWait, TRUE);    // (2.5)
    // Step 5: close the wait object.
    CloseThreadpoolWait(myWait);                     // (2.6)

    // now it’s safe to clean up and free the resources
    …;
}

This pattern works for the following reason:

The if statement before line (2.1) makes sure that every running callback checks the dontReregister flag before it calls SetThreadpoolWait. When the code decides to stop the thread pool wait, it first sets the dontReregister flag at line (2.2). The first WaitForThreadpoolWaitCallbacks at line (2.3) ensures that no callback that has NOT observed the flag is still running. After that, line (2.4) cancels the wait, which prevents any further callbacks from being queued from this point onwards. The second WaitForThreadpoolWaitCallbacks at line (2.5) waits for or cancels any in-flight or queued callbacks that may have sneaked in between line (2.3) and line (2.4); those callbacks observe the flag and therefore do not re-register. Finally, the wait object is closed at line (2.6).
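
The same kind of deterministic replay can be used to walk through this pattern. As before, this is a toy, single-threaded sketch rather than real thread pool code: ModelWait and the model_* and replay_* functions are hypothetical names invented for illustration, and the model assumes that a registered wait fires once per signal and must then be re-registered.

```c
#include <stdbool.h>

/* Hypothetical model of a wait object plus the flag; NOT the Win32 API. */
typedef struct {
    bool armed;            /* a handle is registered, so a signal queues a callback */
    int  outstanding;      /* callbacks that are queued or running */
    bool dont_reregister;  /* models the dontReregister flag */
} ModelWait;

/* Models SetThreadpoolWait(wait, handle, ...) or SetThreadpoolWait(wait, NULL, NULL). */
static void model_set_wait(ModelWait *w, bool has_handle)
{
    w->armed = has_handle;
}

/* Models the handle becoming signaled: an armed wait fires once. */
static void model_signal(ModelWait *w)
{
    if (w->armed) {
        w->armed = false;
        w->outstanding++;
    }
}

/* Models the end of MyWaitCallback: re-register only if the flag value
 * the callback read was clear, then return to the thread pool. */
static void model_callback_tail(ModelWait *w, bool flag_seen)
{
    if (!flag_seen)
        w->armed = true;   /* line (2.1) */
    w->outstanding--;
}

/* Replays the worst-case interleaving and returns how many callbacks are
 * outstanding or could still be queued when the resources are freed. */
static int replay_flag_stop(void)
{
    ModelWait w = { true, 0, false };

    model_signal(&w);                    /* callback A starts                    */
    bool a_seen = w.dont_reregister;     /* A reads FALSE, then is preempted     */
    w.dont_reregister = true;            /* (2.2) set the flag                   */
    model_callback_tail(&w, a_seen);     /* (2.3) blocks until A re-registers    */
                                         /*       and completes                  */
    model_signal(&w);                    /* signal sneaks in: callback B queued  */
    model_set_wait(&w, false);           /* (2.4) cancel the wait                */
    model_callback_tail(&w, w.dont_reregister); /* (2.5) B sees TRUE, no re-register */
    model_signal(&w);                    /* later signals find nothing armed     */
    /* (2.6) close the wait object, then free the resources */
    return w.outstanding + (w.armed ? 1 : 0);
}
```

In this model, replay_flag_stop() returns 0: even though callback A re-registered after the flag was set, nothing is outstanding and nothing can be queued by the time the resources are freed.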

The Windows thread pool API already contains the necessary memory barriers, so the reordering of memory accesses is constrained at line (2.1) and line (2.3) even if nothing special is done for the dontReregister flag.

 

Takeaway:

On the surface, multithreading seems like a straightforward programming exercise, and the Windows thread pool API makes it even more accessible, but the truth is that it is extremely hard to get right. The resulting bugs are tricky, and because they depend on timing, their cause is difficult to locate even with substantial test effort. Furthermore, the consequences of this kind of bug are usually undefined (which often means devastating).

There are many other, deeper aspects of multithreading, such as lock performance, memory/cache synchronization, power consumption and more, that developers need to understand before they can really enjoy its benefits. In short, adopting multithreading involves a huge risk. Hence, before starting, carefully weigh the costs and benefits, and use it responsibly.

 

References:

1. [CPW08] Concurrent Programming On Windows

2. [WVCC07] Windows via C/C++

3. [MSDN07/10] Improve Scalability With New Thread Pool APIs

4. [MSDN11/08] The Windows Thread Pool and Work

5. [MSDN11/09] The Thread Pool Environment

6. [MSDN11/10] Thread Pool Cancellation and Cleanup

7. [MSDN11/11] Thread Pool Synchronization

 

Thanks to Neeraj Singh and Alex Bendetov for answering my questions on this topic.