Debugging a classic ReaderWriterLock deadlock with SOSex.dll

I was helping out on an issue the other day where the process would stall if they added enough users in their load tests.  Btw, serious kudos to them for running load tests; it is so much nicer to work on a problem in test than when it is getting critical on a production machine.

We gathered some memory dumps of the ASP.NET process (w3wp.exe) with Debug Diag and found that most of the threads were waiting in this type of callstack:

NOTE: I have changed a lot of function names and code snippets since it is not relevant for the post.

 0:071> !clrstack
OS Thread Id: 0x21ac (71)
ESP       EIP
1c8ceb88 7c8285ec [HelperMethodFrame_1OBJ: 1c8ceb88] System.Threading.ReaderWriterLock.AcquireReaderLockInternal(Int32)
1c8cebe4 1c880d72 DataStore.get_SomeItem()
...

So they were sitting in DataStore.get_SomeItem, waiting to acquire a ReaderLock for some resource.  A ReaderWriterLock is used to synchronize access to a resource where you want to allow either multiple simultaneous readers or a single writer.
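If you haven't used it before, the basic pattern looks something like this (my own minimal sketch, not code from the actual app; the class, field names, and timeouts are made up for illustration):

using System.Threading;

class Cache
{
    static readonly ReaderWriterLock rw = new ReaderWriterLock();
    static string data = "initial";

    public static string Read()
    {
        rw.AcquireReaderLock(5000);      // any number of threads can hold the reader lock at once
        try { return data; }
        finally { rw.ReleaseReaderLock(); }
    }

    public static void Write(string value)
    {
        rw.AcquireWriterLock(5000);      // exclusive: blocks until all readers have released
        try { data = value; }
        finally { rw.ReleaseWriterLock(); }
    }
}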

To be honest, I can’t say that I see it used all that often, and there is now also a ReaderWriterLockSlim that is similar to ReaderWriterLock but designed to avoid some of the more common deadlocks.
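As an illustration (again my own sketch, unrelated to the app in this post), the Slim version has an upgradeable read lock that at most one thread can hold at a time, so the check-then-write sequence is atomic with respect to other upgraders and there is no racy upgrade path:

using System.Threading;

class SlimExample
{
    static readonly ReaderWriterLockSlim rw = new ReaderWriterLockSlim();
    static object data;

    public static object GetOrInit()
    {
        rw.EnterUpgradeableReadLock();   // at most one thread can hold this at a time
        try
        {
            if (data == null)
            {
                rw.EnterWriteLock();     // waits for plain readers to drain, never for another upgrader
                try { data = new object(); }
                finally { rw.ExitWriteLock(); }
            }
            return data;
        }
        finally { rw.ExitUpgradeableReadLock(); }
    }
}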

SOSEX from Steve’s Techspot has some really nice commands for ReaderWriterLocks that both display information about the RWLocks in the process and detect deadlocks between them.

First we have !rwlock. This is similar to !syncblk for locks/monitors, and it gives us some basic information about the ReaderWriterLocks in the process.

 0:174> !rwlock
Address    ReaderCount   WaitingReaderCount   WriterThread   WaitingWriterCount
---------------------------------------------------------------------------------------
…
0e9167d0             0                    0             50                    2
0e9167fc             2                   45              0                    1

We have two interesting locks here (0e9167d0 and 0e9167fc).

For 0e9167d0 (I’ll call this Lock1 from here on), thread 50 owns the WriterLock, so no one else can acquire either a ReaderLock or a WriterLock on it, and we have 2 threads waiting to acquire the WriterLock.

For 0e9167fc (I’ll call this Lock2), two threads own ReaderLocks on it, 1 thread is waiting to acquire a WriterLock, and 45 other threads are waiting to acquire ReaderLocks.  The thread waiting for the WriterLock is probably first in the queue, but until the current readers give up their locks, the WriterLock can’t be acquired.

(I will get back to these thread numbers in a moment, because it gets pretty tricky.)

We can also dig into a particular lock by running !rwlock <lock address>

 0:068> !rwlock 0e9167d0      (<- Lock1)
WriterThread:           50
WriterLevel:            1
WaitingWriterCount:     2
WriterEvent:            e54
WaitingWriterThreadIds: 14,47
ReaderCount:            0
CurrentReaderThreadIds: None
WaitingReaderCount:     0
ReaderEvent:            0
WaitingReaderThreadIds: None

 0:174> !rwlock 0e9167fc      (<- Lock2)
WriterThread:           0
WriterLevel:            0
WaitingWriterCount:     1
WriterEvent:            e00
WaitingWriterThreadIds: 50
ReaderCount:            2
CurrentReaderThreadIds: 14,47
WaitingReaderCount:     45
ReaderEvent:            e7c
WaitingReaderThreadIds: 54,57,58,61,62,66,67,70,71,74,75,78,79,82,83,86,87,90,91,94,95,98,99,102,103,106,107,110,111,114,115,118,119,122,123,126,127,130,131,134,137,140,143,146,149

With this info, we can now gather that threads 14 and 47 own ReaderLocks on Lock2, but they won’t give them up until they can each get the WriterLock on Lock1.   Thread 50, on the other hand, has the WriterLock on Lock1 and won’t give it up until it can get the WriterLock on Lock2. Hence we have a deadlock.

The good thing here is that if you don’t care to do this investigation on your own, SOSEX has a command that will do it for you: !dlk (deadlock).

 0:174> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLocks...
Scanning for lock holders on ReaderWriterLocks...
Scanning for threads waiting on SyncBlocks...
Scanning for threads waiting on ReaderWriterLocks...

Deadlock detected:
CLR thread 14 holds a Reader lock on ReaderWriterLock 0e9167fc
             is waiting for a Writer lock on ReaderWriterLock 0e9167d0
CLR thread 50 holds a Writer lock on ReaderWriterLock 0e9167d0
             is waiting for a Writer lock on ReaderWriterLock 0e9167fc
CLR Thread 14 is waiting at DataStore.InitializeDataIfNotInitialized()(+0x0 IL)(+0x0 Native)
CLR Thread 50 is waiting at System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)(+0x0 IL)(+0x0 Native)

Deadlock detected:
CLR thread 47 holds a Reader lock on ReaderWriterLock 0e9167fc
             is waiting for a Writer lock on ReaderWriterLock 0e9167d0
CLR thread 50 holds a Writer lock on ReaderWriterLock 0e9167d0
             is waiting for a Writer lock on ReaderWriterLock 0e9167fc
CLR Thread 47 is waiting at DataStore.InitializeDataIfNotInitialized()(+0x0 IL)(+0x0 Native)
CLR Thread 50 is waiting at System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)(+0x0 IL)(+0x0 Native)

2 deadlocks detected.

Explanation of the Thread IDs coming from the SOSEX RWLock methods

So this is all nice and good, but what are these thread IDs? Are they the logical thread ID (1st column in !threads), the CLR ID (2nd column in !threads), or the OSID (OS thread ID)?  Turns out it is none of them…  or rather, it is actually the CLR ID in decimal form :)  Steve mentioned he might change this later to show the hex value, so stay tuned for new updates of SOSEX for that and other functionality.

 0:054> !threads
ThreadCount: 154
UnstartedThread: 1
BackgroundThread: 103
PendingThread: 0
DeadThread: 1
Hosted Runtime: no
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  16    1 1474 0010c0a8   3808220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Worker)
...
  32    d 2520 1abd0948   200b020 Enabled  00000000:00000000 0013fa58     0 MTA
  11    e 1fcc 1abdc9e8   a80a220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Completion Port)
  33    f 1224 1abe5c10   380b220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Worker)
...
  37   13 1cc8 1ac22618   380b220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Worker)
  38   14 23e8 1ac229e8   200b020 Enabled  00000000:00000000 0013fa58     0 MTA
  39   15 2230 1ac22db8   380b220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Worker)
...
  63   2d 23e0 1ab934c0   380b220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Worker)
  64   2e 19ac 1ac3d190   200b020 Enabled  00000000:00000000 0013fa58     0 MTA
  65   2f 1644 1aba8f40   a80b220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Completion Port)
  66   30 10f4 1abaaaa8   380b220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Worker)
  67   31 121c 1ac47418   200b020 Enabled  00000000:00000000 0013fa58     0 MTA
  68   32 1b00 1abfdd70   a80b220 Enabled  00000000:00000000 0013fa58     1 MTA (Threadpool Completion Port)
  69   33 2198 1ac55358   200b220 Enabled  06e140e4:06e150b4 0013fa58     0 MTA

In our case, the threads we were interested in were 14 (CLR ID 0xe), which is debugger thread 11; 47 (0x2f), which is thread 65; and 50 (0x32), which is thread 68.
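As a side note, if you don’t want to do the decimal-to-hex conversion in your head, the debugger’s expression evaluator can do it for you (the 0n prefix marks a decimal literal), e.g.:

 0:054> ? 0n14
Evaluate expression: 14 = 0000000e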

If we take a look at two of these threads (11 and 68), the stacks look like this:

 0:011> !clrstack
OS Thread Id: 0x1fcc (11)
ESP       EIP
01c7f250 7c8285ec [HelperMethodFrame_1OBJ: 01c7f250] System.Threading.ReaderWriterLock.AcquireWriterLockInternal(Int32)
01c7f2ac 1c880e07 DataStore.InitializeDataIfNotInitialized()
01c7f2e4 1c880d78 DataStore.get_SomeItem()
...

 0:068> !clrstack
OS Thread Id: 0x1b00 (68)
ESP       EIP
1c71ed0c 7c8285ec [GCFrame: 1c71ed0c]
1c71ed28 7c8285ec [HelperMethodFrame_1OBJ: 1c71ed28] System.Threading.ReaderWriterLock.FCallUpgradeToWriterLock(System.Threading.LockCookie ByRef, Int32)
1c71ed88 792af05d System.Threading.ReaderWriterLock.UpgradeToWriterLock(Int32)
1c71edac 1c880e19 DataStore.InitializeDataIfNotInitialized()
1c71ede4 1c880d78 DataStore.get_SomeItem()
...

And the code for the get_SomeItem and InitializeDataIfNotInitialized methods looks something like this:

public static SomeItem get_SomeItem()
{
    ...
    // Take a shared (reader) lock on DataLock; many threads can hold this at once.
    DataLock.AcquireReaderLock(timeOut);
    try
    {
        InitializeDataIfNotInitialized();
        ...
    }
    finally
    {
        DataLock.ReleaseReaderLock();
    }
    ...
}

private static void InitializeDataIfNotInitialized()
{
    if (DataIsNotInitialized)
    {
        try
        {
            // Take an exclusive lock on a second lock (InitLock),
            // while still holding the reader lock on DataLock.
            InitLock.AcquireWriterLock(timeOut);
            // Upgrading blocks until all *other* readers of DataLock have released.
            LockCookie lockCookie = DataLock.UpgradeToWriterLock(timeOut);
            try
            {
                ...
            }
            finally
            {
                DataLock.DowngradeFromWriterLock(ref lockCookie);
            }
        }
        finally
        {
            InitLock.ReleaseWriterLock();
        }
    }
}

So what happened here was that threads 11 and 68 both grabbed the DataLock ReaderLock in get_SomeItem.

Then 68 grabbed the WriterLock for InitLock in InitializeDataIfNotInitialized, and now it is trying to upgrade the DataLock to a WriterLock, but it can’t do so since thread 11 still holds a ReaderLock on it.

Thread 11, on the other hand, now needs to grab the WriterLock on InitLock but can’t, because 68 owns it.
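To make the interleaving explicit, my reconstruction of the sequence of events looks like this:

// Thread 11                              Thread 68
// DataLock.AcquireReaderLock()   -> ok
//                                        DataLock.AcquireReaderLock()    -> ok (reader locks are shared)
//                                        InitLock.AcquireWriterLock()    -> ok
//                                        DataLock.UpgradeToWriterLock()  -> blocks: 11 still holds a reader lock
// InitLock.AcquireWriterLock()   -> blocks: 68 holds InitLock
//
// Neither thread can make progress: deadlock.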

Conclusion

This is a pretty typical deadlock situation, and the solution is to re-design the initialization methods so that this can never happen. Hopefully the walkthrough has given some insight into how you can debug these types of issues with SOSEX if you do run into them.
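One possible redesign (my sketch, not the fix that was actually applied) is to take the one-time initialization out of the lock-upgrade path altogether, e.g. with Lazy<T>, so that no thread ever waits for a writer lock while holding a reader lock:

using System;

static class DataStore
{
    // The factory runs exactly once, under Lazy<T>'s own thread-safe synchronization,
    // so readers never need to upgrade a lock to trigger initialization.
    static readonly Lazy<SomeItem> item = new Lazy<SomeItem>(LoadItem);

    public static SomeItem SomeItem
    {
        get { return item.Value; }
    }

    static SomeItem LoadItem()
    {
        // ... expensive one-time load of the data ...
        return new SomeItem();
    }
}

class SomeItem { }

(Lazy<T> requires .NET 4; on earlier frameworks a double-checked lock around a single Monitor achieves the same thing.)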

Have a good one,

Tess

Comments

  • Anonymous
    April 27, 2010
    Thank you for the write-up. It was neatly explained, simple, and short and to the point. Regards

  • Anonymous
    April 26, 2011
    I'm wondering why the 'timeout' didn't work.

  • Anonymous
    July 23, 2012
    Hi, I am trying to investigate a hang with windbg. If I call the command !mlocks I get the following:

    0:000> !mlocks
    Examining SyncBlocks...
    Scanning for ReaderWriterLock instances...
    Scanning for holders of ReaderWriterLock locks...
    Scanning for ReaderWriterLockSlim instances...
    Scanning for holders of ReaderWriterLockSlim locks...
    Examining CriticalSections...
    ClrThread  DbgThread  OsThread    LockType    Lock              LockLevel
    0x640064   -1         0xffffffff  RWLock      000000000339a338  Writer
    0x6        7          0x1ea8      thinlock    000000000343ddd8  (recursion:0)

    When executing the rwlock commands I get the following:

    0:000> !rwlocks 000000000339a338
    No export rwlocks found
    0:000> !rwlock 000000000339a338
    WriterThread:           0x640064 (DEAD)
    WriterLevel:            115
    WaitingWriterCount:     0
    WriterEvent:            0
    WaitingWriterThreadIds: None
    ReaderCount:            116
    CurrentReaderThreadIds:
    WaitingReaderCount:     576
    ReaderEvent:            80400002
    WaitingReaderThreadIds:
    *This lock has 116 orphaned reader locks.
    0:007> !rwlock
    Address            ReaderCount   WaitingReaderCount     WriterThread   WaitingWriterCount
    ...
    000000000339a338           116                  576            0x640064                    0
    ...
    00000000053f0688           568                  499            --                       6
    ...

    When I call !dlk:

    0:000> !dlk
    Examining SyncBlocks...
    Scanning for ReaderWriterLock instances...
    Scanning for holders of ReaderWriterLock locks...
    Scanning for ReaderWriterLockSlim instances...
    Scanning for holders of ReaderWriterLockSlim locks...
    Examining CriticalSections...
    Scanning for threads waiting on SyncBlocks...
    Scanning for threads waiting on ReaderWriterLock locks...
    Scanning for threads waiting on ReaderWriterLocksSlim locks...
    Scanning for threads waiting on CriticalSections...
    No deadlocks detected.

    then no deadlock is detected. My question is whether these threads have to do with my hanging application, and what the scenario could be. What does it mean that the thread is DEAD? Or do I have to look somewhere else to find the root cause of my hanging application? Please help me to interpret this output. Regards, Bernhard