Tracking Down a Multi-Process Deadlock
Hello, my name is Louis Shanks and my story starts off with a tricky little deadlock issue that I worked on recently. Let me set the stage by explaining the surface level symptom, and then we will get to the good stuff. Randomly when users running Vista would attempt to launch Internet Explorer nothing would happen. So they would again double-click the icon, and still nothing happened. So the users would reboot and go back to their web activities.
After some basic trouble shooting it was discovered that Iexplore.exe was running multiple times and every attempt to open the browser would result in another Iexplore.exe instance. Since no UI was ever presented I went ahead and collected a few Iexplore.exe process dumps along with a complete system dump.
The process dumps didn’t prove to be terribly useful so I decided to move over to the complete memory dump.
Since I knew we had multiple IExplore.exe processes running I wanted to investigate them first.
0: kd>!process 0 0
<snip>
PROCESS 87289d90 SessionId: 5 Cid: 0fdc Peb: 7ffdc000 ParentCid: 0968
DirBase: 7d66c520 ObjectTable: a203a178 HandleCount: 205.
Image: iexplore.exe
PROCESS 861a9d90 SessionId: 5 Cid: 0d34 Peb: 7ffd9000 ParentCid: 0fdc
DirBase: 7d66c7a0 ObjectTable: a5651a98 HandleCount: 170.
Image: ieuser.exe
PROCESS 86c175b0 SessionId: 5 Cid: 1250 Peb: 7ffd9000 ParentCid: 0968
DirBase: 7d66c940 ObjectTable: a0853f10 HandleCount: 91.
Image: iexplore.exe
PROCESS 861ac7a8 SessionId: 5 Cid: 1024 Peb: 7ffde000 ParentCid: 0968
DirBase: 7d66c620 ObjectTable: a83f7898 HandleCount: 91.
Image: iexplore.exe
</snip>
I decided to just start with the first one in the list (87289d90)
0: kd> !process 87289d90
Image: iexplore.exe
ElapsedTime 00:24:06.666
It appears this process has been around a while, so I wanted to look at the first thread in this process as it is responsible for the UI in Internet Explorer. I am looking at this thread because the surface level problem illustrates that when we launch Iexplore.exe the UI is never displayed. Here I used the !thread command to display the specific thread.
THREAD 870b6398 Cid 0fdc.16a4 Teb: 7ffdf000 Win32Thread: fd850b78 WAIT: (UserRequest) UserMode Non-Alertable
85e86cf0 SynchronizationEvent
85cb35e0 SynchronizationEvent
Not impersonating
Attached Process 87289d90 Image: iexplore.exe
Wait Start TickCount 10817197 Ticks: 69571 (0:00:18:05.314)
UserTime 00:00:00.015
KernelTime 00:00:00.093
Based on the tick count this thread has been waiting for 18 minutes.
0: kd> kvn
# ChildEBP RetAddr Args to Child
00 931208d8 81a940a1 870b6398 803d2120 870b6420 nt!KiSwapContext+0x26
01 9312091c 81a8dacc 870b6398 00000000 00000002 nt!KiSwapThread+0x44f
02 93120970 81bedb4f 00000002 93120aa8 00000001 nt!KeWaitForMultipleObjects+0x53d
03 93120bfc 81bee2e3 00000002 00000001 00000000 nt!ObpWaitForMultipleObjects+0x256
04 93120d48 81a30b3a 00000002 0027e9c4 00000001 nt!NtWaitForMultipleObjects+0xcc
05 93120d48 77568254 00000002 0027e9c4 00000001 nt!KiFastCallEntry+0x12a
06 0027e970 775679fc 7708c973 00000002 0027e9c4 ntdll!KiFastSystemCallRet
07 0027e974 7708c973 00000002 0027e9c4 00000001 ntdll!NtWaitForMultipleObjects+0xc
08 0027ea10 7608ff95 0027e9c4 0027ea38 00000000 kernel32!WaitForMultipleObjectsEx+0x11d
09 0027ea64 773d3a46 00000028 0027eaac ffffffff USER32!RealMsgWaitForMultipleObjectsEx+0x13c
0a 0027ea8c 773d3b7f 0027eaac ffffffff 0027eabc ole32!CCliModalLoop::BlockFn+0x97
0b 0027eab4 774e75be ffffffff 003ddf90 0027ebc0 ole32!ModalLoop+0x5b
0c 0027ead0 774e9294 00000000 0027ebd4 00000000 ole32!ThreadSendReceive+0x12c
0d 0027eaf8 774e9100 0027ebc0 003d3538 0027ec1c ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x194
0e 0027ebd8 773d3c25 003d3538 0027ecfc 0027ece4 ole32!CRpcChannelBuffer::SendReceive2+0xef
0f 0027ebf4 773d3bd7 0027ecfc 0027ece4 003d3538 ole32!CCliModalLoop::SendReceive+0x1e
10 0027ec6c 77403efb 003d3538 0027ecfc 0027ece4 ole32!CAptRpcChnl::SendReceive+0x73
11 0027ecc0 76052ff4 003d3538 0027ecfc 0027ece4 ole32!CCtxComChnl::SendReceive+0x1c5
12 0027ecd8 76053055 003e50a4 0027ede4 76052e0d RPCRT4!NdrProxySendReceive+0x43
13 0027ece4 76052e0d 09ea0d59 0027f1fc 070001f3 RPCRT4!NdrpProxySendReceive+0xc
Based on the call stack it appears this thread made an OLE call and is now waiting for some objects to signal.
What is IExplore doing in this thread? It is waiting on another process to service this OLE request. We can find the PID of the other process by digging into the ole32 SendReceive call.
I looked at the first argument:
0: kd> dd 003d3538
003d3538 77403a50 773f57b4 00000003 00000002
003d3548 00000000 00000000 003d2ef0 003d8840
003d3558 0038b1c0 003e05f8 77403980 00070005
This gives us a pointer to another data structure that contains the information we are attempting to locate.
0: kd> dd 003d2ef0
003d2ef0 774ee600 003d2e70 00000d34 00000000
003d2f00 862941fa c94cde09 55a88424 801a2601
003d2f10 55a88424 801a2601 0000c400 00000d34
Ah there it is! 0d34 is the process id servicing our OLE request.
I went back and reviewed the !process 0 0 output to locate the process containing PID 0D34. I’m dumping it here.
PROCESS 861a9d90 SessionId: 5 Cid: 0d34 Peb: 7ffd9000 ParentCid: 0fdc
DirBase: 7d66c7a0 ObjectTable: a5651a98 HandleCount: 170.
Image: ieuser.exe
Ok, so the process handing this request is IEUSER.exe, which is the broker process used by Internet Explorer in Vista when running in protect mode. This is new in Vista.
More on IEUser and Protected mode can be seen at the following location:
https://msdn.microsoft.com/en-us/library/bb250462.aspx
Now I needed to dig into the IEUser.exe process to find out what it was working on causing Iexplore to wait.
0: kd> !process 861a9d90
PROCESS 861a9d90 SessionId: 5 Cid: 0d34 Peb: 7ffd9000 ParentCid: 0fdc
DirBase: 7d66c7a0 ObjectTable: a5651a98 HandleCount: 170.
Image: ieuser.exe
I discovered the thread in the ieuser.exe process working on our call from Iexplore.exe. However this thread is also waiting on access to something before it can complete.
0: kd> kvn+200
# ChildEBP RetAddr Args to Child
00 a41b0c50 81a940a1 85ecbac0 85ecbb48 81af9920 nt!KiSwapContext+0x26
01 a41b0c94 81a2ddc8 85ecbac0 00000000 85e18510 nt!KiSwapThread+0x44f
02 a41b0ce8 81bf0d9f 85e18510 00000006 00000001 nt!KeWaitForSingleObject+0x492
03 a41b0d50 81a30b3a 00000224 00000000 00000000 nt!NtWaitForSingleObject+0xbe
04 a41b0d50 77568254 00000224 00000000 00000000 nt!KiFastCallEntry+0x12a
05 0163e8a4 77567a0c 7708c7c4 00000224 00000000 ntdll!KiFastSystemCallRet
06 0163e8a8 7708c7c4 00000224 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
07 0163e918 7708c732 00000224 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xbe
08 0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
09 0163e948 747e5fe0 0163ec48 00497b38 00000000 RASAPI32!ReadPhonebookFileEx+0x1a4
0a 0163e968 747e4e9d 0163ec48 00000000 00000000 RASAPI32!ReadPhonebookFile+0x1b
0b 0163e9a8 747e52d3 0163ec48 0048f660 0163e9e0 RASAPI32!DwEnumEntriesFromPhonebook+0x35
0c 0163ee58 747e4fd4 0163ee9c 00000001 0048f660 RASAPI32!DwEnumEntriesInDir+0x1cc
0d 0163f0ac 747e677e 00000001 0048f660 0163f0f4 RASAPI32!DwEnumEntriesForPbkMode+0xab
0e 0163f0e0 76e534b9 00000000 00004180 0048f660 RASAPI32!RasEnumEntriesW+0xb0
0f 0163f10c 76e5342a 76ecc2dc 00000000 00000000 WININET!RasEnumHelp::RasEnumHelp+0x55
10 0163f120 76e533a3 00000001 00000000 00000001 WININET!DoConnectoidsExist+0x2b
11 0163f14c 76e534ff 76ecf02c 76ecc2dc 00000000 WININET!GetRasConnections+0x34
12 0163f168 76e4196e 00000000 0163f180 6ee3228c WININET!IsDialUpConnection+0xa9
13 0163f184 76e70564 00000001 00000000 00000000 WININET!FixProxySettingsForCurrentConnection+0x31
14 0163f1a8 6ec2db3b 00000000 0163f1bc 0163f3d4 WININET!InternetInitializeAutoProxyDll+0x6c
15 0163f1b8 75fe9142 004884a0 00000202 0047e900 ieframe!CShdocvwBroker::StartAutoProxyDetection+0x4d
16 0163f1d0 76054294 6ec2daf8 0163f3d8 00000001 RPCRT4!Invoke+0x2a
17 0163f5fc 76054a49 004858d0 0046e4e0 0047e900 RPCRT4!NdrStubCall2+0x27b
18 0163f64c 774e86d7 004858d0 0047e900 0046e4e0 RPCRT4!CStdStubBuffer_Invoke+0xa0
19 0163f694 774e877d 0047e900 0047f968 00480b38 ole32!SyncStubInvoke+0x3c
1a 0163f6e0 7740249e 0047e900 0047f860 004858d0 ole32!StubInvoke+0xb9
1b 0163f7bc 774023af 0046e4e0 00000000 004858d0 ole32!CCtxComChnl::ContextInvoke+0xfa
1c 0163f7d8 774e85df 0047e900 00000001 004858d0 ole32!MTAInvoke+0x1a
1d 0163f808 774e8a18 d0908070 0046e4e0 004858d0 ole32!AppInvoke+0xaa
1e 0163f8e4 774e85ef 0047e8a8 00474118 0047ebc0 ole32!ComInvokeWithLockAndIPID+0x32c
1f 0163f930 75fe94b0 0047ebc0 08ac70b7 0047ebc0 ole32!ThreadInvoke+0x2fd
I’m dumping the first parameter passed to KeWaitForSingleObject to determine the object type.
0: kd> !object 85e18510
Object: 85e18510 Type: (85484d40) Mutant
ObjectHeader: 85e184f8 (old version)
HandleCount: 3 PointerCount: 5
Directory Object: 881f3030 Name: RasPbFile
Ok, we are waiting on a Mutex named RasPbFile. The big question in your mind now is, “who owns this lock”, right?
To determine this I’ll dump the structure for the mutex object:
0: kd> dt _KMUTANT 85e18510
nt!_KMUTANT
+0x000 Header : _DISPATCHER_HEADER
+0x010 MutantListEntry : _LIST_ENTRY [ 0x870b6564 - 0x870b6564 ]
+0x018 OwnerThread : 0x870b6398 _KTHREAD
+0x01c Abandoned : 0 ''
+0x01d ApcDisable : 0 ''
The owner of the mutex is 870b6398. The bad news is this is the same thread I started with in the Iexplore.exe process so we’re looking at a classic deadlock scenario. Thread 870b6398 running under Iexplore.exe has made an OLE call over to the IEuser.exe broker process. The IEuser process needs the mutex owned by the thread that called it in the first place.
This turned out to be a bug in the RasApi code, not the browser or IEuser process. The RasAPI code was written in such a way that it would wait for an infinite period of time to gain the Mutex.
0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12
The good news is that we have created hotfix 951738 to address this issue. The official KB article is still pending.
Comments
Anonymous
July 02, 2008
The most interesting part is missing :) Would be nice to have a declaration of SendReceive (which is, I believe, undocumented) and the content of the structure which was used by you to obtain the PID.Anonymous
July 02, 2008
When "digging into the ole32 SendReceive call" you chose those addresses to dump because you knew from the source code the parameters passed to that function? [You are correct; we knew how to find the process ID because the source code gave us the needed details. However, the technique itself can still be useful without source code, although we cannot guarantee that the function parameter will remain the same in other versions of ole32.dll.]Anonymous
July 09, 2008
The list is a little longer today because of not posting last week. Enjoy! Microsoft Advanced WindowsAnonymous
July 09, 2008
To Lucian: given the stack trace of IE, I would start to look at ole32!CCliModalLoop::BlockFn source code to obtain the name of the object which is going to be signaled. Then search for this mutex name over all sources and, voila, you know the reason for deadlock.Anonymous
July 11, 2008
There is a simpler method for RPC calls (illustrated from another dump) to use dpp and search for small numbers: 00faf7e4 77c7fcf0 00133620 00faf80c 7778d073 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [Non-Fpo]) 1: kd> dpp 00133620 00133620 072bb008 77c7e5d8 RPCRT4!LRPC_CCALL::`vftable' 00133624 00000010 00133628 0738dd90 00070005 0013362c 00000038 00133630 00000005 00133634 0771be9c 8a885d04 00133638 072e1050 00000044 0013363c 00080838 067e15d8 00133640 00fafc48 7c829f59 ntdll!RtlFreeHeap+0x70f 00133644 00fafa34 00000000 00133648 00000000 0013364c faedcf5c 00133650 11d131fe 00133654 8000d2aa 00000000 00133658 0e27c15f 0013365c 00000030 00133660 07a2abb0 00133664 4007b10e 00133668 66997ba0 0013366c 50f338ee 00133670 00003108 00133674 00000005 00133678 00000000 0013367c 00000000 00133680 ffffffff 00133684 01a1db7f 00133688 00000000 0013368c 00000000 00133690 00000002 00133694 10010000 00000008 00133698 00000a4c 0013369c 000003f4 Here is our PID: 00133670 00003108Anonymous
July 11, 2008
Actually my previous example is a bit different where we have RPC over LPC and !lpc command can do this. In your case I don't see the same simple way.Anonymous
July 20, 2008
>> hotfix 951738 to address this issue. The official KB article is still pending. << Looks like the KB article is now available: "On a computer that is running Windows Vista or Windows Server 2008 and that has UAC and Internet Explorer 7 protected mode enabled, Internet Explorer 7 may stop responding when you try to visit any Web site" @ http://support.microsoft.com/kb/951738Anonymous
September 12, 2008
How can you know the 003d2ef0 is pointing to another data structure? You have the SendReceive codes and the data structures? Also, how can you know "the thread in the ieuser.exe process working on our call from Iexplore.exe" from the call stack? For us without the windows source codes, it is good if you can explain more about the function and the structure. Anyway, you still tell us some undocumented way to check the PID and what some function parameters are. Thanks [Please see our response here. Thanks.]Anonymous
January 16, 2014
Used this today on a user dump and found the PID of the other process handling the request! Saved our customer from collecting a full system dump. --Kappa.