Identify a deadlock through the hwnd handle
As we know, there are some good articles that discuss deadlock detection in practice:
Advanced Techniques To Avoid And Detect Deadlocks In .NET Apps
https://msdn.microsoft.com/zh-cn/magazine/cc163618(en-us).aspx
contextSwitchDeadlock
https://msdn.microsoft.com/en-us/library/ms172233.aspx
There are also automatic tools that can help detect deadlocks from postmortem dump files, most notably the well-known DebugDiag 1.1.
However, because resource locking symptoms can be complicated, there is so far no common way to identify every deadlock pattern. Here is an interesting sample of a deadlock between a window handle (hwnd) and a thread handle.
Symptom
==========
A customer uses a new thread to create a COM object, but the creation call always stops responding. I collected a memory dump while the issue was happening. No critical section is locked. Several threads appear to be running tasks, but all of them are in a waiting state.
Analysis
=========
The call stack of thread 13 is interesting:
0:013> kbnL
# ChildEBP RetAddr Args to Child
00 02b8f0d8 7739d1ec 77391908 003a0428 00000405 ntdll!KiFastSystemCallRet
01 02b8f114 7739c337 00b11728 00000405 0000babe user32!NtUserMessageCall+0xc
02 02b8f134 776f58ee 003a0428 00000405 0000babe user32!SendMessageW+0x7f
03 02b8f168 77733822 001918c0 02b8f218 02b8f188 ole32!CDllHost::GetApartmentToken+0x203
04 02b8f178 776f67b8 02b8f218 02b8f390 02b8f19c ole32!DoSTApartmentCreate+0x12
05 02b8f188 776acd3c 00000000 00000002 02b8f218 ole32!CClassCache::GetActivatorFromDllHost+0xa3
06 02b8f19c 776accf1 02b8f1b8 02b8f390 02b8f218 ole32!CClassCache::GetOrCreateApartment+0x20
07 02b8f1f0 776acc78 0019dd5c 00000000 02b8f390 ole32!FindOrCreateApartment+0x46
08 02b8f22c 776ad907 77794960 02b8f5b0 02b8f244 ole32!CProcessActivator::GetApartmentActivator+0xc7
09 02b8f248 776acb27 77794960 00000001 00000000 ole32!CProcessActivator::CCICallback+0x17
0a 02b8f268 776acad8 77794960 02b8f5b0 00000000 ole32!CProcessActivator::AttemptActivation+0x2c
0b 02b8f2a4 776ada17 77794960 02b8f5b0 00000000 ole32!CProcessActivator::ActivateByContext+0x4f
0c 02b8f2cc 776aaf7e 77794960 00000000 02b8f754 ole32!CProcessActivator::CreateInstance+0x49
0d 02b8f30c 776aaf19 02b8f754 00000000 02b8fc9c ole32!ActivationPropertiesIn::DelegateCreateInstance+0xf7
0e 02b8f33c 776aaf7e 7779487c 00000000 02b8f754 ole32!CClientContextActivator::CreateInstance+0x8f
0f 02b8f37c 776ab10f 02b8f754 00000000 02b8fc9c ole32!ActivationPropertiesIn::DelegateCreateInstance+0xf7
10 02b8fd50 776a679a 02b8fe20 00000000 00000017 ole32!ICoCreateInstanceEx+0x3f8
11 02b8fd84 776a6762 02b8fe20 00000000 00000000 ole32!CComActivator::DoCreateInstance+0x6a
12 02b8fda8 776a6963 02b8fe20 00000000 00000017 ole32!CoCreateInstanceEx+0x23
13 02b8fdd8 7825037e 02b8fe20 00000000 00000017 ole32!CoCreateInstance+0x3c
Look at frame 2: SendMessageW is trying to send a message to a window. According to the function definition, the first parameter is the hwnd: 003a0428
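Reading the argument columns by hand is error-prone on long stacks, so here is a minimal Python sketch that pulls the three argument values out of one `kbn` line and labels them with the first three parameters of SendMessageW. The helper name parse_kbn_frame and the regular expression are my own illustration, not part of any debugger tooling, and the sketch assumes a 32-bit dump where the arguments shown by kb are the real stack arguments (which may not hold for optimized frames).

```python
import re

# A 32-bit `kbn` frame line: frame number, ChildEBP, RetAddr, three argument
# dwords, then the symbol. All numeric fields are hexadecimal.
FRAME_RE = re.compile(
    r"^(?P<frame>[0-9a-f]+)\s+(?P<ebp>[0-9a-f]{8})\s+(?P<ret>[0-9a-f]{8})\s+"
    r"(?P<a0>[0-9a-f]{8})\s+(?P<a1>[0-9a-f]{8})\s+(?P<a2>[0-9a-f]{8})\s+"
    r"(?P<symbol>\S+)$"
)


def parse_kbn_frame(line):
    """Hypothetical helper: split one `kbn` line into frame, symbol and args."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a kbn frame line: {line!r}")
    return {
        "frame": int(m.group("frame"), 16),
        "symbol": m.group("symbol"),
        "args": [int(m.group(k), 16) for k in ("a0", "a1", "a2")],
    }


# First three parameters of SendMessageW(hWnd, Msg, wParam, lParam).
SENDMESSAGE_PARAMS = ("hWnd", "Msg", "wParam")

frame = parse_kbn_frame(
    "02 02b8f134 776f58ee 003a0428 00000405 0000babe user32!SendMessageW+0x7f"
)
send_args = dict(zip(SENDMESSAGE_PARAMS, frame["args"]))
print(f"hWnd = {send_args['hWnd']:08x}, Msg = {send_args['Msg']:#x}")
```

The same helper can be run over every frame of the stack to find other calls that take a window handle.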
I then went through the other threads to see which one owns hwnd 003a0428 by checking the OLE data stored in each COM thread, and found that it is thread 9:
$t0=00000009
+0x074 hwndSTA : 0x003a0428 HWND__
0:009> kbL
ChildEBP RetAddr Args to Child
0207fe88 7c827cfb 77e6202c 00000001 0207fed8 ntdll!KiFastSystemCallRet
0207fe8c 77e6202c 00000001 0207fed8 00000000 ntdll!ZwWaitForMultipleObjects+0xc
0207ff34 77e62fbe 00000001 02601920 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0207ff50 00423f08 00000001 02601920 00000001 kernel32!WaitForMultipleObjects+0x18
0207ffb0 0042413e 00000000 77e64829 015dcf20 MyModule!Run+0x3f8
0207ffb8 77e64829 015dcf20 00000000 00000000 MyModule!ThreadFunc+0x2e
0207ffec 00000000 00424110 015dcf20 00000000 kernel32!BaseThreadStart+0x34
Looking at thread 9 in detail, WaitForMultipleObjectsEx waits on only one handle, 0x00000314:
0:009> dc 0207fed8 L1
0207fed8 00000314
And handle 314 is actually a thread handle for thread 13:
0:009> !handle 0x00000314 f
Handle 00000314
Type Thread
….
Thread Id d80.b04
Priority 9
Base Priority 0
0:009> ~13
13 Id: d80.b04 Suspend: 1 Teb: 7ffad000 Unfrozen
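Matching the "Thread Id" reported by !handle against the thread list is a simple comparison of the hexadecimal pid.tid pair. A rough Python sketch of that lookup follows; the function names are hypothetical, and the line for thread 9 in the usage part carries a made-up thread ID purely for illustration (only the thread 13 line comes from the dump).

```python
import re


def parse_client_id(text):
    """Return (pid, tid) from a hex 'pid.tid' pair such as 'd80.b04'."""
    m = re.search(r"\b([0-9a-f]+)\.([0-9a-f]+)\b", text)
    if m is None:
        raise ValueError(f"no pid.tid pair in {text!r}")
    return int(m.group(1), 16), int(m.group(2), 16)


def find_thread_number(handle_line, thread_list_lines):
    """Return the debugger thread number whose Id matches the handle's owner."""
    target = parse_client_id(handle_line)
    for line in thread_list_lines:
        # Thread-list lines look like "13 Id: d80.b04 Suspend: 1 Teb: ...",
        # optionally prefixed with "." or "#". Thread numbers are decimal.
        m = re.match(r"^[.#\s]*(\d+)\s+Id:\s+(\S+)", line)
        if m and parse_client_id(m.group(2)) == target:
            return int(m.group(1))
    return None


thread_list = [
    "   9  Id: d80.a10 Suspend: 1 Teb: 7ffdb000 Unfrozen",  # made-up Id and Teb
    "  13  Id: d80.b04 Suspend: 1 Teb: 7ffad000 Unfrozen",  # from the dump
]
owner = find_thread_number("Thread Id d80.b04", thread_list)
print(f"handle 314 belongs to thread {owner}")
```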
Now the deadlock graph is clear. Thread 13 is waiting for thread 9 to pick up a message on the window whose handle is 003a0428, but thread 9 is waiting for thread 13 to complete its task.
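The two findings can be written down as a small wait-for graph, where a cycle means a deadlock. The Python sketch below is only an illustration of that reasoning; find_wait_cycle is a hypothetical helper, and it assumes each thread is blocked on at most one other thread, which is true in this dump but not in general.

```python
def find_wait_cycle(waits_for):
    """Return the threads forming a cycle, or None if there is no cycle.

    waits_for maps a thread number to (blocking_thread, reason).
    Assumption: each thread is blocked on at most one other thread.
    """
    for start in waits_for:
        path, seen = [], set()
        node = start
        # Follow the chain of "is blocked on" edges until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node][0]
        if node == start:
            return path
    return None


# Edges taken from the analysis of threads 13 and 9.
waits_for = {
    13: (9, "SendMessageW to hwnd 003a0428, owned by thread 9"),
    9: (13, "WaitForMultipleObjects on thread handle 314"),
}

cycle = find_wait_cycle(waits_for)
for thread in cycle or []:
    blocker, reason = waits_for[thread]
    print(f"thread {thread} waits for thread {blocker}: {reason}")
```

Recording the reason on each edge keeps the resource type visible, which matters here because neither edge is a critical section.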
Solution
==========
This deadlock is unusual. What caused it, and how can it be resolved?
Review thread 13 again. At frame 4, this function was called:
ole32!DoSTApartmentCreate
In fact, this function is only called when a main STA object is created. That is why thread 13 is stuck on thread 9, which is the main STA thread.
Ask the customer to change the threading model for this object in the registry from main STA to apartment STA. Open Registry Editor and check the key for the IND_COMMON.clsExecProc component:
HKCR\CLSID\{<Object class ID>}\InprocServer32
Make sure the "ThreadingModel" value is "Apartment".
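To see why this value matters, the following Python sketch summarizes how COM places an in-process object for each ThreadingModel value. The table reflects the documented COM threading rules as I understand them; the function names are hypothetical, and the case-insensitive comparison is an assumption of this sketch rather than a statement about how COM reads the registry.

```python
# Where COM creates an in-process object, keyed by ThreadingModel value.
# The empty key stands for a missing or blank ThreadingModel value.
PLACEMENT = {
    "": "the main STA (the first STA initialized in the process)",
    "apartment": "an STA: the creator's own STA, or a host STA if the creator is in the MTA",
    "free": "the MTA",
    "both": "the creator's apartment",
    "neutral": "the neutral apartment",
}


def normalize(threading_model):
    """Treat None and blank strings alike; compare case-insensitively (assumption)."""
    return (threading_model or "").strip().lower()


def placement(threading_model):
    """Describe where an object with this ThreadingModel value is created."""
    return PLACEMENT.get(normalize(threading_model), "unknown value, check the registration")


def forces_main_sta(threading_model):
    """True when every creation call is marshaled to the main STA thread."""
    return normalize(threading_model) == ""


for value in (None, "Apartment"):
    print(f"ThreadingModel={value!r}: created in {placement(value)}")
```

With no ThreadingModel value, every creation request from thread 13 has to be serviced by the main STA thread, which is exactly the dependency that closed the deadlock loop in this dump.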
Summary
===========
In summary, we learned:
1. Deadlock patterns vary widely and can involve different types of resources. In this sample, one resource is an STA window and the other is a thread handle. We must carefully identify which resource is held during debugging.
2. We should avoid main STA objects in a COM/COM+ environment, because they easily cause performance or locking issues.
More information
==================
About COM threading models:
150777 INFO: Descriptions and Workings of OLE Threading Models
https://support.microsoft.com/default.aspx?scid=kb;EN-US;150777
Regards,
By Freist Li