ContextSwitchDeadlock MDA and COM

The ContextSwitchDeadlock MDA is a very annoying debugger message.  It is reported by a background thread that wakes up once in a while; if it finds a remote call that has not completed within 60 seconds, it raises the error.  The problem is that the error message contains a few context codes, but it doesn't tell you the exact location where it happens or which thread made the call.

According to the documentation, the problem happens when a thread (very often the main UI thread) is busy with something and doesn't pump messages for 60 seconds.  If a background thread tries to make a remote call to that thread during that period, the call can be blocked for more than 60 seconds, and the MDA is raised.  This can affect any application that uses COM, directly or indirectly (for example, through a control that is built on COM), even if the application doesn't use multiple threads directly and we don't think we are making any remote calls.  The reason is that most managed applications depend on the GC to release COM marshalling objects, and the GC runs on a background thread and can run at any time.  When the GC finds that no one is using a COM object and tries to release it, it makes a call to the STA thread.  If the STA thread ever does something that takes longer than 60 seconds, that becomes a problem.
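To make the scenario concrete, here is a minimal sketch in C#. The class and method names are hypothetical, and the 90-second busy loop is only a stand-in for whatever long-running work the application really does. It contrasts running the work on the calling thread, which is a problem when that thread is the STA thread, with handing it to the thread pool.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class StaPumpSketch
{
    // Hypothetical stand-in for work that runs longer than the MDA's 60-second window.
    static void LongComputation()
    {
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed < TimeSpan.FromSeconds(90))
        {
            // Busy work: no messages are pumped while this loop runs.
        }
    }

    // If this is called on the STA (UI) thread, COM calls aimed at that thread,
    // including releases issued on behalf of the GC, wait until it returns.
    public static void RunOnCallingThread() => LongComputation();

    // Runs the same work on a thread-pool thread, so the calling STA thread
    // can return to its message loop and keep pumping.
    public static Task RunOnThreadPool() => Task.Run(() => LongComputation());
}
```

Whether moving the work is practical depends on the application. If the long-running code itself touches COM objects that live on the STA thread, those calls are marshalled back to that thread anyway.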

It can be worse if we create a background STA thread.  It is almost impossible to stop that thread safely, because the GC may not yet have cleaned up all the COM objects created on that STA thread.  A later attempt to release them ends up in ContextSwitchDeadlock, because there is no way to call into a dead thread.  Of course, we cannot make the GC clean up those objects before we stop the thread.  Unless you can control the lifetime of those objects with ReleaseComObject, it is better to create an MTA background thread instead.
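As a rough sketch of that advice, a background thread can be placed in the MTA explicitly before it starts. The helper name `StartMtaWorker` and its `work` parameter are made up for illustration; `Thread.SetApartmentState` is the standard .NET call, and COM apartments only mean something on Windows.

```csharp
using System;
using System.Threading;

static class MtaWorkerSketch
{
    // Hypothetical helper: starts 'work' on a background thread in the MTA.
    public static Thread StartMtaWorker(Action work)
    {
        var thread = new Thread(() => work());

        // The apartment state has to be chosen before the thread is started.
        thread.SetApartmentState(ApartmentState.MTA);

        // A background thread does not keep the process alive on exit.
        thread.IsBackground = true;

        thread.Start();
        return thread;
    }
}
```

Because the thread lives in the MTA, COM objects created on it are not tied to that one thread, so releasing them later does not require calling back into a thread that may already have exited.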

ReleaseComObject itself can be a nightmare.  Unless you own both sides, the code can break easily if a COM object written in native code is later reimplemented in managed code.  The internal count is incremented at each native-managed boundary crossing, so we need to know exactly where the boundary is, which is not really detectable in code.  'IsComObject' does not provide much value, because an object can be implemented half in managed code and half in native code.  For such an object, IsComObject returns true even though the call doesn't go through a native/managed boundary.  Even if the half-managed, half-native object is passed through such a boundary, its internal count never increases.  The first time you call ReleaseComObject, the native half of the object is released, and the whole thing is now broken.
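For reference, the usual deterministic-release pattern looks roughly like the sketch below. The helper and its delegate parameters are hypothetical. It is only reasonable when this code created the object and holds the only managed reference to it, and the `IsComObject` check has exactly the limitation described above.

```csharp
using System;
using System.Runtime.InteropServices;

static class ComLifetimeSketch
{
    // Hypothetical helper: creates a COM object, uses it, then releases it.
    // Assumes the caller owns the object and nothing else keeps a reference.
    public static void UseAndRelease(Func<object> createComObject, Action<object> use)
    {
        object com = createComObject();
        try
        {
            use(com);
        }
        finally
        {
            // IsComObject only filters out purely managed objects. It cannot
            // tell whether the object really crossed a native/managed boundary.
            if (com != null && Marshal.IsComObject(com))
            {
                // Decrements the wrapper's internal count. Once it reaches
                // zero, the wrapper can no longer be used.
                Marshal.ReleaseComObject(com);
            }
        }
    }
}
```

If any other code still holds the same wrapper after the release, its next call fails, which is the kind of breakage the paragraph above warns about.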
