What's wrong with this code, part lucky 13
Today's example is a smidge long, I've stripped out everything I can possibly imagine stripping out to reduce size.
This is a very real world example that we recently hit - only the names have been changed to protect the innocent.
I've used the built-in C++ decorations for interfaces, but that was just to get this stuff to compile in a single source file, it's not related to the bug.
extern CLSID CLSID_FooDerived;[ object, uuid("0A0DDEDC-C422-4BB3-9869-4FED020B66C5"), ] __interface IFooBase : IUnknown{ HRESULT FooBase();};class CFooBase: public IFooBase{ LONG _refCount; virtual ~CFooBase() { ASSERT(_refCount == 0); };public: CFooBase() : _refCount(1) {}; virtual HRESULT STDMETHODCALLTYPE QueryInterface(const IID& iid, void** ppUnk) { HRESULT hr=S_OK; *ppUnk = NULL; if (iid == IID_FooBase) { AddRef(); *ppUnk = reinterpret_cast<void *>(this); } else if (iid == IID_IUnknown) { AddRef(); *ppUnk = reinterpret_cast<void *>(this); } else { hr = E_NOINTERFACE; } return hr; } virtual ULONG STDMETHODCALLTYPE AddRef(void) { return InterlockedIncrement(&_refCount); } virtual ULONG STDMETHODCALLTYPE Release(void) { LONG refCount; refCount = InterlockedDecrement(&_refCount); if (refCount == 0) { delete this; } return refCount;`` } STDMETHOD(FooBase)(void);};class ATL_NO_VTABLE CFooDerived : public CComObjectRootEx<CComMultiThreadModel>, public CComCoClass<CFooDerived, &CLSID_FooDerived>, public CFooBase{ virtual ~CFooDerived(); public: CFooDerived(); DECLARE_NO_REGISTRY() BEGIN_COM_MAP(CFooDerived) COM_INTERFACE_ENTRY(IFooBase) END_COM_MAP() DECLARE_PROTECT_FINAL_CONSTRUCT()};OBJECT_ENTRY_AUTO(CLSID_FooDerived, CFooDerived)
As always, tomorrow I'll post the answers along with kudos and mea culpas.
Edit: Fixed missing return value in Release() - without it it doesn't compile. Also added the addrefs - my stupid mistake. mirobin gets major props for those ones.
Comments
- Anonymous
June 29, 2005
Your casts in the QI are broken. They work so long as IUnknown : IFooBase are at the beginning of the vtable, but ... I wouldn't want to be the one depending on a pointer from that code.
The casts should be something more along the lines of "(void *)(reinterpret_cast<IFooBase *>(this))" and "(void *)reinterpret_cast<IUnknown *>(this);
You are also not properly incrementing the refcount on the object before you return it. - Anonymous
June 29, 2005
Isn't there going to be a problem with doing this:
> *ppUnk = reinterpret_cast<void *>(this);
for a class which is in a multiple-inheritance tree?
I think you really want:
> ppUnk = static_cast<IFooBase>(this); - Anonymous
June 29, 2005
Also, you should technically be checking ppUnk for NULL before assinging to *ppUnk... (but I doubt this was the answer you were seeking in relation to the core problem). - Anonymous
June 29, 2005
Oh - another problem. According to MSDN the InterlockedIncrement() in the AddRef() isn't going to do what you want on Win95, WinNT3.51 and earlier. - Anonymous
June 29, 2005
mirobin: I'm pretty sure you want static_cast<>(), not reinterpret_cast<>().
reinterpret_cast<>() is generally considered 'evil'. (Or at least should be) - Anonymous
June 29, 2005
The comment has been removed - Anonymous
June 29, 2005
I think Aaron hit it. With reinterpret_cast you are always going to be casting to ATL's IUnknown impl (which will happen to occupy that slot because of the way ATL is structured), so your IUnknown impl will never be called. - Anonymous
June 29, 2005
Aaron: reinterpret_cast<> can certainly let you do some fun & evil things, but in this case its usage should be semantically the same as static_cast<>. Granted, I've never really pinned down any functional distinction between the two casts other than errors thrown by the compiler when compiling ...
Lonnie: if that is indeed happening in this case, I believe you'll see the same behavior with static_cast<>.
You are probably correct about the initial refcount, though as the factory code isn't being shown here we can't really tell for certain if its a problem. - Anonymous
June 29, 2005
Lonnie, the reinterpret_cast isn't the problem (although mirobin's comment in the beginning is also valid - for this case it's not a problem, but in general it is a good idea (especially if you're dealing with multiple inheretance)).
But you're on the right track...
Aaron, I'm only concerned with operating systems that are currently supported (which doesn't include any of the Win9x operating systems). - Anonymous
June 29, 2005
Re: reinterpret_cast<>() vs static_cast<>()
In general, for simple types (non-multiple inheritance) reinterpret_cast<>() and static_cast<>() are indeed semantically the same. However, for complex types (which we're dealing with here) the difference is that:
reinterpret_cast<A*>(b)
is the same as:
((A*)(void*)b)
which throws away any compiler known type information. static_cast<>(), however, may need to offset the pointer when doing the cast because of the difference in vtbl and local variable offsets between the various types.
Looking back at the code, however, I think that in this case you're right - since the reinterpret_cast<>() occurs at a class which has single inheritance there will be no difference.
However - that makes them the same as reinterpret_cast<void*>(this).
Maybe the problem is more with the ATL code? I've never used ATL, but according to the spec BEGIN_COM_MAP() is going to overload InternalQueryInterface(). Does the base class' QueryInterface() ever get called or do we lose the IUnknown query ability? - Anonymous
June 29, 2005
So a quick look at MSDN seems to indicate that we want the com map to actually be something like:
BEGIN_COM_MAP(CFooDerived)
COM_INTERFACE_ENTRY(IFooBase)
COM_INTERFACE_ENTRY_CHAIN(IFooBase)
END_COM_MAP()
or we lose the handling of interfaces in IFooBase::QueryInterface(). - Anonymous
June 29, 2005
Aaron, that's a good idea (I'd not even realized that the COM_INTERFACE_ENTRY_CHAIN macro existed - that's cool) but you'll note that IFooBase doesn't have a COM map.
You're on the right path with your previous comment, the problem is specific to ATL. It's not the BEGIN_COM_MAP that's the issue though. It's somewhere else in the sample. - Anonymous
June 29, 2005
The comment has been removed - Anonymous
June 29, 2005
The comment has been removed - Anonymous
June 29, 2005
The comment has been removed - Anonymous
June 29, 2005
Hm. Most of this conversation is beyond my current C++ level, however just reading msdn, does the DELCARE_PROTECT_FINAL_CONSTRUCT macro get rendered useless by your implementation of Release() ? - Anonymous
June 29, 2005
Three things:
1. Multiple reference counts (as said by others)
2. FooBase() isn't implemented :)
3. ~FooBase() is private! I'm surprised that this compiles - a private destructor should prevent inheritance, invocation of delete as well as creation of objects on the stack...
4. I think the problem is the ASSERT in the destructor: It will always fire due to #1 and the fact that ATL does some funky things with the vtables.
If that ain't it I won't find it... - Anonymous
June 29, 2005
The comment has been removed - Anonymous
June 29, 2005
I'll go with what Mike said, the ctor will set _refcount to 1. But the end result of that will not be the desctructos ASSERT triggering but the fact that Release will never release the object, as long as it's called as many times it should be. It should be called as many times as AddRef is resulting in _refcount only decreasing to 1, never to 0. - Anonymous
June 29, 2005
I haven't read thru all the other comments (yeah I'm lazy, Me.Sue()) but the _refcount is way suspicious.
More so when you consider the usual way of instantiating an ATL COM object:
~
CComObject<CFooDerived>* pObj;
HRESULT hr;
hr = CComObject<CFooDerived>::CreateInstance(&pObj);
if (FAILED(hr)) bail_out;
pObj->AddRef();
~
Uh oh, what does that AddRef() do? _refcount is now 2. Bad times.
The caller using CFooDerived shouldn't be bothered by impl details in CFooBase, yet here it is.
Also, doesn't your QI() have to cast 'this' to IUnknown* to ensure the caller sees the right vtbl? (Not positive on this one, this is one of those black-magic areas of C++ that I haven't completely groked yet.) - Anonymous
June 29, 2005
Unlike MFC (old MFC, I haven't looked at it in a while), ATL assumes that a newly created object's refcount starts at zero so the initial QI for the request yields a refcount of 1. It isn't a problem with either piece of code, it is a problem of the combination of the two. - Anonymous
June 29, 2005
Mike Dunn is 80% of the way to figuring out the problem. You're all right, that the issue is in the destructor (the assert firing).
But the question is: Why?
Cogetate on Mike's code snippet, you should see what's happening from the snippet. - Anonymous
June 29, 2005
The docs here probably have the "key" to the problem:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_wceatl4/html/ealrfCComObject.asp
"If your object is not aggregated, IUnknown is implemented by CComObject or CComPolyObject. In this case, calls to QueryInterface, AddRef, and Release are delegated to InternalQueryInterface, InternalAddRef, and InternalRelease of CComObjectRootEx to perform the actual operations."
So we know that when we Addref/release the the CComObject<> doodad, we're manipulating the ATL internal refcount. If that ATL refcount hits zero, the object gets deleted while the CFooBase refcount in non-zero, firing the assert. - Anonymous
June 29, 2005
I was just doing some other reading on the actual implementation of CComObject ... the class is defined as something like:
template <class Base>
class CComObject : Base
{
//...
}
Meaning that it is derrived from CFooDerived, which means that CComObjectBase's implementation of IUnknown "wins", explaining why the base class's version of IUnknown isn't called... - Anonymous
June 29, 2005
BingBingBingBingBing!
Say the magic word and you win a prize!
mirobin nailed it. An object derived from CComObjectRoot doesn't have an addref/release/QI implementation, instead those are provided by the CComObject<subobject> wrapper object.
So the base implementation of IUnknown is totally dead code. And thus the addref/release behavior is ignored. - Anonymous
June 29, 2005
Oh, but I forgot one more thing...
Where's the code that instantiates the CComObject?
It's NOT left out. - Anonymous
June 29, 2005
OBJECT_ENTRY_AUTO - References the class factory in the CComCoClass template. The CreateInstance method does the work of creating the CComObject for you. - Anonymous
June 29, 2005
BTW, you are a tricky *****. :)
However, I see this type of bug all the time. Many times when I help people debug problems, it comes down to "Is your routine even being called?" - Anonymous
June 30, 2005
In other words, why using frameworks is a Bad Thing. You create code with 1 bug in it instead of 21 bugs, but that 1 bug takes 50 times as long to identify. - Anonymous
June 30, 2005
Who said this bug took 50 times as long? A quick run of the debugger should have tracked down the bug in a few runs. Also, with the assert, the bug should manifest quickly. - Anonymous
June 30, 2005
I'm bothered by the fact that the destructors for both CFooBase and CFooDerived are virtual and private. Does that even make sense in C++?
Ignore the other bugs for a moment and assume that CFooBase::Release executes delete this. The destructor is virtual, so you'd expect the derived class's d'tor to be called and then the base class's. But since the base class's d'tor is private, the derived class can't provide an implementation. So I'd expect the delete this in CFooBase::Release to call the CFooBase d'tor, which will lead to bugs if anything in CFooDerived's d'tor is important.
And is it a contradiction that CFooDerived has a virtual function but is declared with ATL_NO_VTABLE? What does that even mean?
ATL seems like a great example of Joel Spolsky's Law of Leaky Abstractions. http://www.joelonsoftware.com/articles/LeakyAbstractions.html - Anonymous
June 30, 2005
I loved this particular "What's wrong" because it was quite subtle (and real world).
One of the developers... - Anonymous
June 30, 2005
Making a destructor virtual in C++ is essential. If don't, the proper destructors will not get called while deleting an object (think about how inheritance works in C++ and this should make sense).
Making the destructor private is a good idea with refcounted classes, as it prevents code from manually deleting objects. As the destructor is virtual, you end up hitting the correct destructor regardless of where the delete is called from.
ATL_NO_VTABLE tells the compiler not to write code in the constructor/destructor to create/destroy the class's vtable. ATL generates the vtable at some point instead. - Anonymous
June 30, 2005
mirobin: I understand why you use virtual on a d'tor and why you make some base class methods private. My point was the combination of them on the d'tor is nonsensical. When you say virtual, you're explicitly giving derived classes the ability to override the method. When you make a method private, you're hiding it from derived classes. This is a contradiction.
And apparently MS VC++ .NET 2003 agrees with me. When I try to compile the example, it complains on the declaration of CFooDerived::~CFooDerived.
error C2248: 'CFooBase::~CFooBase' : cannot access private member declared in class 'CFooBase'
Thanks for the clarification on the ATL_NO_VTABLE. I always thought that it meant the object won't have a vtable at all, as that's what it said in an ATL tutorial I read. - Anonymous
July 01, 2005
Offtopic, I never got along well with the suicidal nature of "delete this" in COM...