Managed C++ or C++/CLI Double Thunking

On an internal discussion list, we're discussing the performance impact of virtual functions in managed C++. David Schwartz from VC++ User Education sent us a draft of related documentation that will ship in Whidbey (Visual Studio 2005). He has agreed to share it:

Double Thunking

Introduction

Double thunking refers to the loss of performance that can occur when a call from a managed context to a managed function goes through that function's native entry point, which in turn calls the managed entry point. This topic discusses where double thunking occurs and how you can avoid it to improve performance.

Remarks

By default, when you compile with /clr (not /clr:pure), the definition of a managed function causes the compiler to generate both a managed entry point and a native entry point. This allows the function to be called from both native and managed call sites. However, when a native entry point exists, it can be used as the entry point for all calls to the function. If the caller is managed, the native entry point then calls the managed entry point, so two calls are required to invoke the function (hence, double thunking). For example, virtual functions are always called through a native entry point.

One resolution is to use the __clrcall calling convention, which tells the compiler that the function will only be called from a managed context, so it does not generate a native entry point for the function.

Similarly, if you export a managed function (with __declspec(dllexport) and __declspec(dllimport)), a native entry point is generated, and any function that imports and calls that function will call through the native entry point. To avoid double thunking in this situation, do not use native export/import semantics; instead, reference the metadata with #using (see The #using Directive).

In Visual C++ 2005, the compiler was updated to reduce unnecessary double thunking. For example, any function with a managed type in its signature (including the return type) is implicitly marked __clrcall. For more information on double thunk elimination, see https://msdn.microsoft.com/msdnmag/issues/05/01/COptimizations/default.aspx.

Example

Description

The following sample demonstrates double thunking. When compiled as native code (without /clr), the call to the virtual function in main generates one call to T's copy constructor and one call to its destructor. The same behavior results when the program is compiled with /clr and the virtual function is declared __clrcall. However, when the program is compiled with /clr alone, the call generates a second copy-constructor call (and a matching destructor call) because of the native-to-managed thunk.

Code

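// double_thunking.cpp
// compile with: /clr
#include <stdio.h>

// Each special member prints its own signature so the copies are visible.
struct T {
   T() {
      puts(__FUNCSIG__);
   }

   T(const T&) {
      puts(__FUNCSIG__);
   }

   ~T() {
      puts(__FUNCSIG__);
   }

   T& operator=(const T&) {
      puts(__FUNCSIG__);
      return *this;
   }
};

struct S {
   // Uncomment __clrcall to suppress the native entry point and the extra copy.
   virtual void /* __clrcall */ f(T t) {};
} s;

int main() {
   S* pS = &s;
   T t;

   printf("calling struct S\n");
   pS->f(t);   // virtual call: goes through the native entry point under /clr
   printf("after calling struct S\n");
}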

Sample Output

__thiscall T::T(void)
calling struct S
__thiscall T::T(const struct T &)
__thiscall T::T(const struct T &)
__thiscall T::~T(void)
__thiscall T::~T(void)
after calling struct S
__thiscall T::~T(void)

Example

Description

The previous sample demonstrated the existence of double thunking. This sample shows its effect on performance: the for loop calls the virtual function one million times, and the program reports the execution time. The slowest time is reported when the program is compiled with /clr. The fastest times are reported when it is compiled without /clr, or with /clr when the virtual function is declared __clrcall.

Code

// double_thunking_2.cpp
// compile with: /clr
#include <time.h>
#include <stdio.h>

// T is compiled as native code.
#pragma unmanaged
struct T {
   T() {}
   T(const T&) {}
   ~T() {}
   T& operator=(const T&) { return *this; }
};
// Switch back so that S and main are compiled as managed code.
#pragma managed

struct S {
   virtual void /* __clrcall */ f(T t) {};
} s;

int main() {
   S* pS = &s;
   T t;
   clock_t start, finish;
   double duration;

   // Time one million virtual calls.
   start = clock();
   for ( int i = 0 ; i < 1000000 ; i++ )
      pS->f(t);
   finish = clock();

   duration = (double)(finish - start) / (CLOCKS_PER_SEC);
   printf( "%2.1f seconds\n", duration );
   printf("after calling struct S\n");
}

Sample Output

4.2 seconds
after calling struct S

Comments

  • Anonymous
    June 30, 2005
    We are seeing a huge performance penalty in our test suites after recompiling with /CLR (40 times slower). We are assuming that the main culprit is double thunking, since we are seeing a lot of thunks in manual tracing (they show up in the call stack). We have many DLLs and make extensive use of virtual function calls across DLL boundaries (which often call back up to the base method; quadruple thunk, anyone?).
    Are there any ways to trace or profile the number and/or cost of the thunks?
    Can you place the __clrcall around an entire class, rather than each method?
    Any other suggestions would be welcome. Thanks.
  • Anonymous
    July 17, 2006
    Yah! Can I use this __clrcall on Turbo Pascal functions too?