CRT Startup
In my previous post, Early Debugging, we demonstrated how early you can get with a user-mode debugger.
Normally we don't want to start that early; there are other places we would rather start from:
- OEP (Original Entry Point) of the EXE module. WinDbg has a predefined pseudo-register called $exentry, which makes this a lot easier, as mentioned in Data Breakpoint.
- The startup or initialization of the runtime. I've covered the managed runtime startup in Yet Another Hello World.
Now let's talk a bit about the native C/C++ runtime. When you write applications in C/C++ on Windows, you are normally already using the CRT, unless you explicitly tell the linker not to use it, as I did in A Debugging Approach to IFEO.
The CRT (C Runtime Library) ships with Windows and with the Visual C++ Redistributable (let's not talk about the special version that serves the CLR); you can also link a static version into your EXE or DLL.
The CRT provides the fundamental C++ runtime support. Some obvious features are:
- setting up the C++ exception model
- making sure the constructors of global variables get called before entering the main function
- parsing command line arguments and calling the main function
- initializing the heap
- setting up the atexit chain
Let's get to the code:
/* crtexport.cpp */
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
class CFoobar
{
public:
    CFoobar()
    {
        OutputDebugString(TEXT("CFoobar::CFoobar()\n"));
    }
    ~CFoobar()
    {
        OutputDebugString(TEXT("CFoobar::~CFoobar()\n"));
    }
};

CFoobar g_foobar;

__declspec(dllexport)
BOOL WINAPI Foobar()
{
    return TRUE;
}

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvContext)
{
    switch(fdwReason)
    {
    case DLL_PROCESS_ATTACH:
        OutputDebugString(TEXT("DLL_PROCESS_ATTACH\n"));
        break;
    case DLL_PROCESS_DETACH:
        OutputDebugString(TEXT("DLL_PROCESS_DETACH\n"));
        break;
    case DLL_THREAD_ATTACH:
        OutputDebugString(TEXT("DLL_THREAD_ATTACH\n"));
        break;
    case DLL_THREAD_DETACH:
        OutputDebugString(TEXT("DLL_THREAD_DETACH\n"));
        break;
    default:
        DebugBreak();
    }
    return TRUE;
}
Note: don't put DebugBreak inside a DLL entry point as I do here, unless you understand that DllMain runs under the loader lock, which makes the JIT debugger unhappy.
/* crtimport.cpp */
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
BOOL WINAPI Foobar();
int main()
{
    Foobar();
    return 0;
}
cl.exe /LD /Zi crtexport.cpp
cl.exe /Zi crtimport.cpp crtexport.lib
Set two breakpoints, one at DllMain and one at the main function, then launch the application in the Visual Studio debugger and look at the call stack at each breakpoint.
Since our DLL is statically imported, the entry point of the DLL is executed before the entry point of the EXE.
As you might have noticed, the actual OEP is _DllMainCRTStartup. You can double-click on the crtexport.dll!_DllMainCRTStartup frame to bring up the CRT startup code and start reading; on my machine the startup code is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\dllcrt0.c.
Also, by taking a look at the Output window, we can see that CFoobar::CFoobar() has already been called, which means the global object was initialized before entering our DllMain. This is of course done by the CRT initialization code in __DllMainCRTStartup, which understands the contract between the compiler and the runtime.
Now that you understand how the constructors of global variables get called, think about the destructor semantics:
- Is it possible for a global variable to be destructed on a different thread?
- What if an exception is thrown from a global variable's constructor or destructor?
The actual OEP for the EXE is mainCRTStartup, a thin wrapper that calls __tmainCRTStartup, where the real work happens. You can double-click on the crtimport.exe!__tmainCRTStartup frame and take a look at the code; on my machine the startup code is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\crt0.c.
As we mentioned in The Main Thread Problem, __tmainCRTStartup runs in the "main thread", and would kill all the other threads before it destroys the global variables. One thing to mention is that the CRT uses _endthreadex instead of calling ExitThread directly, because _endthreadex frees the per-thread CRT data (the _tiddata block) before it exits the thread, while ExitThread knows nothing about that block. Neither call runs the destructors of objects that are still alive on the thread's stack.
A few more questions:
- What if different versions of the CRT are loaded into a single process?
  - mixing debug and release versions of the CRT
  - mixing static and dynamic versions of the CRT
  - mixing different major versions of the CRT
- What would happen if an exception is thrown across a module boundary (e.g. from a DLL function to its caller in the EXE)?
- Can I use CRT functions without initializing the CRT?