The NT DLL Loader: DLL_PROCESS_ATTACH reentrancy - step 3 - quality requirements

Now we're loaded for bear!  We understand how PEs which are either launched via CreateProcess() or loaded via LoadLibrary() are the roots of directed cyclic graphs.  Each new graph is turned into a linear initialization-order list where nodes further from the root are initialized prior to nodes closer to the root.  Cycles in the graph are resolved based on where you first enter the cycle and thus depend on the entire graph (not just local DLL-to-DLL relationships).  Dynamic loads during initialization are often handled correctly but they themselves can introduce additional cycles.  There is less opportunity to fix this up since the dynamic load results only in new additions to the initialization list.
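To make the ordering rule concrete, here is a minimal sketch in portable C. It assumes the linearization can be modeled as a depth-first, post-order walk of the import graph. That is a model of the behavior described above, not the loader's actual code, and every name in it (ImportGraph, visit, build_init_order) is made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_MODULES 8

/* Toy import graph: imports[i][j] != 0 means module i statically imports module j.
   All names here are hypothetical; this is not the real loader's data structure. */
typedef struct {
    int count;                              /* number of modules in use */
    int imports[MAX_MODULES][MAX_MODULES];
    int visited[MAX_MODULES];
    int order[MAX_MODULES];                 /* resulting initialization order */
    int order_len;
} ImportGraph;

static void visit(ImportGraph *g, int node)
{
    if (g->visited[node])
        return;  /* already queued, or an edge back into a cycle we are still walking */
    g->visited[node] = 1;
    for (int dep = 0; dep < g->count; dep++)
        if (g->imports[node][dep])
            visit(g, dep);
    assert(g->order_len < MAX_MODULES);
    g->order[g->order_len++] = node;        /* post-order: imports come first */
}

/* Build the initialization order for the graph rooted at `root`.
   Returns the number of entries written to g->order. */
int build_init_order(ImportGraph *g, int root)
{
    memset(g->visited, 0, sizeof g->visited);
    g->order_len = 0;
    visit(g, root);
    return g->order_len;
}
```

Suppose modules 1 and 2 import each other, module 0 imports 1, and module 3 imports 2. Rooting the walk at 0 enters the cycle through 1 and yields the order 2, 1, 0. Rooting it at 3 enters through 2 and yields 1, 2, 3. Same cycle, opposite order, which is why you can't reason about it from the local DLL-to-DLL relationships alone.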

Great.  With seven impossible things already done (I guess they weren't really impossible, but then again maybe they weren't really done either, eh?), let's see where things start to get really messy.

Dynamic loads during initialization lead to a couple of very interesting things.  First, recall that the initializer that was in progress isn't re-run when GetProcAddress() is called.  That's going to be important.

Let's go back to my sleazy little attempt to call a function here:  (let's assume this is in bar.dll; it's not going to be very important but given all the players it's going to get confusing...)

    case DLL_PROCESS_ATTACH:
        SOME_FUNCTION_PTR_T pfn = (SOME_FUNCTION_PTR_T) GetProcAddress(LoadLibraryW(L"SomeOther.DLL"), "SomeFunction");
        if (pfn != NULL) (*pfn)();
        break;
    }

We didn't check the return status from the LoadLibrary() call.  Since LL() doesn't run initializers, let's assume that that's OK in this context.  I'll assume that GetProcAddress(), handed the NULL module handle from a failed load, fails with an invalid argument and thus still returns NULL.  We do check the result of GPA() and don't call through the pointer if it's NULL so hey, nobody got hurt, right?

The LL() non-check is dubious.  Do you know why LoadLibrary() failed?  If it failed because of out-of-memory, maybe pressing on is the wrong thing to do.  I digress; error handling is the more usual topic for my blog, as opposed to loader-related issues.  But it's about to become a loader-related issue.

Let's say that foo.dll was already on the init list, after the current DLL.  Let's assume that its refcount was 1.  Now suppose SomeOther.DLL also statically imports foo.dll.  After the LoadLibrary() call, foo.dll's refcount is 2.  Let's write the overall initialization list now:

ntdll.dll
kernel32.dll
bar.dll
foo.dll
somedllimportedbytheexethatusesfoo.dll
someother.dll
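Here is a rough sketch of how the list ends up in this shape, in portable C. It is a toy model of the bookkeeping described above: a module that is already on the list just gets its refcount bumped, and anything new is appended at the end. The types and functions (ToyInitList, reference_module, toy_dynamic_load) are invented for illustration and are not loader APIs.

```c
#include <assert.h>
#include <string.h>

#define MAX_MODULES 16

/* Hypothetical stand-ins for the loader's bookkeeping. */
typedef struct {
    const char *name;
    int refcount;
} ToyModule;

typedef struct {
    ToyModule modules[MAX_MODULES];   /* kept in initialization order */
    int count;
} ToyInitList;

/* Return the index of `name` in the list, or -1 if it is not there. */
static int find_module(const ToyInitList *list, const char *name)
{
    for (int i = 0; i < list->count; i++)
        if (strcmp(list->modules[i].name, name) == 0)
            return i;
    return -1;
}

/* Reference a module: bump its refcount if it is already listed,
   otherwise append it to the end with a refcount of 1. Returns its index. */
int reference_module(ToyInitList *list, const char *name)
{
    int i = find_module(list, name);
    if (i >= 0) {
        list->modules[i].refcount++;
        return i;
    }
    assert(list->count < MAX_MODULES);
    list->modules[list->count].name = name;
    list->modules[list->count].refcount = 1;
    return list->count++;
}

/* Toy dynamic load: reference the new module's static imports first,
   then the module itself. Existing entries never move. */
int toy_dynamic_load(ToyInitList *list, const char *name,
                     const char *const *imports, int import_count)
{
    for (int i = 0; i < import_count; i++)
        reference_module(list, imports[i]);
    return reference_module(list, name);
}
```

Referencing the first five names above in order and then calling toy_dynamic_load() for someother.dll with foo.dll as its only import leaves foo.dll where it was, now with a refcount of 2, and puts someother.dll at the end of the list.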

The GetProcAddress() call attempts to run foo.dll's initializer (since it has to be initialized before running someother.dll's initializer).  Let's assume it fails.  These things happen, it's the real world out there.  I/Os fail, memory allocations fail, duplicate file names occur, the network glitches when trying to open file handles using the redirector, etc.

So, foo fails initialization and properly reports FALSE back to the loader.  The loader will propagate this failure out to the GetProcAddress() call.  But wait, that darned code assumes that the only reason GetProcAddress() can fail is due to ERROR_PROC_NOT_FOUND!
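The two failure modes can be told apart in a toy model, sketched below in portable C. It assumes a lookup first runs any pending initializers up to the target module, in list order, and reports why it returned NULL. The error codes and functions (ToyError, toy_get_proc and friends) are hypothetical and deliberately are not the Win32 ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical failure reasons; not Win32 error codes. */
typedef enum { TOY_OK, TOY_PROC_NOT_FOUND, TOY_INIT_FAILED } ToyError;

typedef struct {
    const char *name;
    int initialized;        /* nonzero if initialized OR currently in progress */
    int (*init)(void);      /* returns nonzero on success, like DllMain's BOOL */
} ToyModule;

typedef void (*ToyProc)(void);

/* Sample initializers and a sample export for the model. */
static int init_ok(void)    { return 1; }
static int init_fails(void) { return 0; }
static void some_function(void) { }

/* Run pending initializers for modules[0..target] in list order.
   Stops at the first failure and returns 0; returns 1 if all succeeded. */
static int run_pending_inits(ToyModule *modules, int target)
{
    for (int i = 0; i <= target; i++) {
        if (modules[i].initialized)
            continue;       /* done, or in progress: never re-run */
        if (!modules[i].init())
            return 0;
        modules[i].initialized = 1;
    }
    return 1;
}

/* Toy lookup: initializes what it must, then resolves the export.
   Returns NULL on failure and stores the reason in *err. */
ToyProc toy_get_proc(ToyModule *modules, int target, int has_export, ToyError *err)
{
    assert(err != NULL);
    if (!run_pending_inits(modules, target)) {
        *err = TOY_INIT_FAILED;
        return NULL;
    }
    if (!has_export) {
        *err = TOY_PROC_NOT_FOUND;
        return NULL;
    }
    *err = TOY_OK;
    return some_function;
}
```

With a list of bar.dll (in progress), foo.dll (using init_fails) and someother.dll, a lookup against someother.dll returns NULL with TOY_INIT_FAILED and leaves someother.dll uninitialized. A caller that only tests the pointer against NULL sees exactly what it would see for a missing export.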

Care to guess what happens next?

Comments

  • Anonymous
    June 24, 2005
    Dan,

    What will the DLL_PROCESS_ATTACH routine return for the initializer for bar.dll?
  • Anonymous
    June 26, 2005
    My guess is if the call to GPA returns NULL the calling code should be checking GetLastError() to see the reason for the failure.

    It is entirely possible (I assume) for GPA to fail for reasons that don't warrant SomeOther.dll propagating a FALSE return to the loader out of its initialization section. On the other hand, there are some circumstances where SomeOther.dll MUST fail in order to accurately reflect the state of its dependencies.
  • Anonymous
    June 28, 2005
    mgrier,
    I would need just one thing - could LH's user32.dll in its DllMain call GPA in order to init the imm32.dll it loaded?

    Description:
    User32.dll's DllMain in Longhorn loads imm32.dll. Imm32.dll's DllMain then fills imm entries (pointers to imm functions) in user32's data. After loading imm32.dll, user32.dll's DllMain calls one of those imm entries. If the entry (ImmRegisterClient) returns FALSE, user32's init is stopped and DllMain returns FALSE too (user32 remains semi-inited).

    When user32 is "preloaded" (loaded when LdrpLdrDatabaseIsSetup == FALSE; before executing DllMains of modules statically linked to main module), imm32 gets loaded but its DllMain is not called (at this place I would need user32 to init imm32 via GPA). User32's DllMain then goes on and calls the entry for ImmRegisterClient. The entry for ImmRegisterClient is supposed to be filled by imm32's DllMain; originally it points to a simple stub that returns TRUE in LH 5048 but FALSE in LH 5082. That's why in LH 5048 user32's DllMain goes on but in LH 5082 it returns FALSE immediately.

    [Of course, when user32.dll is loaded when LdrpLdrDatabaseIsSetup == TRUE (when/after executing DllMains of modules statically linked to main module), imm32 gets loaded, its DllMain is called and fills imm entries.]

    ==================
    Second problem (any NT): when there's nothing to init, TLS callbacks of the main module are not called either (with process_attach). You can see it in LdrpRunInitializeRoutines. Example: create an .exe that has TLS callbacks and imports from kernel32 only, and run it on XP+.

    Thank you,
    Radim Picha
  • Anonymous
    June 28, 2005
    Radim, I'll contact you directly about the user32 issue. I'm unaware of any cases where user32 is loaded that early so I'd like to understand. In general the pattern you're using is dangerous, but in practice it's almost impossible to dynamically load user32 so barring cycles, it typically works.

    RE: TLS:

    Loader-based TLS never worked for dynamically loaded PEs until recently, since the whole TLS block has to be allocated at the time of thread initialization. There was a checkin to enable this recently for the dynamic cases so you might be able to depend on this feature in LH. In practice, due to the general loader reentrancy issues, you're better off managing your TLS manually anyways.