How the NT Loader works

My team maintained the NT loader (the component that loads DLLs) for about a year during Windows XP development, while we were adding the isolated application features to it, so we got quite an interesting perspective on this lovely little piece of technology.  A warning to anyone who wants to innovate in technology that has been left dormant and untouched for over a decade: be sure you have plenty of time to deal with the anthills you knock over!

We don't own it any more (not sure if that's a blessing or a curse...) but it sure was interesting and enlightening, especially in the tradeoffs among application compatibility, robustness and reliability.

You might notice that the docs for DllMain have grown a lot over the past few years.  I like to think that my team's involvement had a lot to do with it, because DLL load order and related behavior was always an arcane and poorly understood topic.  There were always vague warnings about not doing too much in DLL_PROCESS_ATTACH, but nobody could really describe the problem beyond anecdotes about times when load orders mysteriously changed and their code broke during either initialization or shutdown.
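
To make the "load orders mysteriously changed" anecdotes concrete, here is a toy model in Python. It is emphatically not the NT loader's algorithm; it only assumes that a module's static imports are initialized before the module itself. The module names and the `init_order` helper are made up for illustration.

```python
# Toy model only: NOT the NT loader's real algorithm.
# Assumption: a module's static imports are initialized before the module.

def init_order(imports, root):
    """Return a dependencies-first initialization order for root's DLLs."""
    order, seen = [], set()

    def visit(module):
        if module in seen:
            return
        seen.add(module)
        for dep in imports.get(module, []):
            visit(dep)
        order.append(module)

    visit(root)
    # Drop the root: the EXE's entry point is not a DllMain.
    return order[:-1]


# Hypothetical application: app.exe imports a.dll and b.dll.
before = {"app.exe": ["a.dll", "b.dll"], "a.dll": [], "b.dll": []}
# Later, someone makes a.dll import b.dll. Nothing else changes.
after = {"app.exe": ["a.dll", "b.dll"], "a.dll": ["b.dll"], "b.dll": []}

print(init_order(before, "app.exe"))  # ['a.dll', 'b.dll']
print(init_order(after, "app.exe"))   # ['b.dll', 'a.dll']
```

Even in this simplified model, one new import edge swaps the order in which the two DLLs initialize, which is how code that quietly relied on the old order can break without anyone touching it.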

I'll take a break from where I'm headed on the reliability front and walk through a summary of the issues which I recently sent to the internal win32 programming email alias.  Hopefully I'll fix the incomplete sentances and bad grammar this time.

I'll make a separate post with the beginning: a basic rundown of how things work today.  As usual, do not consider this in any way, shape, or form a contract.  One of the reasons that this isn't fully documented is that people have wanted to change/fix it for years and years now.  On the other hand, maintaining compatibility with the current behavior is going to constrain the implementation so much that either (a) it won't change after all or (b) the change will have to be compatible with the effects of anything I'm describing here anyway.

You will see aspects of my reliability/robustness series come up here.  You'll laugh, you'll cry, you'll see local innocuous bugs in DLL initialization or uninitialization affect the entire process's reliability.

Comments

  • Anonymous
    June 17, 2005
    The comment has been removed

  • Anonymous
    June 17, 2005
    Hard to say. It's possible but even the radical experimental new programming model (CLR) still needs a basic win32 operating environment set up underneath itself. Building something that doesn't depend on win32 might get rid of a bunch of process initialization costs but (a) the NT level APIs are undocumented for a reason and (b) it's not clear that we wouldn't just be trading the devil we know for the devil we don't know.

    I think we're on a good path to get rid of the cruft that's gotten into the initialization path over the next few releases and if we can get rid of most of the DLL initialization code in the world, DLL and process startups will be markedly faster.

  • Anonymous
    June 17, 2005
    Whilst the documentation for DllMain has grown a lot it still does not describe in detail the ramifications of the lock that is taken out (the process lock, I don't know the internal name for it).
    The two comments that there are ("It must not call the LoadLibrary or LoadLibraryEx function" and "entry-point functions should not attempt to communicate with other threads or processes") are not sufficient.
    Please can someone document all the places in which the process lock is taken out, so that we have a definitive list of areas that will cause problems?
    This should be documented, because any changes to the places where it is taken out will affect applications.

  • Anonymous
    June 17, 2005
    You will not see the details of the "loader lock" documented. However I do plan to discuss a number of the visible side effects of the current implementation.

  • Anonymous
    June 18, 2005
    I don't normally post on weekends, but I just noticed that Michael Grier's finally started posting his...

  • Anonymous
    June 18, 2005
    Why not?
    I'm not after details of how it works (well, actually I'd love to see that, but I don't consider it necessary), but the places where it is taken out affect the code that we can write.
    Wouldn't it make more sense to document all the function calls that might result in the lock being taken out, rather than the current approach of highlighting only some of the things that will break?
    As it stands it forms part of the contract between you and us - but we don't know what it is.

    I'm not trying to get aggressive about it, but I've said why I think it should be documented and I'd like to know why it won't be.

  • Anonymous
    June 18, 2005
    The comment has been removed

  • Anonymous
    June 19, 2005
    I was going to suggest that warnings be issued (in the application log at least) when cycles are detected, so that developers will know that something will break in the future and they might look into breaking the cycles now in a designed non-random manner.

    But then, just by accident, I came across these two cycles:
    hal.dll -> ntoskrnl.exe -> hal.dll
    bootvid.dll -> ntoskrnl.exe -> bootvid.dll

    What does it mean for a dll to be dependent on an .exe? Or is ntoskrnl.exe really a shareable (oops) that missed out on getting its filename changed?

    What happens if the NT loader loads these dlls and exe in a different order than it used to? (No not that NT loader, that NT loader. Or are there more?)

    > Re: why not document the loader lock:
    > It's an implementation detail that some
    > folks believe that we will be able to
    > eliminate eventually.

    And why is it partially documented? And why did your team add more parts to the documentation? Because programmers have to know what to work around, right? Let the MSDN pages about elimination say "Preliminary information subject to change" and let MSDN pages about this decade's systems be more concrete about what this decade's programmers have to work around.

  • Anonymous
    June 19, 2005
    Cool! I guess I don't need to save that mail you sent after all.

    Suggest: spell checking.
    s/sentance/sentence/

    As always, this is great information and a fun read, too. Thanks.

  • Anonymous
    June 20, 2005
    Re: HAL, kernel, etc.:

    I don't know why ntoskrnl.exe is an exe. Geeks love to play these unified field theory games where if they generalize things enough, everything fits in.

    I also have very little perspective on the kernel mode loader. It's an entirely different chunk of code from the user mode loader. I won't postulate past this point. Note that at some point here, the boot loader also is important; I believe it has to load both the right hal and the kernel.

    Finally, on imports from PEs... this is an interesting topic. In the unified-field-theory of things, everything's a PE and you can import anything from any PE that exports it.

    That said, EXEs are different. Their entry points do not follow the DllMain shape. By default the linker does not include relocation information for them, so they can't be moved.

    Given this, the only workable way to use an export from an executable is if it was the executable that was used to launch the process (since then it didn't have to be relocated and its initialization had to have been done before it got to your code).

    The DLL loader has prevented loading of executables (it's a bit in the PE32 header) for ages. In XP, I added code so that importing exports from executables other than the base process executable failed. Again, since we had touched the code, anything that went wrong with process initialization or DLL loading/unloading was directed to us and we found a few cases where people were using static imports to try to load (new) executables into the process. It occasionally worked but often did not so we advised the appropriate folks to change their ways and just closed this down.
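
    For the curious, the header bits involved can be inspected with a few lines of Python. This is a sketch under assumptions: I am taking "a bit in the PE32 header" to mean the IMAGE_FILE_DLL flag in the COFF Characteristics field, and the stripped-relocations case to be IMAGE_FILE_RELOCS_STRIPPED. The `fake_image` helper is purely illustrative and builds synthetic header bytes rather than reading a real file.

```python
import struct

# Flag values from the PE/COFF file header's Characteristics field.
IMAGE_FILE_RELOCS_STRIPPED = 0x0001   # image carries no base relocations
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002  # image is valid and can be run
IMAGE_FILE_DLL = 0x2000               # image is a DLL


def pe_characteristics(image: bytes) -> int:
    """Return the COFF Characteristics flags of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew at offset 0x3C points at the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the signature; Characteristics is
    # its final 2-byte field, 18 bytes in.
    (flags,) = struct.unpack_from("<H", image, e_lfanew + 4 + 18)
    return flags


def fake_image(flags: int) -> bytes:
    """Illustrative helper: just enough header bytes for the parser."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    # Machine (0x014C = i386), then zeroed fields, then Characteristics.
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, flags)
    return bytes(dos) + b"PE\0\0" + coff


exe = fake_image(IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_RELOCS_STRIPPED)
dll = fake_image(IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_DLL)

print(bool(pe_characteristics(exe) & IMAGE_FILE_DLL))              # False
print(bool(pe_characteristics(exe) & IMAGE_FILE_RELOCS_STRIPPED))  # True
print(bool(pe_characteristics(dll) & IMAGE_FILE_DLL))              # True
```

    To try it on a real binary, pass the bytes of the file (for example `open(path, "rb").read()`) to `pe_characteristics` instead of the synthetic image.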

    The interesting cycles had more than 2 nodes.
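
    Finding such cycles in a static import graph is an ordinary depth-first search. The sketch below is not any tool we used; the `find_cycle` helper and the three-module graph are hypothetical, and the two-node graph just restates the hal.dll example from the comment above.

```python
def find_cycle(imports):
    """Return one import cycle as a list of module names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(module):
        color[module] = GREY
        path.append(module)
        for dep in imports.get(module, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: that closes a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[module] = BLACK
        return None

    for module in list(imports):
        if color.get(module, WHITE) == WHITE:
            found = visit(module)
            if found:
                return found
    return None


two_node = {"hal.dll": ["ntoskrnl.exe"], "ntoskrnl.exe": ["hal.dll"]}
# Hypothetical modules, for illustration only.
three_node = {"a.dll": ["b.dll"], "b.dll": ["c.dll"], "c.dll": ["a.dll"]}
acyclic = {"a.dll": ["b.dll"], "b.dll": []}

print(find_cycle(two_node))    # ['hal.dll', 'ntoskrnl.exe', 'hal.dll']
print(find_cycle(three_node))  # ['a.dll', 'b.dll', 'c.dll', 'a.dll']
print(find_cycle(acyclic))     # None
```

    This reports only the first cycle it runs into; listing every cycle in a large graph is a bigger job.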

    This topic is getting a lot of attention as we move towards a less ... organic ... process of growing Windows.

    See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnembedded/html/embedded04152003.asp for an interesting appetizer of the issues we're working on in this area. The article isn't directly relevant, but the footprint issue is all about dependencies, especially cyclic dependencies.
