Performance: One big assembly vs several small assemblies.

I frequently see people ask the same question in microsoft.public.dotnet.framework.clr and microsoft.public.dotnet.framework.performance. Which one has better performance, one big assembly or several small assemblies?

Strictly from a performance point of view, one assembly is always better than several assemblies. Each assembly load has a fixed overhead. With multiple assemblies, you pay that overhead several times.

The overhead of assembly loading at minimum includes the following:
1. Finding the assembly.
2. The loader's in-memory data structures that track the assembly.
3. Assembly initialization.

1. Overhead of finding the assembly.
For a .NET assembly, the probing rules are covered in the .NET Framework documentation. For a strongly named assembly, we apply policy, then probe the GAC, then probe the application directory. Applying policy means finding config files and parsing them. (This happens once per AppDomain, so the overhead is not always that large. Nonetheless, we always look for publisher policy, so that cost is always there.) If your assembly is not in the GAC, probing the GAC is wasted work. All of this means a lot of disk access. For a simply named assembly, we don't apply policy and we don't probe the GAC, so the overhead is smaller, but it is still there.
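
The lookup order described above can be summarized in a small sketch. This is only a simplified model of the order given in this post, written in Python for brevity. It is not the actual Fusion implementation, and the function name `probe_steps` is made up for illustration.

```python
def probe_steps(strongly_named):
    """Return the lookup steps, in order, for one assembly.

    Simplified model of the order described in the text; the step
    names are descriptive labels, not real API calls.
    """
    if strongly_named:
        return [
            "apply policy (find and parse config files, look for publisher policy)",
            "probe the GAC",
            "probe the application directory",
        ]
    # Simply named assemblies skip policy and the GAC.
    return ["probe the application directory"]


for name, strong in [("strongly named", True), ("simply named", False)]:
    steps = probe_steps(strong)
    print(f"{name}: {len(steps)} step(s)")
    for step in steps:
        print("  -", step)
```

In this model, every additional strongly named assembly repeats all three steps, which is where the extra disk access comes from.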

2. Loader overhead.
The loader always has some overhead for each assembly, such as looking up the AppDomain to see whether the assembly is already loaded, and registering the assembly in the AppDomain. All of this has time and memory overhead. For a .NET assembly, you pay the overhead three times per assembly: once in Fusion, once in the CLR loader, and once in the OS loader.

3. Assembly initialization.
Every assembly has some initialization cost. If you look at a .NET assembly's import table, it has an entry pointing to mscoree!_CorDllMain. This method is executed every time an assembly is loaded. If your assembly is a Managed C++ assembly, it has its own DllMain to execute. It may also need runtime fixups. Also, if you use the C/C++ runtime library, it has its own initialization.

These costs add up when you have multiple assemblies.
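
To make the arithmetic concrete, here is a minimal sketch of the fixed-overhead model, again in Python for brevity. The unit costs are hypothetical placeholders, not measurements, and the function name `total_load_overhead` is made up for illustration. The only point is that the three per-assembly costs above are multiplied by the number of assemblies.

```python
def total_load_overhead(assembly_count, find_cost, loader_cost, init_cost):
    """Total fixed load overhead for `assembly_count` assemblies.

    Each assembly pays the three costs from the list above once:
    finding it, loader bookkeeping, and initialization.
    """
    per_assembly = find_cost + loader_cost + init_cost
    return assembly_count * per_assembly


# Hypothetical unit costs (arbitrary units, NOT measured values).
FIND, LOADER, INIT = 3, 2, 1

one_big = total_load_overhead(1, FIND, LOADER, INIT)
ten_small = total_load_overhead(10, FIND, LOADER, INIT)
print(one_big, ten_small)  # prints "6 60"
```

The model deliberately ignores everything that depends on assembly size. It only captures the part of the cost that is paid once per assembly regardless of how much code the assembly contains.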

There are other costs associated with multiple assemblies. Each assembly has its own metadata, which costs extra disk space. A single assembly is also likely to be laid out more sequentially on disk than several assemblies, so the disk access time for several assemblies is likely to be longer.

Of course, there are many good reasons to want multiple assemblies. But from a strict performance point of view, one assembly wins, always.
