Qualitative Code Differences in Managed Code
My colleague Vance Morrison wrote an internal paper on code quality issues in our current system. I thought there were some excellent items discussed in his paper so with his kind permission I've edited/summarized it for a general audience. Thank you Vance.
Qualitative Code Differences in Managed/Unmanaged Code
If you were to compare the assembly code of an equivalent managed and unmanaged program, you would find the differences break down into three broad categories: Intrinsic Features, Optional Features, and JIT Compiler Limitations.
In contrast, things like local variable access, argument access, flow of control, method calls, instance field accesses, as well as all primitive arithmetic are largely unchanged in managed code. This is very nice since this is the heart of most performance-sensitive programs. So there’s a great base for raw computation problems. See Jan Gray’s paper “Writing Faster Managed Code: Know What Things Cost.”
Intrinsic Runtime Features
These are the things that don’t exist in the unmanaged world, such as garbage collection (GC), appdomains, and code-access security. This is the most worrisome set of differences between managed and unmanaged code because you really can’t “opt-out” of these features – they represent the intrinsic cost of using the runtime.
- GC Information – To do a garbage collection, all pointers to the GC heap must be identified (and possibly updated). This includes all pointers on the execution stack (local variables, arguments, register spills) for every thread in the system, as well as any pointers in CPU registers themselves. This requires the JIT compiler to generate GC tracking information sufficient to walk the stack at (roughly) arbitrary times. This extra information is the most fundamental difference between unmanaged and managed code. While the GC tracking requirement does not affect code quality at all, it does mean that every method has a table associated with it that is typically 15% of the size of the code (on x86). Luckily, this table is only accessed for methods active during a GC, so it generally has a small effect on working set. There is also a small working set overhead (~1 DWORD per method) to link the method and its GC information. The good news is that all of this has no effect at all on code quality and only a small effect on working set.
- Write Barriers – The runtime uses a “generational” GC, which improves GC performance by only collecting part of the heap most of the time. To implement this, every write of a GC pointer that resides in the GC heap needs to be logged as a potential root of a partial GC. This bookkeeping adds an additional 4-10 cycles for every such write in the common case; see “Garbage Collector Basics and Performance Hints.” Write barriers are a concern, but the overhead is not huge. A pointer write goes from about 1 cycle to an average of 6 or 7 cycles for pointers on the GC heap, and the hottest pointers are typically on the stack, where there is no penalty at all. The effect of write barriers is often measurable (a few percent or so), and can be more significant in certain tight loops.
- Static Field Access – The runtime supports a lightweight process-like environment called an AppDomain. Each AppDomain has its own copy of all static variables. Because of this, any domain-neutral code must use 5-10 instructions to access a static field instead of just 1. The JIT can optimize many cases (allowing one fetch of AppDomain variables to serve many static field fetches in the same method), but there are cases when no optimization can be done. Domain-neutral code is more common in Whidbey. Static field access overhead is actually worse than write barriers in the worst case: the static field access goes from one cycle to roughly ten cycles. However, because the overhead of field fetches can be combined (and pulled out of loops), the impact of slower field fetches is generally less than that of write barriers. It has no measurable impact at all in many scenarios (for instance, the framework code tends not to use statics much).
- Interop with existing unmanaged code – Transitions to unmanaged code minimally must be marked on the stack to allow garbage collections to happen correctly, and there can be security checks and/or argument conversion necessary (if the types don’t exactly match the operating system types). In the best case (no security concerns, simplest kind of call) the overhead is 10-20 instructions. Costs can increase dramatically if argument conversion is needed.
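To make the write barrier bookkeeping described above a little more concrete, here is a toy model in C#. It is only a sketch under stated assumptions: the ToyHeap class, its slot-index "pointers", and the 128-slot card size are hypothetical names and numbers chosen for illustration. The real barrier is emitted as machine code and is likely more selective about which writes it records. What the sketch does show is the shape of the cost: a pointer store into the heap becomes a store plus one extra table update.

```csharp
using System;

// Toy model of a card-marking write barrier. Every name here is hypothetical;
// this is an illustration of the idea, not the CLR's implementation.
public sealed class ToyHeap
{
    private const int CardShift = 7;   // assumption: one card covers 128 slots
    private readonly int[] slots;      // each slot holds a "pointer" (a slot index) or -1 for null
    private readonly bool[] cards;     // one dirty flag per card

    public ToyHeap(int slotCount)
    {
        slots = new int[slotCount];
        for (int i = 0; i < slots.Length; i++)
        {
            slots[i] = -1;
        }
        cards = new bool[(slotCount >> CardShift) + 1];
    }

    // A barriered store: the write itself plus the bookkeeping that lets a
    // partial collection find this slot later without scanning the whole heap.
    public void WritePointer(int slot, int target)
    {
        slots[slot] = target;              // the store the program asked for
        cards[slot >> CardShift] = true;   // the extra work: mark the covering card dirty
    }

    public int ReadPointer(int slot)
    {
        return slots[slot];                // reads need no barrier in this model
    }

    public bool IsCardDirty(int slot)
    {
        return cards[slot >> CardShift];
    }

    public int DirtyCardCount()
    {
        int count = 0;
        foreach (bool dirty in cards)
        {
            if (dirty) count++;
        }
        return count;
    }

    // What a collector would do after it has processed the dirty cards.
    public void ClearCards()
    {
        Array.Clear(cards, 0, cards.Length);
    }
}
```

In this model a partial collection would only need to scan the slots covered by dirty cards. Writes to locals never go through WritePointer, which mirrors the point that pointer writes on the stack pay no penalty.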
“Optional” Features
These are features that developers can avoid if they wish to (e.g. array bounds checks, run time casts), though for the most part we encourage developers to use them universally. These features can be avoided in particular cases if needed (e.g. by using “unsafe” code).
Ease of use, safety, and simplicity are weighted heavily in making design decisions for most managed code users, including our framework, so most code takes advantage of these “optional” features as a matter of course. Where these costs are hard to bear because the code is highly performance critical you can opt-out if necessary. Opting out with due caution is our normal recommendation.
- Managed code strongly encourages code to be verifiably type safe (which means the CLR can prove all references are to instances of the statically declared type). This leads to a bunch of small overheads that can add up.
- Bounds checks on many array accesses (by default, every access has a length check at the cost of 2 instructions). You can opt out by using unsafe code.
- Type checks on every set to an array of objects to ensure that the value being set is compatible with the array being updated. You can opt-out by using unsafe code.
- Type checks when extracting data from type neutral containers and APIs. You can opt out by using unsafe code.
- Boxing (wrapping a primitive type in a GC heap object) when inserting primitive types into type neutral containers and APIs. You can opt out by using generics or generating a container for the specific primitive type.
- Non-mutable strings. The basic string type is not mutable, which often means more data copying (but sometimes less). You can opt out of this by manipulating character arrays or special classes like StringBuilder, but when you interface with APIs that expect strings, you need to make a copy.
- Delegates. Managed code has a type-safe notion of a function pointer called a delegate. Delegates are more powerful than C function pointers because they carry state and can dispatch to multiple targets. This increases overhead. You can opt out by using unsafe function pointers.
- The runtime has an extensive set of reflection APIs that allow code to introspect on the running code. It is relatively easy to probe for types at runtime, traverse inheritance hierarchies, set fields by string name, call methods by string name, and even generate new methods on the fly. These are powerful features (really not available at all in the unmanaged world), but they have a significant cost compared to precompiled code. A careful engineering tradeoff has to be made by the users of these features to ensure the benefit of this introspection is worth it.
- Managed code tends to have more extensibility points than the equivalent unmanaged counterpart. Developers use object oriented techniques, using virtual functions, interfaces, and the reflection APIs to achieve this. These extensibility points can cost significant amounts of performance and have to be carefully weighed by framework designers.
- Managed code supports Custom Attributes on IL entities (Types, Methods, Fields etc.) This has been valuable for adding new features to the system (e.g. hosting, interop, security, or reliability information) but the attributes are relatively expensive to access at run time. This expense has to be factored into the cost of these new added features.
- Managed code tends to allocate more heap objects (i.e. more methods tend to return new objects rather than modify one that was passed in). Of course reusing objects in place can cut down on the allocation overhead, but, even more importantly, sometimes the locality benefits of nice compact allocations trumps other considerations, and of course managed allocations are more like the speed of a custom unmanaged allocator and not a raw malloc(). So allocation considerations are a subtle topic at best.
- Compilers can make expensive features very easy or even implicit (e.g. transitioning to unmanaged code, anonymous delegates) which magnifies their use tremendously.
- Managed libraries often do extensive precondition checking to give detailed errors on API misuse, for example checking for null object references and throwing an ArgumentException. This is great for developers but hurts performance. Obviously this was a choice made by the library designers (end users can’t opt out, except by re-implementing, but library designers can).
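Two of the items in the list above, the type check on stores into object arrays and boxing in type neutral containers, are easy to see in a few lines of C#. The following is a minimal sketch rather than a benchmark. The OptionalCosts class and its method names are made up for illustration, and the choice of ArrayList and List<int> as the type neutral and generic containers is mine.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class OptionalCosts
{
    // A string[] can be viewed as an object[], so every store through the
    // object[] view has to be type-checked at run time.
    public static bool StoreIsRejected()
    {
        object[] view = new string[1];
        try
        {
            view[0] = 42;      // a boxed int is not a string, so the check fails
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    // Type neutral container: each int is boxed on the way in, then
    // type-checked and unboxed by the cast on the way out.
    public static int SumBoxed(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);       // allocates a heap object per element
        }
        int sum = 0;
        foreach (object o in list)
        {
            sum += (int)o;     // run time type check plus unbox
        }
        return sum;
    }

    // Generic container: the ints are stored directly, with no boxing
    // and no casts on the way out.
    public static int SumGeneric(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
        }
        int sum = 0;
        foreach (int v in list)
        {
            sum += v;
        }
        return sum;
    }
}
```

SumBoxed and SumGeneric compute the same result; the difference is in the allocations and checks each one pays along the way, which is where the generics opt-out mentioned above comes in.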
JIT Compiler Limitations
The final category of code generation differences are artifacts of the current JIT compiler rather than inherent trade-offs in the managed system.
The current just in time (JIT) compiler is more limited than a typical commercial quality unmanaged compiler, partly because it needs to be smaller and faster and partly because it just isn’t as mature. Some of the larger issues include:
- 64 bit arithmetic – Since the initial implementation we have invested heavily in 32 bit integer code quality, but not in code quality for 64 bit integers.
- Inlining – The inlining subsystem could use additional work to handle larger inlining cases – this is getting more important as more complex properties become more common and require inlining for performance.
- Analysis caps – For the sake of speed the JIT places arbitrary caps on the size of analysis data. For large methods, the JIT does not have the information necessary to do a really good job.
- Value Types (structs) – Value types are not handled as well as reference types. For example, the inliner does not inline functions with value type parameters.
- Exception Handling – The code generated for exceptions is based on the assumption that exception handling is rare. This assumption is turning out to be false as users write code with increasingly rich exception semantics.
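As a small illustration of the exception handling point, here is a sketch of the same validation written two ways in C#: once letting int.Parse throw on bad input, and once using int.TryParse, which reports failure through its return value. The ParseCosts class and its method names are hypothetical, and the sketch assumes the inputs are non-null. It does not measure anything; it only shows how code where failure is common can avoid the throw path altogether.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class ParseCosts
{
    // Counts the valid integers by letting int.Parse throw on bad input.
    // Every invalid string costs a full throw and catch.
    public static int CountValidWithExceptions(string[] inputs)
    {
        int valid = 0;
        foreach (string s in inputs)
        {
            try
            {
                int.Parse(s);
                valid++;
            }
            catch (FormatException)
            {
                // not a number
            }
            catch (OverflowException)
            {
                // a number, but outside the range of int
            }
        }
        return valid;
    }

    // Same count, but failure is an ordinary return value, so no
    // exception is raised for bad input.
    public static int CountValidWithTryParse(string[] inputs)
    {
        int valid = 0;
        foreach (string s in inputs)
        {
            int ignored;
            if (int.TryParse(s, out ignored))
            {
                valid++;
            }
        }
        return valid;
    }
}
```

Both methods return the same count for the same input. If bad input really is rare, the first form is fine; if it is routine, the second keeps the common path off the exception machinery.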
Comments
Anonymous
February 22, 2005
Exception Handling: They're doing this because you (microsoft as a whole) told them initially this was the new way and that it was practically free.
In reality, it's expensive and this advice was poor.
Anonymous
February 22, 2005
See http://weblogs.asp.net/ricom/archive/2003/12/19/44697.aspx for more details on exception handling guidelines.
Anonymous
February 22, 2005
I personally believe that the lack of inlining for Value Types is the biggest issue in current JIT performance. I think that Value Types in general need much more aggressive inlining than reference types. If performance were not an issue, nobody would be using Value Types in the first place.
Special-casing of generics for Value Types is great (at the cost of a bigger working set), but the lack of inlining makes it difficult/unworthwhile to create efficient lightweight wrappers for other primitive types that add additional features (like say a wrapper for Int32 that restricted its values, or implemented a generic interface such as IArithmetic<T> for doing math in generics). It also adds penalties when creating new primitive-like types (such as a Complex number).
Personally, I think that there should be really aggressive inlining on overloaded operators, property accessors, and constructors defined for Value Types, even if it increases the initial cost of JITing a Value Type a bit.
Most programmers do not "see" the added cost of calling an overloaded operator, and the runtime (or maybe even the C# compiler) should work as hard as possible to inline them.
Even System.Decimal would benefit from such changes to the JIT, and it is very frequently used in business applications.
Anonymous
February 22, 2005
With regard to interop with unmanaged code, I wonder if you'd consider writing a post on suggestions for performance regarding crossing the managed/unmanaged boundary? We have a slight performance issue with the following, and I'm sure it must be a common scenario (even if not exactly the same).
We started our application (in C#) pretty much as soon as .NET 1.0 appeared. Front-end and business objects are written entirely in C#. We reused an in-house object/relational system that sits on top of the database, which we upgraded from straight C++. In the middle is an auto-generated data-access layer, with a pair of object/collection classes per table, and a pair of get/set methods per column. Business objects are stateless and only contain a reference to the data-access object. These d/a methods have the form:
// MC++
Type Table::GetField()
{
    // Type is 'simple': one of int, bool, System::String *, DateTime, etc.
    Type value = m_pRow->GetValue( <fieldindex> ); // m_pRow is unmanaged.
    GC::KeepAlive( this );
    return value;
}
So obviously the interface is too 'chatty' but that's hindsight and can't be easily changed. How can we know if we're making the cheapest possible call here (you mentioned argument conversion and security)?
Anonymous
February 22, 2005
> The current just in time (JIT) compiler is more limited than a typical commercial quality unmanaged compiler, partly because it needs to be smaller and faster and partly because it just isn’t as mature. Some of the larger issues include.
I recently read about the improved NGEN in 2.0:
http://msdn.microsoft.com/msdnmag/issues/05/04/NGen/default.aspx
The article indicates that NGEN works by invoking the same underlying JIT compiler. But at install time, there isn't the time constraint. Why can't NGEN use a more aggressive optimizer than the JIT'er?
Anonymous
February 22, 2005
On exceptions (first comment):
"In reality, its expensive and this advice was poor"
Not true. Exceptions have a cost, granted. But, the thing is, if they are used to actually handle errors (1/1000 rule from Rico's other post), the cost is not important. Why? Well, because if you have an error, the amount of processing to do is normally MUCH less than what's needed for normal operation (we write code to do stuff and not to handle errors, don't we? BTW, that's also why exceptions are good: they help us to have to write less code for error handling; that leaves us more time to write code that does stuff)
The important overhead, then, is the "static" one for exception handling init/cleanup. But, there, do not forget that you compare the compiler-generated code that you don't see with code for error handling that you write. So, it's not that you have hidden overhead in the case of an exception-enabled environment as opposed to NOTHING in exception-free (C code, anyone?) environments.
Anonymous
February 22, 2005
> You (microsoft as a whole) told them initially this was the new way and that it was practically free.
Where have you seen this? Everything I've read, and I do mean everything, told me that exceptions are expensive and should be avoided-- eg, used for only "Exceptional" cases.
In practice, exceptions are brutally slow-- literally slower than a database query! But I'm fine with that, because I only use them in exceptional conditions as designed.
If you have read this somewhere (microsoft docs saying exceptions are free), can you provide a link to it?
Anonymous
February 23, 2005
Why can't the CLR use something like GetWriteWatch instead of a memory barrier?
Anonymous
February 23, 2005
Jeffery Sax's thoughts are closely aligned with mine. Of the JIT issues, the inliner is the thing I would most like to see improved, and handling of value types in the inliner is doubly important. I think value types are largely under-used and perhaps they might be used more often if you could cash in, in practice, on the gains that they offer in theory.
But of course all the areas identified as weaknesses in the JIT above are obviously on our minds.
Anonymous
February 23, 2005
Vote for inlining on value types:
http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=fb7b3c93-a9e9-418b-85b3-b67a195c7e1a
This will make VG.net even faster.
Anonymous
February 23, 2005
It's good to know that value-type inlining is on your (Microsoft's) minds. Not having it is driving me insane!
If I had the choice, I would give up every single one of .NET 2.0's new features in exchange for value-type inlining. Seriously.
Anonymous
February 24, 2005
There was a question about GetWriteWatch() up there. It's an interesting strategy to use OS based information instead of write barriers but I don't think you'd net any performance doing it.
I see these problems:
- using GetWriteWatch will manipulate the page tables, which causes more TLB misses, and loading CR0 ain't cheap anyway; we'd have to do this on every garbage collection
- once armed, you'll pretty much have to take the equivalent of soft page faults to clear the bit (the OS does this for you), and those aren't cheap either
- the granularity would be at the page level, so we'd have to assume a lot more pointers are dirty
- if we use that feature then others (like debuggers) can't, and it seems more useful to them than to us
At the end of the day Write Barriers aren't killing us.
Anonymous
February 24, 2005
In order to get an exception to compete in slowness with a database query, you first need a pretty simple query. e.g. "select 1"
Obviously a highly complex query could take arbitrarily long.
Secondly, perhaps the biggest cost of the exception is gathering the callstack information. So you'd want to do the throw from a very deeply nested function.
So I could imagine a case where all that gathering is actually slower than a network round trip to a database but I don't think it would be the norm.
I guess the other factor is there could be an arbitrary number of finally clauses that did an arbitrary amount of work. So if you timed from the throw to the catch you might be waiting a while.
That's the thing about throwing -- you just don't know what is going to happen exactly -- it's both the strength and the weakness of the mechanism's generality.
Anonymous
February 24, 2005
Back to using OS info.
Like you said measure first. Do you guys have a prototype?
I can argue that in an app that does little GC compared to updating its structures this will be a significant win.
Extra code/data for wb will cause more data to be touched, thrashing the TLB etc.
On large memory systems you would probably want to have granularity bigger than even a page (sans large pages).
I am sure you would agree that this is nowhere near a clear cut decision.
And I didn't understand your last point about debuggers.
Anonymous
February 24, 2005
On Write Barriers:
If you're curious about the impact of Write Barriers, refer to this paper:
http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/wb-ismm-2004.pdf
(The Title "Write Barriers - Friend or Foe?" should tell you what it's about).
Anonymous
February 25, 2005
Not all exceptions are thrown by code deep down in the call stack. Exceptions are often raised by the CLR because of an issue with code inside a method. For example: OverflowException on a checked block.
A form of 'light-weight exception handling' for these situations, where the JIT bypasses the full exception handling mechanism, would be very welcome. I.e. if a CLR exception is thrown by code inside a try block, and the exception is caught in a corresponding catch block, the overhead of building a full-featured exception object could be eliminated.
This is especially important since the CLI spec states that certain tests throw an exception if the test fails rather than branch if it succeeds. Example: the ckfinite instruction. In other words, without this type of optimization, the CLI imposes severe performance degradation.
I'm aware that exception handling code can be extremely complicated, but that does not mean that the simple cases cannot be optimized.