次の方法で共有


A curious subtlety about how CLR does interface dispatch on array types

This post is about some internal CLR magic that usually makes accessing .NET arrays through an interface up to 100 times faster (if you're an optimist; or accessing arrays up to 100 times slower in the worst case, if you're a pessimist). I'm usually very curious about how this universe works under the hood - and finding out about the internals of the CLR for me is like discovering laws of physics :) So if you ever find yourself measuring the timing of array access through generic interfaces in performance critical code, this post is probably worth taking a look at.

Anyway, here's the story. A customer (Robert Kieboom from CityGIS) has reported an intermittent performance decrease when accessing arrays through the IList<T> interface. Specifically, for some weird reason, same codepath started executing 100 times slower after a call to some seemingly unrelated code. Huge thanks to Robert for providing this precise and exact repro (I hope he doesn't mind if I share it):

 class Program
{
    public static int AddValues(IList<int> list)
    {
        int add = 0;
        /*foreach ( int value in list )
          add += value;*/
        for (int i = 0; i < list.Count; i++)
            add += list[i];
        return add;
    }

    static void TestAddArray(int[] items)
    {
        Stopwatch timer = Stopwatch.StartNew();
        AddValues(items); 
        timer.Stop();
        Console.WriteLine("Array: {0} ms.", timer.ElapsedMilliseconds);
    }

    static void TestAddIList(int[] items)
    {
        IList<int> cl = new List<int>(items);

        Stopwatch timer = Stopwatch.StartNew();
        AddValues(cl); 
        timer.Stop();

        Console.WriteLine("IList: {0} ms.", timer.ElapsedMilliseconds);
    }

    static void Main(string[] args)
    {
        int[] items = new int[1000000];
        for (int i = 0; i < items.Length; i++)
            items[i] = i - (items.Length / 2);

        TestAddArray(items); // fast
        TestAddArray(items); // fast

        TestAddIList(items); // something weird happens??!

        TestAddArray(items); // slow
        TestAddArray(items); // slow
    }
}

Here's what this program does. First we call TestAddArray a couple of times that uses an array disguised as an IList<int> - both calls are very fast, each about 108ms on my machine.

Then we call TestAddIList that uses a List<int> disguised as an IList<int> - this call is even faster, 21 ms.

Now - watch this - both subsequent calls to TestAddArray take 3638ms and 3677ms respectively - about 33 times slower!

If you remove the call to TestAddIList(items), all four calls will be equally fast. What's going on? The same codepath becomes slower after some random call? We were puzzled. After Eric couldn't find any explanation for this from the C# compiler standpoint (we were producing correct IL), it was clear that the issue is deeper than IL and has to do with the execution engine itself. So I routed the bug to the CLR team - and pretty soon we had a reply from them.

It turns out, CLR has some optimizations on how they do virtual dispatch on a generic interface, if the runtime type is an array. As I'm afraid to talk nonsense outside of my area of expertise, I'll just quote the bug description from the CLR folks:

The problem is rooted in the fact that IList<T>::get_Item is an interface invocation on an array type (in the example an int[]).  Because arrays are very common, and interface dispatch through interfaces like IList<T> is uncommon, the runtime saves significant space by having special logic for dispatching through generic interfaces like these. 

Our interface dispatch logic has an important optimization where FOR EACH CALL SITE, it checks one particular target explicitly, and if that fails, does slower hash table lookup, and if that lookup fails, fall back to an  expensive lookup.  Thus for calls sites that tend to go to one destination it is very fast, and call sites that go to many targets, you get good, but not as great performance.

As it turns out, there is a problem in the special logic for dispatch logic for arrays (but not for ordinary interfaces) that cause the hash table lookup to fail. What this means is either interface dispatch FOR ARRAY TYPES is very fast (if the explict check works), or very slow (when it does not).  

In the fast case, the interface calls to 'get_Item and get_count' (used in the [] operator) call with at 'this' pointer of int[] first, which means this is the value that is used for the 'fast check'.   The second time we call it with List<int>, not int[]. However since there is no bug associated with normal (not array) dispatch, everything works well.

However when the benchmarks are reversed, List<int> is the first 'this' that dispatches through 'get_Item' and 'get_Count' call sites so they get the benefit of the fast check. When the first benchmark is run, it now dispatches using an int[] as the 'this' pointer, and because of the bug, the hash table lookup always fails and so only the very slow lookup mechanism is used.

Workarounds:

1) In performance critical code, ensure that if arrays are passes as IList<T> (or super Interfaces), that T[] is always called first.  There is a good chance that this approach is impractical, but if there are only a few such critical paths then it is possibly useful. 

2) Always use List<T> where you would have used T[]. By the way, if you can avoid using Collection<T> that is probably a good thing, as it is a wrapper that in most cases does not add much value.

3) In any performance critical paths, special case the array case with code like

             method(IList<T> aIList)

             {

T[] asArray = aList as T[]

                    if (asArray != null)

                    {

                              // logic on arrays

                    }

                    else

                    {

                              // logic on aList

                    }

Option 3 has the advantage that it is SIGNIFICANTLY faster than interface dispatch will ever be because the array check is done only once (and then lots of operations are done), rather than effectively being done on every interface invocation. I would not recommend any work-around that does not have some positive effect on your code base.  I could believe that options 2 and 3 MAY be a superior solution in the long term anyway, so I would suggest investigating them.

This explains why calling arrays through generic interfaces might under certain conditions become considerably slower (or, alternatively, why these calls are in many cases considerably faster than the general case). In any case, virtual dispatch is slower than calling array members directly, so if you have a chance, just work with arrays directly or use generic collections like List<T>.

Regarding the future fate of this particular CLR optimization failing in some cases, we are considering whether we shall fix it for the next release of the .NET framework. However, as always: no promises here folks! If you do hit this problem (which is pretty unlikely), just use the workarounds for now - thankfully they are easy and reliable.

I hope this was insightful and many folks have found this interesting - I'd like to give credit to everyone who reported, reproduced and investigated this issue (including, among others, Robert Kieboom, Arie Leeuwesteijn, Erik Meijer, Eric Lippert, Vance Morrison and Jan Kotas) - and I hope they don't mind me sharing this with the rest of the .NET world.

Comments

  • Anonymous
    July 14, 2008
    I am unable to reproduce. I am using code directly from this post and I am getting: Array: 103ms Array: 158 ms. IList: 24 ms; Array: 145 ms; Array: 109 ms;

  • Anonymous
    July 14, 2008
    Thanks for the blog Kirill! Of course I don't mind sharing the code, this is why we tried to make it as simple as possible. Regarding the first reaction: the problem doesn't occur on Win64, so maybe you are using that? One little site note: the URL of CityGIS is not correct. This is an unrelated company. The correct URL is www.citygis.nl. Thanks again for the good explanation!

  • Anonymous
    July 14, 2008
    Thanks Robert! I updated the link. Alex, another reason why you're unable to reproduce is that this doesn't happen with optimized code (Release build). You can only repro this on a Debug build. Compiling program.cs with /o+ makes the problem go away.

  • Anonymous
    July 14, 2008
    Je ne sais même pas comment résumer ce post sans recopier tout le code tellement c'est surprenant. Le

  • Anonymous
    July 15, 2008
    Thank you for your help guys. I know that problem investigated and confirmed and workaround has been found, but I still cannot reproduce it on 32-bit XP SP3 neither in Debug nor in Release mode. Thank you anyway.

  • Anonymous
    July 15, 2008
    I can't reproduce (release/debug under VS/debug out of VS) such an enormous difference but after i get a 8% slower. (VS2008 SP1 beta1 on Vista 32 bits SP1 on E6850)

  • Anonymous
    July 15, 2008
    It's pretty often that customers complain that something is broken and there is a bug in our product. That's normal, I'm a QA and I expect that. But for the first time in my career I hear customers complain that everything is working fine and that there is no bug :)) Frankly, I don't know why it's not reproducing on your machines folks, but I'm glad that .NET is so good and stable that it's sometimes difficult to even reproduce a known bug :)

  • Anonymous
    July 17, 2008
    Piece of advice: Leave that managed mess out. It is dead slow and there you go, you get what you deserve for working with a mess that is a VM and especially the ancient MS JIT. This is just silly, this is what compilers should do and you guys obviously are too managed to address this or realase a new and proper JIT. Java one beats you by miles, so what on earth is going on? Plus all the things you'll do to the language just to make it popular and address Ruby. That's another joke. Why not fix your problems, rather than keep working around them and keep building a business so cutomsers keep working around nonsense such as this, performance, all rendering and UI technologies you make, the lot really... Why? In my mind, it is a result of 'managed' thinking. It is inferior, not "SUPERIOR".

  • Anonymous
    July 17, 2008
    No .NET is not so good or stable. You guys have major issues with preJIT, JITer quality, and performance across libraries. If you Google for it, you'll get billions of results.

  • Anonymous
    July 18, 2008
    I'm sorry that .NET has obviously disappointed you. I totally agree how frustrating can it be to deal with software that does not do what you expect in a way that you expect. I don't think I can add much more to that. I can only say that people here at Microsoft are working very hard to make the platform, languages and tools better. I personally work weekends and really dedicate my best to find and fix the issues that we have. It's sad that we're still not there to satisfy each and every one of our customers. Good news - there are a lot of alternatives out there - you're free to use C++, Java, Ruby or which ever other platform/language you prefer. If you don't like it - don't use it. If you're forced to use it - please understand that we're humans who do our best to design an incredibly complex piece of software - bear with us, help us and give us feedback like this - we'll do our best to fix it. For example, the CLR issue I was talking about in this blog will be fixed in .NET 4.0. We fix them one by one, it just takes time and patience. On a personal sidenote, I love .NET and think that this is the best platform for me - and I'm going to stick with it for years to come and do my best to make it even better.

  • Anonymous
    July 22, 2008
    A colleague read this and immediately pointed out that using "public static int AddValues( int[] list )" would be much faster. We tried it and indeed it runs 6 times faster than the fastest IList case. See results - ArrayRaw: 3 ms. Array: 74 ms. Array: 78 ms. IList: 19 ms. Array: 5239 ms. Array: 5266 ms. If speed is what you need, generics are not your best bet.

  • Anonymous
    July 23, 2008
    Yes, if you're able to use arrays in performance-critical code paths, definitely do so. That's what the Workaround 3 suggests. However in Robert's situation they couldn't use arrays directly because they lose incapsulation (when using IList<T> you can pass a different data structure later, e.g. List<T>). It's a trade-off between generic extensibility and performance.

  • Anonymous
    August 11, 2008
    Preparing to forthcoming .NET 3.5 SP1 I have uninstalled .NET 2.0 and 3.0 Service Pack 2, i.e. Now I only have .NET 2.0 and .NET 3.0 SP 1 and 3.5 with no SP. Now I can see bug all right: Array: 95 ms. Array: 86 ms. ILIst: 21 ms. Array: 5989 ms. Array: 5827 ms. Will see if this thing disappears again after I nstall 3.5 SP 1

  • Anonymous
    August 11, 2008
    Yes, after installing .NET 3.5 SP1 the problem disappeared.

  • Anonymous
    August 11, 2008
    Good to know, thanks Alex.