Performance Quiz #9 : IList<T> List and array speed

A short and sweet quiz with lots of juicy discussion possibilities:

           public int Sum(IList<ushort> indices)
{
int result = 0;
for (int i = 0; i < indices.Count; i++)
result += indices[i];
return result;
}

Considering only the time it takes to do the Sum (i.e. assuming we had already set up the array/list) which gives better performance and why?

           // #1
ushort[] tmp = new ushort[500000]; // this doesn't count
Sum(tmp); // this is what we are timing

OR

           // #2
List<ushort> tmp = new List<ushort>(500000); // this doesn't count
for (int i = 0; i < 500000; i++) tmp.Add(0); // this doesn't count
Sum(tmp); // this is what we are timing

What say you gentle readers?

(my solution is now posted here)

Comments

  • Anonymous
    March 09, 2006
    Mmm, rough.

    I want to say #2, because with #1 you've got an array of type ushort, while in the Sum method you're working with an IList<ushort> - so the array has to be cast to an IList<ushort>.

    Just my intuition talking - I guess I'd need to time it myself to be sure.
  • Anonymous
    March 09, 2006
    The comment has been removed
  • Anonymous
    March 09, 2006
    The comment has been removed
  • Anonymous
    March 09, 2006
    The comment has been removed
  • Anonymous
    March 09, 2006
    Seems like they would perform the same. I expect that once built, List<ushort> is internally nothing but a ushort[].
  • Anonymous
    March 09, 2006
    I believe #1 will be faster.

    With #1 .NET will optimise getting array values by removing any bounds checking. Because of the for loop it knows it will never access values beyond the length of the array.

    The Item property on the List<ushort> in #2 will however will still do bounds checking for each get.
  • Anonymous
    March 09, 2006
    Actually, is bounds checking optimization done at runtime or during the compile? If it is during compiling then I don't think it wouldn't know that the IList<ushort> is an array.
  • Anonymous
    March 09, 2006
    I think #1 is faster.

    Because both use the same sum jitted code for sum(), which means JIT doesn't make an optimzie code for array, and List<short> is just a wrappar of short[].
    So #1 is faster for the overhead of List<>.
  • Anonymous
    March 09, 2006
    On the second thought, maybe I was wrong...

    On #2, List<short> uses optimized code for array indexing. But #1 accesses an array via interface, which is slower than the optimized code.

    So, hmm, I don't know!
  • Anonymous
    March 09, 2006
    I would assume that #1 if faster, since it has no management overhead, while the List<> does.
    The cost of interface invocation (Count) is something that the List needs to handle as well, since it is handles an array internally.
  • Anonymous
    March 09, 2006
    Well, firstly the test code runs slower than necessary - this small change speeds it up by 50% or so:

    public int Sum(IList<ushort> indices)
              {
                  int result = 0;
                  int n = indices.Count; // cache the Count rather than accessing it on every loop iteration
                  for (int i = 0; i < n; i++)
                      result += indices[i];
                  return result;
              }

    Now, Im finding that the summing an array as an IList<> is about 4 times slower than summing a List<> as an IList, and that summing up an array directly is 2.5 tims faster than summing up an IList<>. Hard to imagine what is gobbling up that factor 10 in performance.

    Is there some boxing/unboxing going on in there?


  • Anonymous
    March 09, 2006
    I've got the same result as damien's (#1 is 3-4x slower). I thought at least #1 was faster on debug build, but not.

    I think accessing an array via interface is VERY slow than accessing directly, or the optimizer is pretty good to deal with array directly.
  • Anonymous
    March 09, 2006
    I mean, JIT makes a good code for accessing an
    array directly, even in debug build.

    I guess the following is what happen.
    #1: accessing an array via interface(in sum(), slow, no optimization for array)
    vs
    #2: accessing an array directly(in List<>.this[], very fast, optimized for array) + vtable dispatch to invoke List<>.this[](very fast)
  • Anonymous
    March 09, 2006
    In Rico's Example, #2 is faster but this is actually twice as fast as #2 ...

    ushort[] tmp = new ushort[500000]; // this doesn't count
    int result = Sum(tmp);

    public int Sum(ushort[] indices)
           {
               int result = 0;
               int n = indices.Length-1; // cache the Count rather than accessing it on every loop iteration
               for (int i=n;i>-1;i--) // Iterating thru Backwards is faster than Forwards
                   result += indices[i];
               return result;
           }

    I have been experimenting with doing this type of iteration in JavaScript, where every optimization can mean exponential gains in performance. You can check out come of my tests here ... http://geekswithblogs.net/mparsons/archive/2006/03/02/71175.aspx#FeedBack

    Check out the Test Page in my comment.
  • Anonymous
    March 09, 2006
    The 2nd commenter was correct that #2 runs faster, and also that changing the for loop to a foreach loop makes #1 way faster than #2. I'm really not sure why this happens though. Maybe in certain cases the JIT compiler thinks it has to do bounds checking on the array when it really didn't have to.
  • Anonymous
    March 10, 2006
    Not to be a stick-in-the-mud, but I kinda think 500k virtual-calls do count.
  • Anonymous
    March 10, 2006
    Great comments so far!  I love this one, it's so juicy :)
  • Anonymous
    March 10, 2006
    I'm with Wesner on this one. The (internal) SZArrayHelper class is a class that contains the method bodies that is generated for (1d) arrays. The reason why #1 is more time consuming seems evident when you compare SZArrayHelper.get_Item's implementation to that of List<T>.get_Item. The former includes a cast (which incurs per item in your array), which is where I believe the overhead of time is spent.
  • Anonymous
    March 10, 2006
    Rico’s posted another performance quiz on his blog, which means that I get to run the F1 profiler on...
  • Anonymous
    March 10, 2006
    Hey Rico. I ran F1 on this scenario and I got some more data. But to be honest I'm still not 100% sure what these JIT_IsInstanceOfArray and ArrayIsInstanceOfNoGC function are doing in the array case. I have some guesses, but nothing that I'm sure of. Anyways, my run of the profiler is here:
    http://blogs.msdn.com/ianhu/archive/2006/03/10/548778.aspx
  • Anonymous
    March 10, 2006
    The comment has been removed
  • Anonymous
    March 10, 2006
    Mono has similar results:

    http://www.jprl.com/Blog/archive/development/mono/2006/Mar-10.html
  • Anonymous
    March 11, 2006
    KeithH:

    but sum() accesses the array via IList<>.this[int], which doesn't return object, so no boxing around there.
  • Anonymous
    March 11, 2006
    I have tested these two with a very simple profiler (Stopwatch) and got dramatic performance differences.

    Array/Generics = 852,33
    The Array version is over 800 times slower.

    I have used the following code:
          IList<ushort> gentmp = new List<ushort>(ArraySize);
           ushort[] arrtmp = new ushort[ArraySize];

           public void Run()
           {
               long GenTicks;
               long ArrayTicks;

               Stopwatch watch = Stopwatch.StartNew();

               Sum(gentmp); // this is what we are timing
               watch.Stop();
               GenTicks = watch.ElapsedTicks;

               List<ushort> ArrCopy = new List<ushort>(ArraySize);
               ArrCopy.AddRange(arrtmp);
               watch = Stopwatch.StartNew();

               Sum(arrtmp); // this is what we are timing
             
               watch.Stop();
               ArrayTicks = watch.ElapsedTicks;

               Console.WriteLine("Array/Generics = {0:F2}", (double)ArrayTicks /(double) GenTicks);

           }

    Perhaps I have an obvious error here which I did not see yet. But the number I get seem to be consistent. What is very interesting if I copy the array completely into a generic list and use that one I get
    List<ushort> ArrCopy = new List<ushort>(ArraySize);
    ArrCopy.AddRange(arrtmp);
    Sum(ArrCopy);

    Array/Generics = 1,70

    Nearly the same. I think there is massive overhead during the access of each element going on here. It is even cheaper to copy the whole thing and use that one if the array is big enough.

    Yours,
     Alois Kraus
  • Anonymous
    March 11, 2006
    The comment has been removed
  • Anonymous
    March 11, 2006
    One of the problems with this is if the assembly is marked as debug mode or a debugger attached during startup the JIT disables optimizations, and that will make a big difference.

    The optimizations Mike Parsons does are very similar to what the JIT can do when it is allowed to optimize.
  • Anonymous
    March 12, 2006
    To sum it up. Every Array supports IList<T> with its specified type at compile time. If you pass it via the IList<T> interface to another function the CLR has special run time checks embedded to check if is an array or a reall generic list and an additional type check to ensure type safety. This is all done at run time for each get call which explains this factor of several hundred times slowdown. Is this the correct interpretation of Ianhu's profiling run?

    Yours,

    Alois Kraus
  • Anonymous
    March 12, 2006
    I'd think it just depends on what kinds of optimization are used (i.e. available and used, whereas not yet available would result in not currently being used).

    Meanwhile this confuses me:
    for (int i = 0; i < 500000; i++) tmp.Add(0); // this doesn't count

    Even though it doesn't count, even a minimal amount of optimization should make it even less count ^_^
  • Anonymous
    March 13, 2006
    The comment has been removed
  • Anonymous
    March 13, 2006
    The comment has been removed
  • Anonymous
    March 13, 2006
    The comment has been removed
  • Anonymous
    March 13, 2006
    If you happen to know the exact type of the container, it's always useful to let JIT know it too.  It can produce more optimized code.  For comparison, I passed List<> into the following function Sum2, and it appeared to be much faster.  No function calls are made inside Sum2().  JIT inlined it.

    static public int Sum2(List<ushort> indices)
    {
    int result = 0;
    for (int i = 0; i < indices.Count; i++)
    result += indices[i];
    return result;
    }

    Timing:
    Sum(IList<ushort>)
    time: 404
    Sum2(List<ushort>)
    time: 57

    You can call it "abstraction penalty."  Sum() function is generic, it operates on any container implementing IList<>, whereas Sum2() is specific to one concrete type List<>.

    Interesting that the abstraction penalty cannot be avoided with a generic function:

    static public int Sum3<T>(T indices) where T : IList<ushort>

    The code generated here is no better than the one for the original Sum() function.
  • Anonymous
    March 16, 2006
    Last week I posted Performance Quiz #9&amp;nbsp;for discussion. Well the comments have been just awesome...
  • Anonymous
    January 23, 2007
    Last week I posted Performance Quiz #9 for discussion. Well the comments have been just awesome and many