Another tidbit regarding optimizations
Since today's theme seems to be optimization (see Sam's blow-by-blow), here is a bit about index-based loops vs. for-each loops. Everyone knows that index-based loops are faster, but why and by how much?
A friend (Hi Tamar!) asked me why the following 2 snippets behaved differently while debugging:
#1:
foreach(string s in myStringArray)
{
if (s=="b") break;
}
#2:
foreach(string s in myArrayList)
{
if (s=="b") break;
}
When debugging line-by-line, Snippet #1 executes the break and then jumps to the code after the loop, while snippet #2 executes the break, then returns to the foreach line one more time before leaving the loop.
It seemed obvious that the compiler was doing some sort of optimization on the "native" array vs the arraylist object, but to know exactly what was going on I had to peek into the generated IL. Here's how it looks under the covers:
Snippet #1:
IL_0003: stloc.2
IL_0004: br.s IL_001d
IL_0006: ldloc.1
IL_0007: ldloc.2
IL_0008: ldelem.ref
IL_0009: stloc.0
IL_000a: ldloc.0
IL_000b: ldstr "b"
IL_0010: call bool [mscorlib]System.String::op_Equality(string,
string)
IL_0015: brfalse.s IL_0019
IL_0017: br.s IL_0023
IL_0019: ldloc.2
IL_001a: ldc.i4.1
IL_001b: add
IL_001c: stloc.2
IL_001d: ldloc.2
IL_001e: ldloc.1
IL_001f: ldlen
IL_0020: conv.i4
IL_0021: blt.s IL_0006
IL_0023: ret
Notice lines IL_0010 - IL_0017: compare the string, if false skip to line 19 (which continues the loop), else branch to line 23, which is outside the loop.
Now check out Snippet #2:
.try
{
IL_0007: br.s IL_0024
IL_0009: ldloc.1
IL_000a: callvirt instance object [mscorlib]System.Collections.IEnumerator::get_Current()
IL_000f: castclass [mscorlib]System.String
IL_0014: stloc.0
IL_0015: ldloc.0
IL_0016: ldstr "b"
IL_001b: call bool [mscorlib]System.String::op_Equality(string,
string)
IL_0020: brfalse.s IL_0024
IL_0022: br.s IL_002c
IL_0024: ldloc.1
IL_0025: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
IL_002a: brtrue.s IL_0009
IL_002c: leave.s IL_003f
} // end .try
finally
{
IL_002e: ldloc.1
IL_002f: isinst [mscorlib]System.IDisposable
IL_0034: stloc.2
IL_0035: ldloc.2
IL_0036: brfalse.s IL_003e
IL_0038: ldloc.2
IL_0039: callvirt instance void [mscorlib]System.IDisposable::Dispose()
IL_003e: endfinally
} // end handler
IL_003f: ...
Now the code is a bit more complex, as the loop is instantiating an IEnumerator and also has a try/finally clause. The inner loop (lines IL_001B-IL_0022) behaves similarly to Snippet #1 (except for using the IEnumerator ), but now when encountering the break it still has to go through the "finally" block which includes one last call to dispose. This is what causes the debugger to go back to the "foreach" line before exiting the loop.
So, from the above code you can see that although both snippets use the for-each syntax, for native arrays the C# compiler optimizes the loop to be just as efficient as an index based loop. However with anything more complex the compiler follows the IEnumerable interface to the letter which results in the overhead of the instantiation, try/finally and Dispose call.
And just to prove the point, here is the IL generated when looping through an ArrayList by index:
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: br.s IL_0022
IL_0004: ldarg.0
IL_0005: ldloc.0
IL_0006: callvirt instance object [mscorlib]System.Collections.ArrayList::get_Item(int32)
IL_000b: callvirt instance string [mscorlib]System.Object::ToString()
IL_0010: ldstr "b"
IL_0015: call bool [mscorlib]System.String::op_Equality(string,
string)
IL_001a: brfalse.s IL_001e
IL_001c: br.s IL_002b
IL_001e: ldloc.0
IL_001f: ldc.i4.1
IL_0020: add
IL_0021: stloc.0
IL_0022: ldloc.0
IL_0023: ldarg.0
IL_0024: callvirt instance int32 [mscorlib]System.Collections.ArrayList::get_Count()
IL_0029: blt.s IL_0004
IL_002b: ...
This should be compared to Snippet #2's IL. You can see how much simpler the index-based loop is. So, now we know why using "for-each" is slower then using indexed arrays. But (and this was news to me) when using native arrays the difference is neglible.