Cost of array bounds checking -- one experiment
Sometimes people ask me what the overhead of array range checking is. A while ago I did a quickish experiment to measure the cost of the bounds checking introduced by the JIT in a typical application. This is just one part of the managed safety net but it's a fairly easy one to get a handle on compared to some of the other overheads. Notably, it's a lot trickier to get a handle on the marginal cost of explicit range-checking done in helper methods in base classes.
The test case was a Windows Presentation Framework application that basically tried to render the same thing that you might see on the MSN home page. The time being measured is from process startup (t=0) until Main exits (which is almost but not quite everything). This is a wall clock time on a normally configured machine so there is noise in the system which I attempt to remove as shown below.
To faciliate this experiment I used a then recent build of WPF, which was built using PD3 of the CLR (so this is 2004 vintage). I modified mscorjit.dll so that it does not introduce array bounds checks for arrays or strings. This modified jitter was built using a slightly different process than the official build (I did a standard dev build) so to control that difference I also rebuilt the standard jitter, unmodified, in the same way. So in this experiment I'm comparing two jitters both build by me in the same way. To reduce the impact of normal machine noise in the runs I removed the 10 slowest run --but I show the full data as well.
Times were gathered with everything was hot in the disk cache.
Time Results:
Test |
N |
Average |
Std. Dev. |
Standard All | 36 | 1.661s | 0.275s |
Standard Slowest 10 Removed | 26 | 1.506s | 0.054s |
Norange All | 36 | 1.609s | 0.574s |
Norange Slowest 10 Removed | 26 | 1.497s | 0.017s |
Delta All | 36 | 0.052s | |
Delta Slowest 10 Removed | 26 | 0.009s | |
% Delta All | 36 | -3.1% | |
% Delta Slowest 10 Removed | 26 | -0.6% |
But notice that in both cases the delta is within one standard deviation of the mean. Based on this experiment we could not reject the null hypothesis that range checking is making no difference in execution time (i.e. noise is making a bigger difference).
Space Results
Working set numbers were very unstable due to the interactive nature of the test case, the variation from run to run is even more pronounced in terms of working set than it is in terms of time and I strongly feel that this variation totally hides any meaningful analysis of workingset. However it's easy enough to see the absolute difference in terms of size of the assemblies on disk which I think accurately accounts for the space tax.
Assembly | Checked | Unchecked | Delta |
mscorlib.ni.dll | 9,777,152 | 9,703,424 | -0.75% |
System.Drawing.ni.dll | 1,585,152 | 1,581,056 | -0.26% |
System.Security.ni.dll | 901,120 | 897,024 | -0.45% |
System.ni.dll | 6,713,344 | 6,668,288 | -0.67% |
System.Windows.Forms.ni.dll | 12,132,352 | 12,099,584 | -0.27% |
Others (similar, not shown) | ... | ... | ... |
Total | 73,719,808 | 73,359,360 | -0.49% |
So basically, at least in this experiment, the cost was within the noise level of the test. Of course your mileage may vary but this seems to agree with anecdotal experience in larger applications.
Comments
- Anonymous
July 12, 2006
The comment has been removed - Anonymous
July 12, 2006
The comment has been removed - Anonymous
July 12, 2006
The comment has been removed - Anonymous
July 22, 2006
The JIT generally does a pretty good job at removing array bounds checks in simple cases anyways (if its passed in or a local).
I showed the particular optimization here http://codebetter.com/blogs/gregyoung/archive/2006/06/11/146343.aspx
So I am not sure how "good" these tests actually are since the JIT removes many anyways.
My concern is that the JIT does not remove all that it should. I discuss it in detail here http://codebetter.com/blogs/gregyoung/archive/2006/07/08/147230.aspx.
A bigger concern is that it is impossible to write an efficient bit of safe code that does something similar to the following (note this is a quick example typed up for clarity, not necesarily compiling or well designed code)
public static int [] CopyWithSquare(int [] in) {
int [] ret = new int[in.Length];
int tmp;
for(int i=0;i<in.Length;i++) {
ret[i] = in[i] * in[i];
}
}
One of the two arrays will always have a bounds check because the optimizer will only remove the bounds check for the array that is used in the condition.
It would seem to me that in this case the JIT should probably atleast hoist the other array bounds check (especially when it is held in a register as you know the operation is safe)
Of course I can get this to not happen using unsafe code but ...
1) Most languages don't support unsafe code
2) Many situations are beginning to require verifiable code which would keep people from calling my routine.
I agree that in insert application your overall gain from the optimization will not be huge (there are many other items which are more likely at fault). When you are however dealing with optimizing code that you have isolated as a problem point through profiling it seems to be common that you find arrays with bounds checks :) - Anonymous
July 22, 2006
to happen using unsafe code. - Anonymous
August 01, 2006
try skipping the array bound checking using Exceptions. In java, (performance tuning manual) they suggested make the loop infinite and use a "try...catch" exception.
If this works for csharp...let me know.
Also, I am using the generic exception eg."}catch(Exception e){} ". Will narrowing down to the exact exception prove faster exception handling. eg "(System.IndexOutOfRangeException e){}"
Code:
public static int [] CopyWithSquare(int [] in) {
int [] ret = new int[in.Length];
int tmp;
try {
for(int i=0;;i++) {
ret[i] = in[i] * in[i];
}
}catch(Exception e){}
}
Please let me know what you make of this.
jeremygwa At hotmail.com - Anonymous
August 03, 2006
Didn't expect it to work and it doesn't. This is actually slower than the original loop because the original loop removed one of the bounds checks where this has two.
public static int[] CopyWithSquare(int[] val) {
int[] ret = new int[val.Length];
00000000 push ebp
00000001 mov ebp,esp
00000003 push edi
00000004 push esi
00000005 push ebx
00000006 sub esp,18h
00000009 xor eax,eax
0000000b mov dword ptr [ebp-18h],eax
0000000e mov edi,ecx
00000010 mov edx,dword ptr [edi+4]
00000013 mov ecx,7915982Ah
00000018 call FFCB0FB0
0000001d mov dword ptr [ebp-24h],eax
int tmp;
try {
for (int i = 0; ; i++) {
00000020 xor esi,esi
00000022 mov ebx,dword ptr [edi+4]
00000025 mov eax,dword ptr [ebp-24h]
00000028 mov ecx,dword ptr [eax+4]
ret[i] = val[i] * val[i];
0000002b cmp esi,ebx
0000002d jae 0000004A
0000002f mov edx,dword ptr [edi+esi4+8]
00000033 mov eax,edx
00000035 imul eax,edx
00000038 mov edx,eax
0000003a mov eax,dword ptr [ebp-24h]
0000003d cmp esi,ecx
0000003f jae 0000004A
00000041 mov dword ptr [eax+esi4+8],edx
for (int i = 0; ; i++) {
00000045 add esi,1
00000048 jmp 0000002B
0000004a call 79443501
}
}
catch (Exception e) { }
0000004f call 79345119
return ret;
00000054 mov eax,dword ptr [ebp-24h]
00000057 lea esp,[ebp-0Ch]
0000005a pop ebx
0000005b pop esi
0000005c pop edi
0000005d pop ebp
0000005e ret - Anonymous
August 09, 2006
Rico,
Can you comment a little on array performance on .net from a number crunching point of view? I would like to be able to write everything in C#, but worry that it'll be too much of a performance hit (for various reasons I'd like the array code to be managed, not external native code I'm using from a C# app).
I understand the differences in support for 1D vs Jagged vs MultiD arrays. My concerns include lack of of use of SIMD instructions by the JIT, plus lack of blocking and loop transformations.
Any experience on perf of fortran style .net code? - Anonymous
August 29, 2006
Questa è una domanda che mi sono posto anch'io parecchie volte: il controllo&nbsp;dei limiti inferiore... - Anonymous
October 03, 2006
Questa è una domanda che mi sono posto anch'io parecchie volte: il controllo dei limiti inferiore...