Benchmarking, C++, and C# Micro-optimizations
Two posts (1 2) on C# loop optimization got me thinking recently.
Thinking about what I did when I first joined Microsoft.
Way back in the spring of 1995 or so (yes, we did have computers back then, but the Internet of the time really *was* just a series of tubes), I was on the C++ compiler test team, and had just picked up the responsibility for running benchmark tests on various C++ compilers. I would run compilation speed and execution speed tests in controlled environments, so that we could always know where we were.
We used a series of “standard” benchmarks – such as Spec – and a few of our own.
Because execution speed was one of the few ways (other than boxes with lots of checkmarks) that you could differentiate your compiler from the other guy’s, all the compiler companies invested resources at being faster at the benchmarks.
The starting point was to look at the benchmark source, the resultant IL, and the final machine code, and see if you could see any opportunity for improvement. Were you missing any optimization opportunities?
Sometimes, that wasn’t enough, so some compiler writers (*not* the ones I worked with) sometimes got creative.
You could, for example, identify the presence of a specific expression tree that just “happened to show up” in the hot part of of a benchmark, and bypass your usual code generation with a bit of hand-tuned assembly that did things a lot faster.
Or, with a little more work, you could identify the entire benchmark, and substitute another bit of hand-tuned assembly.
Or, perhaps that hand-tuned assembly doesn’t really do *all* the work it needed to, but took a few shortcuts but still managed to return the correct answer.
For some interesting accounts, please text “compiler benchmark cheating” to your preferred search engine.
As part of that work, I got involved a bit in the writing and evaluation of benchmarks, and I thought I’d share a few rules around writing and interpreting micro-benchmarks. I’ll speak a bit about the two posts – which are about looping optimizations in C# – along the way. Just be sure to listen closely, as I will be speaking softly (though not in the Rooseveltian sense…)
Rule 0: Don’t
There has always been a widespread assumption that the speed of individual language constructs matter. It doesn’t.
Okay, it does, but only in limited cases, and frankly people devote more time to it than it deserves.
The more productive thing is to follow the agile guideline and write the simplest thing that works. And note that “works” is a bit of a weasely word here – if you write scientific computing software, you may have foreknowledge about what operations need to be fast and can safely choose something more complicated, but for most development that is assuredly not true.
Rule 1: Do something useful
Consider the following:
void DoLoop()
{
for (int x = 0; x < XMAX; x++)
{
for (int y = 0; y < YMAX; y++)
{
}
}
}
void TimeLoop()
{
// start timer
for (int count = 0; count < 1000; count++)
{
DoLoop();
}
// stop timer
}
if XMAX is 1000, YMAX is 1000, and the total execution time is 0.01 seconds, what is the time spent per iteration?
Answer: Unknown.
The average C++ optimizer is smarter that this. That nested loop has no effect on the result of the program, so the compiler is free to optimize it out (the .NET JIT may not have time to do this).
So, you modify the loop to be something like:
void DoLoop()
{
int sum;
for (int x = 0; x < XMAX; x++)
{
for (int y = 0; y < YMAX; y++)
{
sum += y;
}
}
}
The loop now has some work done inside of it, so the loop can’t be eliminated.
Rule 2: No, really. Do something useful
However, the numbers won’t change. The call to DoLoop() has no side effects, so the entire call can be safely eliminated.
To make sure your loop is really a loop, there needs to be a side effect. The best bet is to have a value returned from the method and write it out to the console. This has the added benefit of giving you a way of checking whether things are working correct.
Rule 3: Benchmark != Real world
There are lurking effects that invalidate your results. Your benchmark is likely tiny and places very different memory demands on the system than your real program does.
Rule 4: Profile, don’t benchmark
C# loop optimization
If you are writing code that needs the utmost in speed, there is an improvement to be had using for rather than foreach. There is also improvement to be had using arrays rather than lists, and unsafe code and pointers rather than array indexing.
Whether this is worthwhile in a specific case depends exactly on what the code is doing. I don’t see a lot of point in spending time measuring loops when you could spend time measuring the actual code.
Comments
Anonymous
December 01, 2008
The comment has been removedAnonymous
December 01, 2008
The comment has been removedAnonymous
December 02, 2008
It still amazes me that after all the emphasis on .Net that the JIT still cannot optimize as well as C++. I know, I know...the priority is on load time over ultr-optimized code. This frankly makes almost no sense to me. Here's why...software servies are the current and future direction of software. Which means that in the vast majority of cases the assembly has been loaded days ago and is repeatably, un-optimally churning through requests. I'd be perfectly happy with an assembly that took an order of magnitude longer to load. Many services only care that once they are loaded that they run as fast and efficiently as possible.Anonymous
December 03, 2008
I'm sure Eric will address this, but I happen to like the state of the JIT engine in .NET. I like that code is sometimes optimized within a running system, because those types of optimizations can't be done with a statically compiled language. I personally get his point, but perception to the user is everything. It started up fast, so it must be fast...Anonymous
December 07, 2008
Jay. Indeed a statically compiled language can't hope to achieve a level of optimization the the JIT is theoretically capable of achieving. That's the promise anyway. I dream of the day that the JIT engine will look at my hardware and produce the fastest possible code. Indeed I hope it goes furthur and performs run-time analysis and the reoders frequently used memory into that same page. That's just the start. There is an nearly endless list of optimizations that could theoretically make C# faster then C++, especially for software services. However, as I mentioned, I'm sure the resources (testing especially) needed to realize even a fraction of it's potential is currently prohibitive.Anonymous
December 07, 2008
Oh, and I'd like to comment on your last comment, "perception to the user is everything. It started up fast, so it must be fast". This is correct for client-side code. I would wager that the majority of .Net code written today is server-side. Optimizing the JIT for the minory seems counter to what Microsoft usually practices.