foreach and performance rules
I was looking at Brad’s blog this morning and I was astounded to find that some people had chosen not to use foreach “because Rico said so”, though they probably didn’t use those exact words. I found that very troubling for several reasons, so I thought I’d offer some comments here.
Let’s start with my two rules of performance:
Rule #1: Measure
Rule #2: Do Your Homework
And I’m pretty adamant that after that there are no more rules. It’s just impossible to predict what is or is not a good idea from a performance perspective without actually measuring, because performance work is plagued with secondary effects and, importantly, there are always tradeoffs besides raw space/speed.
Lately I’ve been trying to explain this by taking the approach that engineering decisions must be made quantitatively for them to be truly engineering decisions. I think sometimes people are tempted to use an expert’s intuition (e.g. mine) in place of actual measurements, but that’s cheating… I don’t even stand behind my own intuition so certainly nobody else should :)
For those reasons alone “Rico said so” is just a lousy reason to do anything.
OK, so now we know what isn’t a good way to make a decision like “Do I use foreach?” but that isn’t too helpful. How should such a decision be made?
Well, actually, I think Performance Quiz #3 (Warnings and Good Practice section) speaks pretty clearly on this point. Start by understanding the characteristics your solution needs to have, then make a plan that is substantially likely to have those characteristics, and verify as you go along. In this case the “RAD” plan is to use foreach and the “classic” plan is to use a more complex “for” construct of some kind. If iterating over some key data structure is an important part of your process, then you’ll want to measure the kind of throughput you can expect from each of the solutions (a quick prototype may be enough for that). Use the measurements to guide your plans so that you add complexity to your solution only when it is giving you excellent value.
If you did this you would find that there is no penalty at all for using foreach on arrays, for instance, and you might find the penalty for using foreach on ArrayList to be so small in your case that the decreased chance of bugs makes foreach the way to go. On the other hand, you might find that you are creating far too many enumerators because of your usage pattern, and that something more complicated, but cheaper, is called for. In any case you'll be making an objective decision based on real data.
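As a rough illustration of the kind of quick prototype described above, here is a minimal sketch. It is written in Python rather than the .NET languages this post is about, so its numbers say nothing about foreach on arrays or ArrayList; it only shows the shape of the experiment. The function names (sum_foreach, sum_indexed, measure) and the data size are made up for the example.

```python
import timeit


def sum_foreach(items):
    """The simple plan: let the language iterate for us."""
    total = 0
    for value in items:
        total += value
    return total


def sum_indexed(items):
    """The more complex plan: drive the loop with an explicit index."""
    total = 0
    for i in range(len(items)):
        total += items[i]
    return total


def measure(func, items, repeat=3, number=10):
    """Return the best observed time, in seconds, for one call of func(items).

    Taking the minimum over several repeats reduces noise from whatever
    else the machine happens to be doing.
    """
    timings = timeit.repeat(lambda: func(items), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    # Hypothetical workload: substitute the data structure and size that
    # actually matter in your own scenario.
    data = list(range(100_000))
    for func in (sum_foreach, sum_indexed):
        seconds = measure(func, data)
        print(f"{func.__name__}: {seconds * 1e3:.3f} ms per pass")
```

The output will differ from machine to machine and from workload to workload, which is the point: the decision comes from the numbers you collect for your own scenario, not from a rule.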
The thing you must remember is that following a variety of anecdotal performance rules like “don’t use foreach” (which I don’t even believe in, much less advise) is not a substitute for good performance engineering. You’re much more likely to make a bunch of premature optimizations on that path – when what you needed was performance planning.
Comments
- Anonymous
July 12, 2004
You forgot
Rule 0: Define "good performance".
Gathering data is pointless if your program is already performant enough! And even if it isn't performant enough, knowing what you're optimizing for is a necessary precursor to doing the measurement.
I know that sounds so obvious that it doesn't even need to be stated, but I've gotten PLENTY of email from people asking me how to make stuff faster who cannot tell me how they'll know when it's fast enough!
- Anonymous
July 12, 2004
When I talk about Rule #1 I like to point out to people that once you've decided that you want to measure, you're immediately faced with the great question of what to measure.
That is a super healthy question because it forces you down the path of saying "What's important to my customers? What scenarios? What metrics?" Immediately you need to think about goals and you might quickly realize your goal is very easily met and only the most trivial measurement (such as counting one mississippi, two mississippi, etc.) is needed to verify that you're in great shape.
Whatever you discover, you'll be in a much better place if you understand what it is that your customer needs from you.
I try to emphasize that the discipline/approach is the same whether or not you're worried about "mississippis" or "milliseconds" -- it's just a question of the amount of effort that should go into it.
As you correctly point out, sometimes very little effort is exactly the right answer.