ForEach enumeration with index
Yesterday I did a code review that used a pattern I've seen a couple of times in the past: a foreach statement enumerates some collection, but the loop body also needs the index of each element, so a separate counter is maintained by hand. This is typically done when working with an IEnumerable<T>, where converting it to a list or array first (so that a plain for loop could be used) is considered too costly because it means enumerating the same data twice. It usually looks something like this:
int i = 0;
foreach (var item in enumerable)
{
    someArray[i] = item.SomeValue;
    someOtherArray[i] = item.OtherValue;
    i++;
}
The obvious fix would be to create a nice little extension method to deal with this:
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Calls loopBody once for each element, passing its zero-based index.
    public static void ForEachWithIndex<T>(
        this IEnumerable<T> enumerable,
        Action<T, int> loopBody)
    {
        int i = 0;
        foreach (var item in enumerable)
        {
            loopBody(item, i++);
        }
    }
}
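Here is a self-contained console program that exercises ForEachWithIndex on the kind of loop shown at the top. It is a sketch, not code from the original review. The Item class, its SomeValue and OtherValue properties, and the sample data are hypothetical stand-ins for whatever the real collection holds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the property names mirror the post's example.
public class Item
{
    public int SomeValue { get; set; }
    public string OtherValue { get; set; }
}

public static class EnumerableExtensions
{
    // Calls loopBody once for each element, passing its zero-based index.
    public static void ForEachWithIndex<T>(
        this IEnumerable<T> enumerable,
        Action<T, int> loopBody)
    {
        if (enumerable == null) throw new ArgumentNullException(nameof(enumerable));
        if (loopBody == null) throw new ArgumentNullException(nameof(loopBody));

        int i = 0;
        foreach (var item in enumerable)
        {
            loopBody(item, i++);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // A lazily evaluated sequence, so it is not a list or array.
        IEnumerable<Item> enumerable = Enumerable.Range(1, 3)
            .Select(n => new Item { SomeValue = n * 10, OtherValue = "item" + n });

        var someArray = new int[3];
        var someOtherArray = new string[3];

        // The counter now lives inside the extension method.
        enumerable.ForEachWithIndex((item, i) =>
        {
            someArray[i] = item.SomeValue;
            someOtherArray[i] = item.OtherValue;
        });

        Console.WriteLine(string.Join(", ", someArray));      // 10, 20, 30
        Console.WriteLine(string.Join(", ", someOtherArray)); // item1, item2, item3
    }
}
```

The argument checks are an addition. Without them, a null sequence fails only when the foreach starts, and a null loopBody fails only on the first call.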
Then it struck me that I should be able to do this with LINQ, and it turned out to be an interesting exercise. Note that the selector has to return a dummy value, because Select is designed to transform data rather than to run side effects. Also note the call to the Count method, which forces enumeration of the collection so that the selector delegate executes for each element. This solution makes me uncomfortable, however. I don't feel 100% certain that the whole statement won't be optimized into nothing, since the return value of Count is never used. I would only feel good about this solution if that value were actually used:
enumerable.Select(
    (item, i) =>
    {
        someArray[i] = item.SomeValue;
        someOtherArray[i] = item.OtherValue;
        return true;
    }).Count();
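If the only goal is to get the index out of LINQ, there is a side-effect-free alternative. It is a common idiom, not the solution proposed in this post. Let the indexed Select overload produce (item, index) pairs and consume them with an ordinary foreach. The sketch below does this, again using a hypothetical Item class and sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the property names mirror the post's example.
public class Item
{
    public int SomeValue { get; set; }
    public string OtherValue { get; set; }
}

public static class Program
{
    public static void Main()
    {
        IEnumerable<Item> enumerable = Enumerable.Range(1, 3)
            .Select(n => new Item { SomeValue = n * 10, OtherValue = "item" + n });

        var someArray = new int[3];
        var someOtherArray = new string[3];

        // Select's (element, index) overload pairs each element with its
        // zero-based position. The foreach drives the enumeration, so no
        // dummy return value or Count() call is needed.
        foreach (var pair in enumerable.Select((item, i) => new { Item = item, Index = i }))
        {
            someArray[pair.Index] = pair.Item.SomeValue;
            someOtherArray[pair.Index] = pair.Item.OtherValue;
        }

        Console.WriteLine(string.Join(", ", someArray));      // 10, 20, 30
        Console.WriteLine(string.Join(", ", someOtherArray)); // item1, item2, item3
    }
}
```

The source is still enumerated only once. The side effects sit in the loop body, where readers expect them, rather than inside a selector whose result is thrown away.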
However, I realized that in most cases where I've seen this kind of pattern, the actual order of execution is rarely important. That makes parallel execution an option. With parallel execution, the whole thing can be written as a LINQ expression, because PLINQ's ForAll method runs an action for each element and needs no dummy return value. Here is an implementation of that as an extension method:
// Like ForEachWithIndex, this must live in a static class
// (e.g. EnumerableExtensions) and needs using System.Linq.
public static void ParallelForEachWithIndex<T>(
    this IEnumerable<T> enumerable,
    Action<T, int> loopBody)
{
    enumerable
        .Select((item, i) => new { Item = item, Index = i })
        .AsParallel()
        .ForAll(data => loopBody(data.Item, data.Index));
}
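Below is a self-contained sketch that exercises the parallel variant; the squares example is hypothetical. ForAll may invoke the delegate on several threads at once and in any order, so the loop body must be safe to run concurrently. Writing each result to its own array slot meets that requirement, both here and in the post's original loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Pairs each element with its index sequentially, then runs loopBody
    // on the pairs in parallel. Invocation order is not guaranteed.
    public static void ParallelForEachWithIndex<T>(
        this IEnumerable<T> enumerable,
        Action<T, int> loopBody)
    {
        enumerable
            .Select((item, i) => new { Item = item, Index = i })
            .AsParallel()
            .ForAll(data => loopBody(data.Item, data.Index));
    }
}

public static class Program
{
    public static void Main()
    {
        var source = Enumerable.Range(0, 1000).Select(n => n * n);
        var squares = new int[1000];

        // Safe only because every invocation writes to a different slot.
        // Shared state such as counters or lists would need synchronization.
        source.ParallelForEachWithIndex((value, i) => squares[i] = value);

        Console.WriteLine(squares[999]); // 998001
        Console.WriteLine(squares.Select((v, i) => v == i * i).All(ok => ok)); // True
    }
}
```

The indexes are still assigned in source order, because the indexed Select runs before AsParallel. Only the calls to loopBody happen in parallel.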
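Of all these options, I think the last one is pretty neat for parallel execution, as long as the loop body is safe to run concurrently. Writing to distinct array indices, as in the example above, is safe. For sequential work, though, I would stick with the non-LINQ extension method, mainly because it is easier for new developers to understand than the LINQ versions.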
Comments
- Anonymous
December 27, 2011
Great article. I think you perfectly identified a great opportunity to use Parallel processing and the considerations behind it.