LINQ Farm: More on the LINQ Aggregate Operators

The LINQ aggregate operators allow you to perform simple math operations over the elements in a sequence. This post is designed to walk you through those operators, and give you an overview of how to use them. Table 1 shows a list of the 7 aggregate operators.

Note: All the samples shown in this post are found in the AggregateOperators program found on Code Gallery LINQ Farm.

Table 1: LINQ includes 7 aggregate operators designed to help you perform simple math operations. The definitions shown in this table are over-simplifications that give you a general sense of what you can do with a particular operator.

Operator    Description
Count       Count the elements in a sequence.
LongCount   Count the elements in a very, very long sequence.
Sum         Add the elements in a sequence.
Min         Find the smallest element in a sequence.
Max         Find the largest element in a sequence.
Average     Find the average value in a sequence.
Aggregate   Perform various binary operations on the elements in a sequence.

Except for the Aggregate operator itself, all of these operators have a simple, obvious default use. Several of these operators do, however, have overloads that need a few sentences of explanation. For each operator, I will show you one simple example of the default behavior, and then dive a bit deeper with a second example that shows how to use at least one of the overloads.

The Count and LongCount Operators

The Count and LongCount operators return the number of elements in a sequence. The Count operator can find this number quickly by simply asking objects such as List<T> that support the ICollection<T> interface for the count. If that service is not available, then LINQ iterates over the items in the sequence to get the count. The LongCount operator provides the same basic functionality, but returns the count as an Int64 (a long) rather than an Int32. A simple example of using the Count operator is shown in Listing 1.

Listing 1: A simple example of using the Count operator.

public void ShowCount()
{
    // Range(5, 12) produces the 12 numbers 5 through 16.
    var list = Enumerable.Range(5, 12);
    Console.WriteLine(list.Count());
}

The overloads for Count and LongCount allow you to pass in a predicate, written here as a lambda expression, that decides which elements of the sequence are counted. For instance, you can write code that returns the number of even numbers in a collection:

 var list = Enumerable.Range(1, 25);
 Console.WriteLine("Total Count: {0}, Count the even numbers: {1}",
    list.Count(), list.Count(n => n % 2 == 0));         

Our list consists of the numbers 1 through 25. We call Count once with no arguments, using the first version of the operator, and get back the number 25.

The second overload of the Count operator takes a simple predicate. The declaration looks like this:

 public static int Count<TSource>(this IEnumerable<TSource> source,
   Func<TSource, bool> predicate);

The predicate takes an element of the sequence, in our case an integer, and returns a bool specifying whether or not that value passes a test. Here we simply ask whether or not the number is even. The predicate returns true for the values 2, 4, 6 and so on up to 24, for a total of 12 elements.
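As a minimal sketch of the predicate overload described above, the following program counts the even numbers in three equivalent ways. The class name CountDemo and the helper CountEvens are illustrative names of my own; they are not part of the AggregateOperators sample program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Hypothetical helper: counts the even values in a sequence.
    public static int CountEvens(IEnumerable<int> source)
    {
        return source.Count(n => n % 2 == 0);
    }

    public static void Main()
    {
        var list = Enumerable.Range(1, 25);

        // Count(predicate) counts only the elements for which the predicate is true.
        Console.WriteLine(CountEvens(list));                      // 12

        // Filtering with Where and then counting gives the same number.
        Console.WriteLine(list.Where(n => n % 2 == 0).Count());   // 12

        // LongCount offers the same overload but returns a long (Int64).
        long evens = list.LongCount(n => n % 2 == 0);
        Console.WriteLine(evens);                                 // 12
    }
}
```

The predicate form is simply a shorter way of writing the Where-then-Count combination.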

The Min and Max Operators

The Min and Max operators are equally simple. Listings 2 and 4 show how they work. The first shows the behavior of the first overload of Min and Max; the second shows how to use one of the other overloads to pose slightly more complex questions.

Listing 2: A simple example of using the Min and Max operators to determine the highest and lowest values in a sequence.

public void ShowMinMax()
{
    var list = Enumerable.Range(6, 10);
    Console.WriteLine("Min: {0}, Max: {1}", list.Min(), list.Max());
}

Our list consists of the numbers 6 through 15, so the code writes out the values 6 and 15 to the console. The C# source that implements Min and Max uses the IComparable<T> or IComparable interfaces to perform the comparisons. If you pass in a null sequence you will get an ArgumentNullException.
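The following sketch exercises the error behavior just mentioned. It also shows one case the text does not cover: for a sequence of a non-nullable value type such as int, calling Min on an empty sequence throws an InvalidOperationException. The class MinMaxDemo and the helper DescribeMin are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinMaxDemo
{
    // Hypothetical helper: reports what Min does with the given source.
    public static string DescribeMin(IEnumerable<int> source)
    {
        try
        {
            return "Min: " + source.Min();
        }
        catch (ArgumentNullException)
        {
            // Thrown when the source sequence itself is null.
            return "ArgumentNullException";
        }
        catch (InvalidOperationException)
        {
            // Thrown when a sequence of ints contains no elements.
            return "InvalidOperationException";
        }
    }

    public static void Main()
    {
        Console.WriteLine(DescribeMin(Enumerable.Range(6, 10)));  // Min: 6
        Console.WriteLine(DescribeMin(null));                     // ArgumentNullException
        Console.WriteLine(DescribeMin(new List<int>()));          // InvalidOperationException
    }
}
```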

For the more complex examples, I'm going to need a few rows of simple data, which I provide in Listing 3.

Listing 3: The following Item class and the GetItems method are used by most of the examples in this section of the text.

 class Item
{
    public int Width { get; set; }
    public int Length { get; set; }

    public override string ToString()
    {
        return string.Format("Width: {0}, Length: {1}", Width, Length);
    }
}

private List<Item> GetItems()
{
    return new List<Item> 
    { 
       new Item { Length = 0, Width = 5 },
       new Item { Length = 1, Width = 6 },
       new Item { Length = 2, Width = 7 },
       new Item { Length = 3, Width = 8 },
       new Item { Length = 4, Width = 9 }
    };
}

There is no simple way to know the maximum or minimum value in a list of Items. To find the largest Item, do you choose the element with the greatest Length, the greatest Width, or some other value? To solve this problem the C# team provided us with overloads of the Min and Max operators that take a delegate that we can use to select the proper value for the comparison:

public static int Max<TSource>(this IEnumerable<TSource> source, Func<TSource, int> selector);

Like nearly all the LINQ to Objects operators, Max is implemented as an extension method for the interface IEnumerable<T>. This overload takes an extremely simple lambda that is passed an element from the enumeration and returns an integer. To see how this works, take a look at Listing 4.

Listing 4: A somewhat more complex use of Min and Max, demonstrating how to get minimum and maximum values for complex types with multiple fields.

List<Item> items = GetItems();
ShowList(items);
Console.WriteLine("MinLength: {0}, MaxLength: {1}",
    items.Min(l => l.Length), items.Max(l => l.Length));

As you can see, Min and Max both take a very simple delegate, which is implemented here as a lambda.

The lambda that is passed to Min looks like this: l => l.Length. This lambda is so simple that it can be a bit confusing to people who are new to LINQ. Let's take one moment to be sure we understand what is happening.

We know that this LINQ operator must iterate over the sequence passed in to it, and we can assume that it passes each item it finds to the selector delegate. It then tests the result returned from selector to see if it is the largest value returned so far. Without peeking at the real source code, it seems that Max might do something like the code in Listing 5.

Listing 5: This method, which I created, mimics what occurs in the real Max method that ships with the C# 3.0 release.

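using System;
using System.Collections.Generic;

public static class MyExtensions
{
    // Returns the largest value that selector produces for any element of source.
    // Unlike the real operator, this sketch returns int.MinValue for an empty sequence.
    public static int Max<TSource>(this IEnumerable<TSource> source,
      Func<TSource, int> selector)
    {
        int largest = int.MinValue;
        foreach (var item in source)
        {
            // Ask the caller's delegate which value to compare for this item.
            int nextItem = selector(item);
            if (nextItem > largest)
            {
                largest = nextItem;
            }
        }
        return largest;
    }
}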

Assuming that we are working with a collection of Items, then selector, were it implemented as a standard method, would have to look something like this:

 public int selector(Item item)
{
    return item.Length;
}

This method is semantically identical to the delegate we used in Listing 4: l => l.Length. It is very simple code that tells us which part of the Item class we are going to use to determine our max value.
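To make the equivalence concrete, here is a small sketch that calls Max once with the lambda and once with the named method wrapped in a Func<Item, int> delegate. The Item data mirrors Listing 3; the class SelectorDemo and the helpers MaxWithLambda and MaxWithMethod are names I made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Width { get; set; }
    public int Length { get; set; }
}

public static class SelectorDemo
{
    // The selector written as an ordinary named method.
    public static int Selector(Item item)
    {
        return item.Length;
    }

    // Same data as Listing 3.
    public static List<Item> GetItems()
    {
        return new List<Item>
        {
            new Item { Length = 0, Width = 5 },
            new Item { Length = 1, Width = 6 },
            new Item { Length = 2, Width = 7 },
            new Item { Length = 3, Width = 8 },
            new Item { Length = 4, Width = 9 }
        };
    }

    // Hypothetical helper: Max computed with the lambda form.
    public static int MaxWithLambda(List<Item> items)
    {
        return items.Max(l => l.Length);
    }

    // Hypothetical helper: Max computed with the named method, wrapped in a delegate.
    public static int MaxWithMethod(List<Item> items)
    {
        Func<Item, int> selector = Selector;
        return items.Max(selector);
    }

    public static void Main()
    {
        List<Item> items = GetItems();
        Console.WriteLine(MaxWithLambda(items));  // 4
        Console.WriteLine(MaxWithMethod(items));  // 4
    }
}
```

Assigning the method to a Func<Item, int> variable first keeps the overload choice explicit, since Max has many selector overloads.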

It's all so simple that one feels a little like a character in Edgar Allan Poe's "The Purloined Letter": the answer was hidden in plain sight. Once again we see that the biggest impediment to learning LINQ is the fear that it might be complicated. In practice, it is almost startlingly simple.

The Average Operator

Once we understand the pattern shown in our examination of the Min and Max operators, we find that it can be easily applied to most of the other aggregate operators. Let’s look at the Average operator, which returns the average value from an enumeration.

For instance, one can find the average for a range of numbers like this:

 var list = Enumerable.Range(0, 5);
  Console.WriteLine("Average: {0}", list.Average());

When run, this code tells us that the average of the numbers 0, 1, 2, 3, 4 is the value 2.

When working with a collection of Items, we face the same problem we had with Min and Max: How does one discover the average value for a list of Items that define two properties called Length and Width? The answer, of course, is that we proceed just as we did with the Min and Max operators:

List<Item> items = GetItems();
double averageLength = items.Average(l => l.Length);
double averageWidth = items.Average(w => w.Width);
double averageValue = items.Average(v => v.Length + v.Width);
Console.WriteLine("AverageLength: {0}, AverageWidth: {1} AverageValue: {2}",
    averageLength, averageWidth, averageValue);

Again, we pass in very simple lambdas such as v => v.Length + v.Width or w => w.Width. Somewhere in the background, code similar to the custom implementation of the Max operator in Listing 5 must be running. That code iterates over the list, passing each item to our lambda, which defines the value we want the Average operator to use in its calculations. The output looks like this:

AverageLength: 2, AverageWidth: 7 AverageValue: 9
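The averages above happen to be whole numbers, but Average returns a double even when the elements are integers. The sketch below compares it with a hand-rolled average that uses integer division. The class AverageDemo and the helpers LinqAverage and TruncatedAverage are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AverageDemo
{
    // Hypothetical helper: the LINQ Average operator, which returns a double for int input.
    public static double LinqAverage(IEnumerable<int> source)
    {
        return source.Average();
    }

    // Hypothetical helper: a hand-rolled average using integer division, which truncates.
    public static int TruncatedAverage(IEnumerable<int> source)
    {
        return source.Sum() / source.Count();
    }

    public static void Main()
    {
        var list = Enumerable.Range(0, 4);         // 0, 1, 2, 3

        double average = LinqAverage(list);        // 6 / 4.0 = 1.5
        int truncated = TruncatedAverage(list);    // 6 / 4 = 1

        Console.WriteLine(average == 1.5);         // True
        Console.WriteLine(truncated);              // 1
    }
}
```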

The Sum Operator

The Sum operator tallies the values in an enumeration. Consider the following simple example:

 var list = Enumerable.Range(5, 3);
Console.WriteLine("List sum = {0}", list.Sum());

Our list consists of the numbers 5, 6 and 7. The Sum operator adds them together, producing the value 18.

When working with a list of Items, the Sum operator faces the same problem we saw with the Min, Max and Average operators. It should come as no surprise that the solution is nearly identical:

 var items = GetItems(); 
Console.WriteLine("Sum the lengths of the items: {0}", items.Sum(l => l.Length));

Here is the same pattern you saw with the Average, Min and Max operators: we pass in a simple lambda to help the Sum method know which part of an Item it should use as the operand when performing its simple addition. The result printed to the console is the value 10. If only the rest of our lives were quite this simple!

The Aggregate Operator

The Aggregate operator follows in the footsteps of the Sum operator, but it provides us with a few more options. Rather than taking a simple delegate like the other operators in this series, it asks for one similar to the lambda we worked with in a previous post:

 public static T Aggregate<T>(this IEnumerable<T> source, Func<T, T, T> func);

We know what to do with delegates that look like this. We could, for instance, create one that adds up a range of numbers:

 var list = Enumerable.Range(5, 3);
Console.WriteLine("Aggregation: {0}", list.Aggregate((a, b) => (a + b)));

The Aggregate operator gets passed the numbers 5, 6 and 7. The first time the lambda is called it gets passed 5 and 6, and adds them together to produce 11. The next time it is called it is passed the accumulated result of the previous calculation and the next number in the series: (11 + 7), which yields 18. This is the same result we saw for the Sum operator in the previous section.

This overload of the Aggregate operator is indeed very similar to the Sum operator, though it is more flexible, in that you can easily perform multiplication, division, subtraction and other operations instead of simple addition. For instance, this code performs multiplication, yielding the value 210:

list.Aggregate((a, b) => (a * b))

Before pushing on, I should backtrack a little and discuss two simple points that are often brought up when people talk about this first version of the Aggregate operator. If it is passed a list with one item, it returns that item. If it is passed a list with 0 items, it throws an InvalidOperationException.
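The two points above can be checked with a short sketch. The class AggregateEdgeCases and the helper Describe are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateEdgeCases
{
    // Hypothetical helper: adds the elements with the unseeded Aggregate
    // overload and reports either the result or the exception type.
    public static string Describe(IEnumerable<int> source)
    {
        try
        {
            return "Result: " + source.Aggregate((a, b) => a + b);
        }
        catch (InvalidOperationException)
        {
            return "InvalidOperationException";
        }
    }

    public static void Main()
    {
        // Three items: the lambda runs twice (5 + 6, then 11 + 7).
        Console.WriteLine(Describe(new[] { 5, 6, 7 }));   // Result: 18

        // One item: the lambda is never called and the item is returned as is.
        Console.WriteLine(Describe(new[] { 5 }));         // Result: 5

        // No items: there is nothing to return, so the operator throws.
        Console.WriteLine(Describe(new int[0]));          // InvalidOperationException
    }
}
```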

A second overload of the Aggregate operator allows you to seed the process with an accumulator:

 public static TAccumulate Aggregate<TSource, TAccumulate>(
    this IEnumerable<TSource> source, TAccumulate seed, 
    Func<TAccumulate, TSource, TAccumulate> func);

This is essentially the same operator as shown in the previous example, but now you can decide the starting point for the value that will be accumulated:

 Console.WriteLine("Aggregation: {0}", list.Aggregate(0, (a, b) => (a + b)));

If we pass in a list with one item in it, say the number five, then the first time the lambda is called it would be passed the seed and the sole item in the list:

(0 + 5)

The result, of course, is the number 5.

Suppose we pass in a seed of 0 along with the numbers 5, 6 and 7:

var list = Enumerable.Range(5, 3);

Console.WriteLine("Aggregation: {0}", list.Aggregate(0, (a, b) => (a + b)));

In this case we would step through the following sequence:

0 + 5 = 5
5 + 6 = 11
11 + 7 = 18

Again, we are doing essentially what we did with the Sum operator.

If you pass in a different seed, then you get a different result:

 Console.WriteLine("Aggregation: {0}", list.Aggregate(3, (a, b) => (a + b)));

With a seed of 3, we get:

 3 + 5 = 8
8 + 6 = 14
14 + 7 = 21

As mentioned earlier, the Aggregate operator allows us to perform not just addition, but multiplication, division or various other binary mathematical operations:

 Console.WriteLine("Aggregation: {0}", list.Aggregate(1, (a, b) => (a * b)));

In this case the series looks like this:

 1 * 5 = 5
5 * 6 = 30
30 * 7 = 210

Note that I passed in an accumulator equal to 1, so that we did not end up with the following series of operations:

 0 * 5 = 0
0 * 6 = 0
0 * 7 = 0
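The seed does more than set the starting value. In this overload its type, TAccumulate, may differ from the element type, TSource. As a rough sketch, the following code folds a list of Items (the same data as Listing 3) into an int and into a string. The class SeedDemo and the helpers TotalArea and LengthsAsText are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Width { get; set; }
    public int Length { get; set; }
}

public static class SeedDemo
{
    // Same data as Listing 3.
    public static List<Item> GetItems()
    {
        return new List<Item>
        {
            new Item { Length = 0, Width = 5 },
            new Item { Length = 1, Width = 6 },
            new Item { Length = 2, Width = 7 },
            new Item { Length = 3, Width = 8 },
            new Item { Length = 4, Width = 9 }
        };
    }

    // Hypothetical helper: the seed is an int while each element is an Item.
    public static int TotalArea(IEnumerable<Item> items)
    {
        return items.Aggregate(0, (total, item) => total + item.Length * item.Width);
    }

    // Hypothetical helper: the seed is a string, so the accumulator is a string.
    public static string LengthsAsText(IEnumerable<Item> items)
    {
        return items.Aggregate("", (text, item) => text + item.Length);
    }

    public static void Main()
    {
        var items = GetItems();

        // 0*5 + 1*6 + 2*7 + 3*8 + 4*9 = 80
        Console.WriteLine(TotalArea(items));       // 80

        // Each Length is appended to the accumulated string.
        Console.WriteLine(LengthsAsText(items));   // 01234
    }
}
```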

In what I sometimes suspect might have been an excess of good spirits, the team added one final overload to the Aggregate operator:

 public static TResult Aggregate<TSource, TAccumulate, TResult>(
                this IEnumerable<TSource> source, TAccumulate seed,
                Func<TAccumulate, TSource, TAccumulate> func,
                Func<TAccumulate, TResult> resultSelector);

This overload is nearly identical to the previous overload, but you are given one more, very simple, delegate that you can use to transform the result of your aggregation. For instance, consider this use of the Aggregate operator:

 Console.WriteLine("Aggregation: {0}", list.Aggregate(0, (a, b) => (a + b),
    (a) => (string.Format("{0:C}", a))));

Please notice that the first two-thirds of this call mirror what we did earlier, and only the third parameter is new.

Suppose we pass in a sequence with the values 5, 6 and 7. As we've already seen, the process will begin by performing the following series of operations:

 0 + 5 = 5
5 + 6 = 11
11 + 7 = 18

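Once we have our result of 18, this number is passed to the last lambda in our call. It uses the string.Format method to transform the number into a string in currency format. The C format specifier follows the current culture, so on a machine configured for US English the output is: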

$18.00
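Because currency formatting depends on the culture settings of the machine, the same call can print a different symbol or separator elsewhere. One way to pin the output, sketched below, is to pass an explicit format provider to string.Format inside the result selector. The class CurrencyDemo, the helper SumAsCurrency, and the choice to start from the invariant culture with a "$" symbol are all assumptions of this sketch, not part of the original sample.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CurrencyDemo
{
    // Hypothetical helper: sums the values, then formats the total as currency
    // with an explicit format provider instead of the current culture.
    public static string SumAsCurrency(IEnumerable<int> source)
    {
        // Assumption for this sketch: copy the invariant culture's number
        // format and use "$" as the currency symbol.
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.CurrencySymbol = "$";

        return source.Aggregate(0, (a, b) => a + b,
            total => string.Format(format, "{0:C}", total));
    }

    public static void Main()
    {
        var list = Enumerable.Range(5, 3);          // 5, 6, 7
        Console.WriteLine(SumAsCurrency(list));     // $18.00
    }
}
```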

Like nearly everything in LINQ, this seems terribly complicated at first only to end up being reasonably simple. It is these kinds of simple operations, however, which provide us with the building blocks out of which we can safely create complex programs. This is what we mean when we apply the word elegant to a technology.


Comments

  • Anonymous
    July 25, 2008
    Would it be possible to clarify the semantics of ICollection<T>.Count for collections with more than int.MaxValue elements? If we required that ICollection<T>.Count throw an InvalidOperationException or something if the collection size was too large, then the Enumerable.LongCount() extension method could rely on ICollection<T>.Count for collections with fewer than int.MaxValue elements, e.g.:

        public static long LongCount<T>(this IEnumerable<T> source)
        {
            ICollection<T> c = source as ICollection<T>;
            if (c != null)
            {
                try { return c.Count; }
                catch { /* ignore; manually count elements */ }
            }
            long n = 0;
            foreach (var e in source) ++n;
            return n;
        }

    (Note: the above misses a potential optimization for when source is a T[], in which Array.LongLength can be used directly; it is merely provided for discussion purposes.)

    Not only could this vastly increase the performance of LongCount() in the majority of cases (how often do you have collections with more than int.MaxValue elements?), but it's already possible now for collection types to have more than int.MaxValue items. LinkedList<T> should be able to store as many items as memory permits (which is quite a bit on 64-bit platforms), yet it also provides ICollection<T>.Count. What should ICollection<T>.Count do when it has more than int.MaxValue items?

    I can't speak for .NET here, but I know that Mono's LinkedList<T> source will in fact wrap around if you add more than int.MaxValue+1 items, so ICollection<T>.Count will return int.MinValue, which can't be good/sensible/sane for any client code. (Even more bizarrely, if you add uint.MaxValue+1 items, .Count will return 0: the joys of integer math.)

  • Anonymous
    July 30, 2008
    Charlie, this is a bit off topic, but I saw that you used Enumerable.Range and it reminded me of my first encounter with this and how frustrated I was trying to get the example from the LINQ 101 samples to work because they use the old Sequence.Range instead of Enumerable.Range.  Can you ask someone to fix it because it is still out of date... Thanks!  

  • Anonymous
    August 01, 2008
    Nice article. Great to see aggregates explained so clearly and with good examples.

  • Anonymous
    October 05, 2008
    I demonstrate the usefulness of the nifty "fold" operator in Scala and ruminate on functional programming support in blue collar languages.

  • Anonymous
    October 29, 2009
    Given a list of items, you showed how to get a min value with LE as follows: items.Min(l => l.Length). This is good so far. But how do we use LINQ and LE to get THE ITEM with the min length, and not the value of the min length as you showed? Basically, how do you fill in the question marks here? Item lItemWithMinLength = items.??? Thanks! I'd appreciate your answer on this one.

  • Anonymous
    November 11, 2009
    Try this: items.Aggregate((a,b) => a.Length < b.Length? a : b);