Jaa


The GroupAdjacent Extension Method

[Blog Map]  This blog is inactive.  New blog: EricWhite.com/blog

This is a bit of a geeky post for the LINQ and LINQ to XML folks.

This post introduces a GroupAdjacent generic extension method that groups elements in a collection with adjacent elements based on a common key.  For example, grouping the following array creates six groups (not 3, as with GroupBy): 

int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 };

 

GroupAdjacent groups them like this:                   In contrast, the standard GroupBy extension
                                                       method groups them like this:             

Group 1                                                Group 1
1                                                      1     
      
Group 0                                                Group 0
0                                                      0     
0                                                      0     
                                                       0     
Group 2                                                0     
2                                                      0     
                                                       0     
Group 0                                                0     
0                                                      0     
0      
0                                                      Group 2
                                                       2     
Group 2                                                2     
2

Group 0
0
0
0

Let's say that the above array represents paragraphs in a document.  If the value is > 0, the paragraphs are a heading of the specified level.  The zeros represent paragraphs styled as normal.  We want to do some processing on the collection of paragraphs in such a way that we process all adjacent paragraphs of the same style in a single query.  We could group them, and execute a query on each group, but as you saw above, the standard GroupBy extension method will group all of the 0's together and all of the 2's together.

As with the GroupBy extension method, we use a key selector function to get the key for each element in the source collection.  GroupAdjacent should have the same set of overloads as GroupBy.  When I get time, I'll code them and post them.

Using the identity function for the key selector, the following code shows the use of GroupAdjacent:

int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 };
var groups = ia.GroupAdjacent(i => i);
foreach (var g in groups)
{
Console.WriteLine("Group {0}", g.Key);
foreach (var i in g)
Console.WriteLine(i);
Console.WriteLine();
}

This produces the output that you saw above.

Implementation Notes

This method is lazy.  Until the results are iterated, the source is not iterated.  However, for each group, GroupAdjacent creates a List<T>, and populates it with the group.  As it turns out, this is the correct way to do it.  The entire source collection must be iterated, and if we were to not iterate through one of the groups, then iteration would not work.  Also, we MUST create the List<T> so that each group can be iterated multiple times.  Therefore, I decided to populate the List<T> for each group.  Iteration through the group then iterates through the list.

The following code is valid and should give the expected results:

int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 };
var groups = ia.GroupAdjacent(i => i);
foreach (var g in groups)
{
Console.WriteLine("Group {0}", g.Key);
foreach (var i in g)
Console.WriteLine(i);
// it is perfectly valid to iterate through the group more than once.
foreach (var i in g)
Console.WriteLine(i);
Console.WriteLine();
}

If any C# experts out there know a better way to implement this, I'd be happy to hear it.

Implementation

This approach was suggested by one of the LINQ architects a year or two ago.  I finally got around to it.  Thanks for the idea.

Following is the implementation of GroupAdjacent:

using System;
using System.Collections.Generic;
using System.Linq;
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
public TKey Key { get; set; }
private List<TSource> GroupList { get; set; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
}
System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
{
foreach (var s in GroupList)
yield return s;
}
public GroupOfAdjacent(List<TSource> source, TKey key)
{
GroupList = source;
Key = key;
}
}
public static class LocalExtensions
{
public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
TKey last = default(TKey);
bool haveLast = false;
List<TSource> list = new List<TSource>();
foreach (TSource s in source)
{
TKey k = keySelector(s);
if (haveLast)
{
if (!k.Equals(last))
{
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
list = new List<TSource>();
list.Add(s);
last = k;
}
else
{
list.Add(s);
last = k;
}
}
else
{
list.Add(s);
last = k;
haveLast = true;
}
}
if (haveLast)
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
}
}
class Program
{
static void Main(string[] args)
{
int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 };
var groups = ia.GroupAdjacent(i => i);
foreach (var g in groups)
{
Console.WriteLine("Group {0}", g.Key);
foreach (var i in g)
Console.WriteLine(i);
Console.WriteLine();
}
}
}

Comments

  • Anonymous
    April 22, 2008
    Hi Eric, Interesting idea, I can think of several uses already. I had a play with the GroupAdjacent method as shown below. It does require a boolean assignment each time through the loop, but I think it reads easier. public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(        this IEnumerable<TSource> source,        Func<TSource, TKey> keySelector)    {        TKey lastKey = default(TKey);        bool isFirst = true;        List<TSource> list = new List<TSource>();        foreach (TSource s in source)        {            TKey currentKey = keySelector(s);            if (!isFirst && !currentKey.Equals(lastKey))            {                yield return new GroupOfAdjacent<TSource, TKey>(list, currentKey);                list = new List<TSource>();            }            list.Add(s);            lastKey = currentKey;            isFirst = false;        }        if (!isFirst)            yield return new GroupOfAdjacent<TSource, TKey>(list, last); }

  • Anonymous
    January 04, 2012
    This came up on stackoverflow (stackoverflow.com/.../23354) - just as a small note, it will struggle if the keySelector returns null. There is an easy fix, by hoisting EqualityComparer<TKey>.Default, which will automatically handle IEquatable<T>, Nullable<T>, and null references.

  • Anonymous
    April 01, 2012
    Thanks for this, I copied your method gist.github.com/2277055

  • Anonymous
    April 29, 2014
    Thank you so much it was very helpful for me