MapReduce explained (using LINQ) :-)

If you have spent some time understanding LINQ, you will probably like this explanation of what MapReduce does.

DryadLINQ is a programming environment that does automatic parallelization and is based on LINQ. To illustrate the power of the framework, the Microsoft Research project page has some sample code that shows how to implement MapReduce:

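// Must be declared inside a static class, because it is an extension method.
// Needs: using System; using System.Collections.Generic; using System.Linq; using System.Linq.Expressions;
public static IQueryable<TResult>
MapReduce<TSource, TMap, TKey, TResult>(this IQueryable<TSource> source,
                                        Expression<Func<TSource, IEnumerable<TMap>>> mapper,
                                        Expression<Func<TMap, TKey>> keySelector,
                                        Expression<Func<TKey, IEnumerable<TMap>, TResult>> reducer)
{
    // map (one input -> many values), group the values by key, then reduce each group
    return source.SelectMany(mapper).GroupBy(keySelector, reducer);
}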

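Isn’t that nice? The full power of functional programming :-) And suddenly MapReduce makes sense: SelectMany is the map step, GroupBy collects the mapped values by key, and the reducer turns each group into one result.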
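To see the pieces work together, here is a small word count, the classic MapReduce example. Treat it as a sketch rather than anything from the DryadLINQ page: the input strings and the class names `MapReduceExtensions` and `Program` are made up for illustration. It assumes a variant of the method whose mapper returns a sequence per input item (which is what SelectMany expects) and which passes the reducer to GroupBy. It runs on plain LINQ to Objects via `AsQueryable()`, not on DryadLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceExtensions
{
    // Assumption: the mapper yields a sequence of values for each input item.
    public static IQueryable<TResult> MapReduce<TSource, TMap, TKey, TResult>(
        this IQueryable<TSource> source,
        Expression<Func<TSource, IEnumerable<TMap>>> mapper,
        Expression<Func<TMap, TKey>> keySelector,
        Expression<Func<TKey, IEnumerable<TMap>, TResult>> reducer)
    {
        // map, group by key, reduce each group to a single result
        return source.SelectMany(mapper).GroupBy(keySelector, reducer);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical input: each string plays the role of one document.
        IQueryable<string> lines = new[] { "a b a", "b c" }.AsQueryable();

        var counts = lines.MapReduce(
            // map: split a line into words
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
            // key: the word itself
            word => word,
            // reduce: count the occurrences in each group
            (word, group) => new KeyValuePair<string, int>(word, group.Count()));

        foreach (var pair in counts.OrderBy(p => p.Key))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
        // prints:
        // a: 2
        // b: 2
        // c: 1
    }
}
```

The explicit `new[] { ' ' }` array is deliberate: the lambdas here become expression trees, and expression trees cannot contain calls that rely on optional arguments, which some `Split` overloads have on newer .NET versions.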

Of course there is some heavy lifting going on internally. It also shows that DryadLINQ is more general than a MapReduce-only implementation: MapReduce is just one of the queries you can express with it. I love .NET :-)