Jaa


Removing elements from a Dictionary

The topic of removing elements from a dictionary came up recently on an internal mailing list.  Someone was iterating through all the elements of a collection and wanting to remove some of them. The problem with that is that modifying the collection underlying the enumerator throws.  He had the following pseudo code:

 void RemoveSomeElements(Dictionary<string, int> dict){    foreach (var e in dict)    {        if (e.Value < 35)        {            // Remove e!        }    }}

Several people replied that he could just create a list of elements to be removed and then iterate that afterwards:

 var keysToRemove = new List<string>();foreach (var e in dict){    if (e.Value < 35)    {        keysToRemove.Add(e.Key);    }}foreach (var key in keysToRemove){    dict.Remove(key);}

That works, but it’s pretty verbose.  Someone suggested a more functional way to do it that’s pretty nice:

 dict.ToList().Where(pair => pair.Value < 35).ToList().ForEach(pair => dict.Remove(pair.Key));

At this point I decided to show off, and make it more efficient by using an “Apply” extension method instead of the .ForEach method that shipped on List<T> in .Net 2.0.  I came up with this:

 dict.Where(pair => pair.Value < 35).Apply(pair => dict.Remove(pair.Key)).Apply();

 

Along with the code to Apply:

 public static class ApplyExtensions{    public static IEnumerable<T> Apply<T>(this IEnumerable<T> source, Action<T> action)    {        Argument.NotNull(source, () => () => source);        Argument.NotNull(action, () => () => action);        return ApplyInternal(source, action);    }    internal static IEnumerable<T> ApplyInternal<T>(IEnumerable<T> source, Action<T> action)    {        foreach (var e in source)        {            action(e);            yield return e;        }    }    public static void Apply<T>(this IEnumerable<T> source)    {        foreach (var e in source)        {            // do nothing, just make sure the elements are enumerated.        }    }}

Here I thought I was doing right by C# 3.0, and showing people how to use lazily chained iterators.  Unfortunately, this code drops us right back to the original problem!  Because of lazy iteration, we’re actually still iterating the dictionary when we try to remove the elements, and so we invalidate the first iterator in our chain. 

It’s a simple fix, but still a reminder that the semantics of delayed execution can be tricky to reason about sometimes:

 dict.Where(pair => pair.Value < 35).ToArray().Apply(pair => dict.Remove(pair.Key)).Apply();

The “ToArray” call finishes the iteration of the dictionary, but uses less space than the original “.ForEach” solution, since we only realize a collection of the elements to be removed, instead of all of the elements in the dictionary.

Comments

  • Anonymous
    October 28, 2008
    You talked about naming the no-action 'Apply()' overload 'Realize()'.  I think a different name is a good idea, because it really does something quite different than the other 'Apply()' What if you write it like this:    public static IEnumerable<T> Apply<T>(this IEnumerable<T> source)    {        return source.ToList();    } And then replace your 'ToArray()' call with 'Realize()'? I think this makes your code a little closer to intent, a little further from mechanism.

  • Anonymous
    October 28, 2008
    Yes, realize works there, and I do like the point about intent over mechanism. I chose the no arg "Apply" in deference to Jon Pryor (http://www.jprl.com/Blog/), who I've discussed the "Apply" method with a bunch of times.

  • Anonymous
    October 29, 2008
    And how about: using System; using System.Collections.Generic; using System.Linq; static class Program {    static void Main(string[] args)    {        Dictionary<string, int> d = new Dictionary<string, int>()            {                {"one", 1},                {"two", 2}            };        d.Where(pair => pair.Value < 2).ToArray().ForEach(pair => d.Remove(pair.Key));    }    static void ForEach<T>(this IEnumerable<T> list, Action<T> action)    {        foreach (var item in list)        {            action(item);        }    } }

  • Anonymous
    October 29, 2008
           public static Dictionary<Tkey, Tvalue> DeleteWhere<Tkey, Tvalue>(this Dictionary<Tkey, Tvalue> collection, Func<Tkey, Tvalue, bool> shouldDelete)        {            foreach (var keyValuePair in new Dictionary<Tkey, Tvalue>(collection))            {                if (shouldDelete(keyValuePair.Key, keyValuePair.Value))                    collection.Remove(keyValuePair.Key);            }            return collection;        }

  • Anonymous
    October 30, 2008
    The comment has been removed

  • Anonymous
    October 30, 2008
    Rik, Your solution also works, but the price you pay is the space and time overhead of creating a completly new dictionary that contains all of the original elements.  In a large collection, you may not want to pay that price, assuming that the average case is that you're only actually removing half of them.  You also don't need the second collection to actually be a dictionary, since you aren't making any use of the lookup functionality; you're just enumerating the contents.

  • Anonymous
    October 30, 2008
    Kevin, did you measure your ToArray() solution vs. my solution using a copy of the dictionary? I haven't profiled for memory usage, but a rather unscientific test using System.Diagnostics.Stopwatch shows the dictionary copy solution is clearly faster. I'd be interested to see what happens with RAM, but I'll have to wait until tomorrow to use a proper profiler.

  • Anonymous
    October 30, 2008
    Uh-oh! You caught me making perf comments based on pure reasoning instead of empirical evidence. I didn't actually measure.  I'd love to hear about the results of your measurements though...