Internal vs. External Iterators (revisted)

A while back I posted about my issues in C# when I was trying to write out an extensible collections API.

From the high level perspective I wanted to provide useful concrete implementations of many structures that would solve most peoples needs, and I also wanted to include good abstract implementations of most behaviors so that it would be pretty easy for someone to come in and implement one of the collections interfaces without too much burden.

From the low level perspective something I struggled with was whether or not to base the collections on an external iterator pattern or an internal iterator pattern.

Recall that an internal iterator behaves like this:

 
///returns false when you iteration to stop
delegate void Action(object a);

interface IIteratable {
     void Iterate(Action p);
}

A collection implementing the internal iterator is just passed a delegate which it will then apply to all of its internal elements in order.

An external iterator behaves like this:

 
interface IEnumerator {
     object Current { get; }
     bool MoveNext();
}

interface IEnumerable {
     IEnumerator GetEnumerator();
}

Why would I want two different iteration methods supplied in the API? Well, it comes down to a balance between ease of implementation vs ease of use. Let's use the following simple case: I want to provide a sorted set of elements and I do so by using a binary tree. Now I have two of these sets and I want to test them for equality. Since they are both sorted sets then I know that they have no duplicate elements. So to test them for equality I just need to check their sizes and then iterate over them verifying that each has the same elements in the same order.

Say my set has the following definition for the nodes it uses in the tree

 
class BinaryNode {
       object value;
       BinaryNode; left;
       BinaryNode right;
} 

If I try to write the IIterator for that. It's pretty simple with the following code:

 
      void Iterate(Action a) {
             if (left != null) {
                   left.Iterate(a);
             }

             a(value);

             if (right != null) {
                   right.Iterate(a);
             }
      }

If I wanted to implement the IEnumerable version though it would be far more complex. (Just try writing it!). However, say I have written the IEnumerable version and I want to test two sets for equality, then it's incredible simple by just enumerating over the two IEnumerators returned from each set. Testing equality with the IIterator would be far more complex in that case. (Just try writing it!) :-)

So (in general) IEnumerable puts a far greater burden on the programmer but more power in th users hand, while IIterator puts a far greater burden on the consumer of the API while simplifying things for the API developer. So wouldn't it be great if you could have a solution that made it easy for the developer while being powerful enough for the consumer. That's what I attempted to do. However, it seems that in C# it's only easy to go in one way. i.e. i could write:

 
public abstract class AbstractCollection : ICollection {
       public void Iterate(Action action) {
             foreach(A element in this) {
                    action(element);
             }
       }

       public abstract IEnumerator GetEnumerator();
}

However, what if I wanted the reverse. What if I wanted to implement GetEnumerator on top of Iterate. I struggled with this for a while and ended up giving up on it since it didn't seem possible to do without some nasty threading hacks. (see the feedback from that post to see Steve Steiner actually go through and implement that idea).

I had not thought about this in a while when suddenly i got a great email from Neil pointing me to this article on fibers and coroutines in .Net. In the original posting I pointed out that what I was trying to do was fairly trivial in some functional language through the use of call-with-current-continuation (or some hackery with exceptions). However, I couldn't see a way to do that in C# given the (seeming) lack of low level control necessary to accomplish that.

However, the article shows that that isn't the case and that it is actually quite possible to accomplish the very task I was trying to do. Using the information from the article it seems like it should be possible to write the code similar to the following:

 
     //currently in the AbstractCollection code

     public IEnumerator GetEnumerator() {
            return new CollectionEnumerator(this);
     }

     //note the subclassing of Fiber
     class CollectionEnumerator : Fiber, IEnumerator {
            ICollection c;
            object current;

             CollectionEnumerator(ICollection c) {
                     this.c = c;
                     c.Iterate(delegate (object o) {
                             Yield(o);  
                             //Yield is inherited from Fiber.  It means: "return control to the outer fiber passing back the value 'o'.
                             //The state of this fiber (including registers/stack) is saved and control to is can be resumed in the future.
                             //When this fiber finishes (after yielding the last element), control will be passed back to the main
                             //fiber with the value 'null' returned"
                     });
             }

             public object Current { get { return current; } }
 
             public bool MoveNext() {
                    current = Resume();
                    return current != null;

                    //Resume is inherited from Fiber.  It means: "continue the fiber where it last left off.  When the fiber yields a 
                    //value keep it around in current.  When the fiber finishes, null will be returned and we know that this enumerator
                    //has no more elements"
             }
      }     

Note: it will actually take a little more work, and it would be nicer to clean things up to both work with generics and also deal with the fact that the Iterator might actually produce null as an actual value in the collection, not just something to indicate that iteration completed. However, the core work is pretty much there.

Note: this isn't really that different from what AT, Steve and I were discussing earlier (since it does involve interesting threading ideas), however the pointers provided by the site mean that it can be done much more simply without any extra time/memory tradeoffs and without having to write out all the synchronization logic. It also means that all the fiber logic (as well as saving thread state) is hidden out of the way from you. Another thing to notice is that because it is based on fibers when you run it you always stay within the current thread of execution, so you aren't taking an hit for switching an entire thread in/out. The implementation in the paper requires that one subclass a specialized Fiber class. I'm not a big fan of forcing subclassing as it can interfere with your own natural hierarchies. When I come back to work later next week I'm going to see if I can refactor the code so that you end up with a mixin that you can use to get this without requiring subclassing.

Edit: The code samples here have not been tested. I came up with them after reading the article on MSDN. I can't vouch for whether or not they will actually work and I will post updated (hopefully correct) code when i do implement this later on.

Comments

  • Anonymous
    June 06, 2004
    I would like to se a similar “visitor” implementation into the collection classes in the System.Collection and System.Collection.Generic namepsace.

  • Anonymous
    June 06, 2004
    Now that is sweet! I hadn't seen that article on MSDN Mag. with co-routines. I can just imagine the possibilities.

    Keep it up. Its great to see someone who so readily provides more tools for the toolbox.

  • Anonymous
    June 07, 2004
    Fredrik: I'll pass your comment on and see if there's an appropriate blog you can bring up your issues to.

  • Anonymous
    June 07, 2004
    Cyrus:

    Thanks!

    Don't you think it would be great if the Iterate method

    public abstract class AbstractCollection {
    public void Iterate(Action action) {
    foreach(A element in this) {
    action(element);
    }
    }

    public abstract IEnumerator GetEnumerator();
    }

    is added to the collection classes? It's often I iterate through collections, and it would be nice to have this kind of "Visitor" pattern in the collection classes.

  • Anonymous
    June 07, 2004
    The comment has been removed

  • Anonymous
    June 07, 2004
    Don't even think about doing this. I can't believe that MSDN published this article. Don't use Fibers.

    http://blogs.msdn.com/greggm/archive/2004/06/07/150298.aspx

  • Anonymous
    June 07, 2004
    I support the work of more iterators and better iterators for .net. I really like the way haskell does things with map, filter, fold etc. and second order programming.

    It always seems silly to me in C#/C++/Java to have to write my own foreach loop when I am pretty sure someone already has done that. Unfortunately, closures or lambda expressions or simulations thereof are hard in C#. C# 2.0 looks promising here.

    The yield statement of C# 2 also looks promising. They don't seem to use fibers or even continuations in their .net implementation. These would be nice, but my guess is that they don't work well on the stack architecture of the ecma runtime.

  • Anonymous
    June 08, 2004
    Python has had map, filter, fold, etc since its inception. Being an imperative language, for imperetive programmers, they rarely get used. Theres an ongoing debate in the Python community about dropping support for them, and for dropping support of lambda.

    One of the enabling features of python that tends to reduce the need for map, filter, fold etc is list comprehensions.

    Heres an example of some comprehnsions and their functional equivalent:

    [x2 for x in list] <==> map(lambda x: x2, list)
    [x for x in list if x&1] <==> filter(lambda x: x & 1, list)
    [x2 for x in list if x&1] <==> map(lambda x: x2, filter(lambda x: x & 1, list))

    As you can see, a list comprehension is much more readable than the equivalent functional form.

    Heres the equivalent in c#

    ArrayList res;
    foreach (int x in list) if (x&1 == 1) res.Add(x);

    Using c# in a functional mode:

    ArrayList res;
    Functional.Iterate(list, delegate(int x) { if (x&1 == 1) res.Add(x); })

    Again, I put it to you that the imperative form is more readable.

    Itd be nice to be able to use a yield statement inside anonymous delegates though. Youd have a nice-ish syntax for doing something like list comprehensions then.

    delegate() { foreach (int x = 0; x < 20; x++) yield return xx; }
    would result in an iterator.

    wonder what this should result in:
    delegate() { yeild return x
    x; }

    eesh



  • Anonymous
    June 09, 2004
    Damien: I disagree. I find the imperative form much more ungainly and difficult to read. It also forces the resultant list to be an ArrayList which takes the control out of the object itself.

    The list comprehensions are extremely nice. But I see them as syntactic sugar (not a bad thing) over the lambda functions. I woudl still want the nice atomic operations, but am glad that the nice language syntax is there to manipulate it in readable ways.

  • Anonymous
    June 11, 2005
    About : "Python has had map, filter, fold, etc since its inception. Being an imperative language, for imperetive programmers, they rarely get used. Theres an ongoing debate in the Python community about dropping support for them, and for dropping support of lambda.

    One of the enabling features of python that tends to reduce the need for map, filter, fold etc is list comprehensions.
    "

    Haskell has both named functions over collection domains such as map, filter ... and list comprehensions over list domains. List comprehensions are just an application of the very old mathematical form of set comprehensions used on list domains, which are not strictly sets. Haskell also has lambda expressions (nameless functions). Theoretically you could have a collection comprehension over any collection domain ( type). It is just a matter of writting the rules of logic that define the function over that collection domain and using your favourite bracket notation
    So take your pick of whichever style you prefer. They are all logically equivalent.

  • Anonymous
    November 04, 2005
    The comment has been removed

  • Anonymous
    May 31, 2009
    PingBack from http://patiochairsite.info/story.php?id=349

  • Anonymous
    June 17, 2009
    PingBack from http://pooltoysite.info/story.php?id=2328