Поделиться через


Iterator Blocks, Part Five: Push vs Pull

A while back I posted some commentary I did for the Scripting Summer Games where I noted that there is an isomorphism between "pull" sequences and "push" events. Normally you think of events as something that "calls you", pushing the event arguments at you. And normally you think of a sequence as something that you "pull from", asking it for the next value until you're done. But you can treat a stream of event firings as being a sequence of event argument objects. And similarly, you could implement a sequence iterator so that it called a method for every object in the collection. Heck, you could implement all of LINQ on this model if you really wanted.

The implementation of iterator blocks is clearly on the "pull" paradigm. It didn't have to be. We could have gone with a sort of "inversion of control" approach. "Pull" iterators have to simulate coroutines with a little state machine that knows how to get back to the state where the code left off. But we already have a mechanism for code to "get back to the state where it left off" -- that's what you do every time you call a method! You remember what you were doing, call the method, and then pick up where you left off. We could have done the same thing with iterators. We could say that this:

void Integers(int length, IObserver<int> observer)
{
for (int i = 0; i < length; ++i) yield return i;
}

is a syntactic sugar for

void Integers(int length, IObserver<int> observer)
{
for (int i = 0; i < length; ++i) observer.Yield(i);
observer.Break();
}

That is, we "raise an event" by calling the observer every time something yields, and raise another event when we're done.

This would be a much more trivial transformation to make than our current state machine approach, but since most people want sequence iteration on the "pull" model, we do the harder work to make that happen.

I said this had something to do with exception handling. What's the connection?

Did you notice something weird about how we handle finally blocks? Consider what happens with the "push model":

TextReader reader = File.Open(file);
try
{
blah blah blah yield the lines
}
finally
{
reader.Close();
}

If something in the "blah blah blah" is realized as a call to observer.Yield(line), what happens if the code consuming the result throws? Easy. It is just a method call like any other. The call stack unwinds, we find the finally, the file gets closed, everything is good.

Now suppose this is realized as the MoveNext of a "pull" iterator. What happens when the code that is consuming the result throws? If we're consuming the result then we've returned from the call to MoveNext! There is no "try", there is no "finally". And yet usually the finally gets executed anyways! If the code that is consuming the results throws, odds are good that it was in a foreach; when foreach throws, the enumerator is disposed, and when the enumerator is disposed, we figure out which "finallys" were around when we last left MoveNext, and we run a special method that does whatever was in the pending finally blocks.

This is pretty weird. What's going on here is that for finally blocks, "pull" iterators have the same semantics as "push" iterators. When you yield from a try containing a finally, it looks like the finally is still "on the stack" and runs when the consumer throws.

So what if the try block has a catch?

The original designers of C# 2.0 -- and remember, this is long before my time on the team -- had a huge debate over this. A debate which was repeated in miniature when I sent them all an email asking for the justification for this decision. There are three basic positions:

1) Do not allow yield returns in try blocks at all. (Or blocks that are sugars for try blocks, like using blocks.) Use some other mechanism other than "finally" to express the idea of "this is the code that you run when the caller shuts down the enumeration."

2) Allow yield returns in all try blocks.

3) Allow yield returns in try blocks that have finally blocks, but not if they have catch blocks.

The down sides of (1) are that you have to come up with some syntax for representing the "finally" logic that isn't actually a "finally", and that it becomes more difficult to iterate over stuff that requires cleanup, like iterating over the contents of a file. This makes the feature both confusing and weak; we should avoid this option if at all possible.

The down side of (2) is that the exception handling behaviour beomes deeply, weirdly inconsistent between finally blocks and catch blocks. If you have a yield in a try block with a finally, and the consumer of the iteration throws, then the finally runs, as though you were on the "push" model. But if you have a yield in a try block with a catch, and the consumer of the iteration throws, then the catch does not run, because its not on the call stack. Users have a reasonable expectation that finally and catch work more or less the same way when an exception happens; that is, that control will be consistently transferred to a matching catch or the finally.

The down side of (3) is that the rule seems arbitrary and weird -- until you read five unnecessarily prolix blog entries that explain what on earth the design team was thinking.

Obviously, they picked (3), and now you know why.

Next time, we'll finish up this series with a look at unsafe code. Now that you know all this stuff, seeing why there's no unsafe code allowed in an iterator block is pretty straightforward by comparison.

Comments

  • Anonymous
    July 23, 2009
    Any chance you could add an "afterword" to this series talking about why there's no way to say "I want this bit of code here (usually validation) to be executed when the iterator method is first called, not on the first call to MoveNext()"? Obviously if it's just a case of "the design team didn't realise how useful that would have been" then it's not much of a post, but maybe there are subtler reasons... Well, I agree that it would be useful. The question is whether it is useful enough to justify adding new syntax to the language, and I don't think it is. The design team already rejected adding a new syntax for "clean this thing up if the enumeration is disposed early", instead opting to go for our rather goofy "finally that's not on the stack still runs" strategy. The design team tries to only add new syntax if its absolutely necessary for the feature. This is a bit of a "gotcha", I know, but it's not that bad and the workaround is straightforward. Put it this way: if you know to use the new syntax to make some code "eager", then you also know how to do it "manually". If you don't know it, then you'll write the bug whether there's a syntax to fix it or not. So the syntax is unnecessary. -- Eric (Oh, and thanks for the shout out. I wondered why there was a comment left on that blog post tonight :)

  • Anonymous
    July 23, 2009
    Eric, Can you also provide some insight into why "yields" are not allowed inside an anonymous method or lambda expression Good question. I would love to have anonymous iterator blocks. It would be totally awesome to be able to build yourself a little sequence generator in-place that closed over local variables. The reason why not is straightforward: the benefits don't outweigh the costs. The awesomeness of making sequence generators in-place is actually pretty small in the grand scheme of things and nominal methods do the job well enough in most scenarios. So the benefits are not that compelling. The costs are large. Iterator rewriting is the most complicated transformation in the compiler, and anonymous method rewriting is the second most complicated. Anonymous methods can be inside other anonymous methods, and anonymous methods can be inside iterator blocks. Therefore, what we do is first we rewrite all anonymous methods so that they become methods of a closure class. This is the second-last thing the compiler does before emitting IL for a method. Once that step is done, the iterator rewriter can assume that there are no anonymous methods in the iterator block; they've all be rewritten already. Therefore the iterator rewriter can just concentrate on rewriting the iterator, without worrying that there might be an unrealized anonymous method in there. Also, iterator blocks never "nest", unlike anonymous methods. The iterator rewriter can assume that all iterator blocks are "top level". If anonymous methods are allowed to contain iterator blocks, then both those assumptions go out the window. You can have an iterator block that contains an anonymous method that contains an anonymous method that contains an iterator block that contains an anonymous method, and... yuck. Now we have to write a rewriting pass that can handle nested iterator blocks and nested anonymous methods at the same time, merging our two most complicated algorithms into one far more complicated algorithm. It would be really hard to design, implement, and test. We are smart enough to do so, I'm sure. We've got a smart team here. But we don't want to take on that large burden for a "nice to have but not necessary" feature. -- Eric

  • Anonymous
    July 23, 2009
    http://themechanicalbride.blogspot.com/2009/07/introducing-rx-linq-to-events.html Coincidence? Yep. -- Eric

  • Anonymous
    July 23, 2009
    One nice advantage of the pull model is that you can simply stop pulling when you have enough data.

  • Anonymous
    July 23, 2009
    Excerpt from .NET Framework class library documentation.


bool MoveNext(); Return Value Type: System.Boolean true if the enumerator was successfully advanced to the next element; false if the enumerator has passed the end of the collection. SHOUD BE CORRECTED: Return Value Type: System.Boolean true if the enumerator was successfully advanced to the next element; otherwise false.

First version of the specification requires that method MoveNext() somehow must represent whole iteration, effectively disallowing yields in various code blocks related to exceptions. While second version of the specification deals only with single step of the iteration, and effectively allows yields in any code block. I think second version is better, because anyway we do not have reliable means to tell how successfully we iterated through all elements in the sequence or even tell, by using iteration, is sequence finite or endless.

  • Anonymous
    July 24, 2009
    @Bart - and with the Push model, you can stop listening... this is partly how we did things like Take; once it has enough data, it unsubscribes itself. Note that you can't terminate the entire through, because one of the big plus points of a push model is that you can push data through multiple observers at the same time.

  • Anonymous
    July 24, 2009
    The comment has been removed

  • Anonymous
    July 24, 2009
    The comment has been removed

  • Anonymous
    July 25, 2009
    The comment has been removed

  • Anonymous
    July 26, 2009
    "The down side of (2) is that the exception handling behaviour beomes deeply, weirdly inconsistent between finally blocks and catch blocks. If you have a yield in a try block with a finally, and the consumer of the iteration throws, then the finally runs, as though you were on the "push" model. But if you have a yield in a try block with a catch, and the consumer of the iteration throws, then the catch does not run, because its not on the call stack. Users have a reasonable expectation that finally and catch work more or less the same way when an exception happens; that is, that control will be consistently transferred to a matching catch or the finally." To think this through, how about:    public interface IEnumerableExceptionHandler    {        void Handle(Exception e);    } A type that supports IEnumerable could optionally also support the above interface. The foreach statement would look at the static type of the loop collection, and if it supports IEnumerableExceptionHandler, it would generate something like:    foreach (var item in collection)    {        try        {             // loop body        }        catch (Exception e)        {            collection.Handle(e);        }    } Then the compiler-generated iterator class would implement IEnumerableExceptionHandler, and would store the exception, and then simulate a throw by executing the appropriate catch (and finally) on the next call to MoveNext. (I hate the fact this catches NullReferenceException, but never mind that - it's fixed in CLR 4 isn't it?) This would implement the "expected" behaviour, but now I look at it, I don't think that would be expected. I wouldn't personally expect exceptions in the loop body to magically wormhole their way to somewhere else. I think this is because catch and finally are very different things. Catch blocks run instead of something else, where finally blocks run in addition to something else. I'd expect yield return itself to never throw, once it has a value to yield. And I'd find it very useful to be able to catch exceptions from the code intermingled with yield return (especially the expression that provides the value to yield return). But it sounds like these debates were already had five years ago.

  • Anonymous
    July 27, 2009
    Great discussion. Just one question. How do you think it could hit on the overall performance of the process when you implement the "push" pattern ? I did some work and research on different event's oriented patterns and usually I've found that the use of event's has some important impact on performance.

  • Anonymous
    August 21, 2009
    The comment has been removed

  • Anonymous
    August 27, 2009
    The comment has been removed