Udostępnij za pośrednictwem


Iterator Blocks, Part Two: Why no ref or out parameters?

A long and detailed discussion of how exactly we implement iterator blocks would take me quite a while, and would duplicate work that has been done well by others already. I encourage you to start with Raymond’s series, which is a pretty gentle introduction: part 1, part 2, part 3. If you want a more detailed look at how this particular sausage is made, Jon’s article is quite in-depth.

To make a long story short, we implement iterators by:

  • Spitting a class that implements the relevant interfaces.
  • Hoisting locals of the iterator block to become fields of the class.
  • Rewriting the iterator block as the “MoveNext” method of the class.
  • Tracking the point where the last yield happened in the MoveNext method, and branching to that point directly using a switched “goto” when MoveNext is called.
  • Moving the logic in “finally” blocks (or equivalents, like the auto-generated finally blocks associated with lock statements and using statements) into the Dispose method, to ensure that stuff gets cleaned up when the iteration is completed or abandoned.

This is not the only implementation strategy we could have pursued. As I said last time, we could have implemented full-blown continuations and built the whole thing out of them; any conceivable control flow can be built out of continuations. But getting continuations right requires a lot of support from the framework, and would be very expensive. We can get by with less, so we do.

Similarly, we could have built this out of “coroutine fibers”. A fiber is like a thread in that it has its own stack, but the code on the fiber itself decides when the fiber is done running. Again, this is a strategy – implement coroutines – that I mentioned last time which we rejected as too costly and difficult to get right in the CLR. Again, we can get by with less, so we do.

But this choice of a feasible, cost-effective implementation strategy then drives restrictions back into the design space; we have to lose some generality in order to make this strategy work.

The first thing that we lose is the ability to have iterator blocks be in methods which take ref or out parameters. In order to preserve the values of local variables and formal parameters across calls to MoveNext, we hoist the locals and parameters into fields of a compiler-generated class.

As I discussed earlier, the CLR designers had an implementation goal of being able to take advantage of the performance characteristics of the system stack. What happens when you store a reference in a field? The reference could be to something on the stack, but the field could live longer than the thing on the stack. The designers of the CLR were faced with four choices:

  • Build a system which is as fragile and horrid as unmanaged C code, a language in which accidentally storing a reference to something with too-short lifetime is a frequent source of awful crashing bugs and data corruptions.
  • Disallow (*) taking references to locals; any local taken as a reference must be hoisted to become a field on a garbage-collected reference type. Safe, but slow.
  • Disallow storing references in fields.
  • Invent something else that solves this problem.

The CLR designers chose the third option. Which means that our choice of hoisting parameters to fields immediately runs into a problem; we have no safe way of storing references to other variables in a field. The reference could be to something on the stack, but the iterator could live longer than the variable that is on the stack. A confluence of implementation choices has now driven an unfortunate and seemingly arbitrary restriction into the design. We lose a bit of generality in order to gain simplicity of implementation and higher performance in common cases.

Now think about that decision from the perspective of the larger goal of the feature. Remember, the feature is not “make any possible method into an iterator block”. The feature is “make it easier to write code that enumerates the members of a collection”. An “out” or “ref” parameter exists solely to enable a method to mutate an outside variable and therefore pass back information to the caller when the method terminates.

Why would you want to do that in a method whose sole purpose is to produce the sequence of values stored in a collection? What would the meaning of mutations to the variable even be if the iterator block mutates the variable multiple times as the iterator is running? Normally that kind of thing is not visible (modulo goofy multithreading scenarios) until the method is done, but an iterator block returns control to its caller many times before its done.

This seems like a scenario which is uncommon, strange and hard to implement. And furthermore, there are ways to achieve mutation of outer variables during iteration without passing in a ref to a variable; you could pass in an Action delegate. Instead of this illegal program:

IEnumerable<int> Frob(out string s)
{
yield return 1;
s = "goodbye";
}
void Grob()
{
string s = "hello";
foreach(int i in Frob(out s)) …
}

you can always do the equivalent thing with this legal program:

IEnumerable<int> Frob(Action<string> setter)
{
yield return 1;
setter("goodbye");
}
void Grob()
{
string s = "hello";
foreach(int i in Frob(x=>{s=x;})) …
}

Since making the more general feature work given our implementation constraints is (1) hard or impossible, (2) clearly a corner case, not a mainline scenario, and (3) has a possible workaround, it’s a no-brainer to put a restriction on the design and disallow ref and out parameters.

Next time, why no yield in a finally block?

***********

(*) Or, weaker, make it allowed but unverifiable. That is pretty much the same thing from the compiler implementers perspective; we try to never generate unverifiable code unless it is specifically marked “unsafe”.

Comments

  • Anonymous
    July 13, 2009
    What I really want to know is, why no "yield foreach"? The paper on design on CLR iterators explains its advantages and how to implement it. What are downsides to implementing it? The downside is that we'd have to cut something better. We have a list of a couple hundred possible new language features for any given version of C#. We have the design, implementation, testing, documentation and management resources available to actually implement maybe four or five of them in any one version. "yield foreach" is an awesome feature that I would love to do; it is certainly in the top fifty, but has never made it into the top ten. If it ever makes it into the top five, we'll probably implement it. -- Eric

  • Anonymous
    July 13, 2009
    "if it ever makes it into the top five, we'll probably implement it. -- Eric" This is one of the reasons that, despite the very long incubation period associated with f# I'm glad it had that time to be used in the real world with everyone using it knowing it was, in effect, beta and subject to change as feedback allowed it to grow/change. Do you think things would have been different with c# had it had a more extensive beta period()? () obviously this is hypothetical, I don't believe it would be as successful had it had anything like as long a beta as f#.

  • Anonymous
    July 14, 2009
    > The designers of the CLR were faced with three choices I think you missed a couple of other choices that are well established options in this area, which appears closely related to the "downward funargs problem". 4) detect when the lifetime of the object being referenced has expired, and prevent accesses after that point. 5) lift the object to the heap when its current storage location is expiring. Good point. I normally end lists of options with "do something I haven't thought of". I'll update the text. -- Eric

  • Anonymous
    July 15, 2009
    Perhaps you already aluded to this, but iterator blocks make it possible to implement continuations in C# programs without any additional language features. In, fact, I've done so on several occasions. The syntax isn't always the clearest, but it works quite well. The basic pattern I've used is to create an iterator block that returns a Func<> of some sort that encapsulates the individual sequences of computations that represent the continuation. Each is returned by the iterator block and then invoked by the caller. It's by the way, not dissimilar from the CCR implementation in the MS robotics library. Here's a simple example of a factorial using CPS programming: public class Continuations {    public static void Main(string[] args)    {        var i = 0;        var val = 0;        // output each step in the Factorial function until you compute the final value        foreach( var facStep in FactorialCPS.Factorial( 8 ) )            Console.WriteLine( "Step {0} = {1}", ++i, val = facStep() );        Console.WriteLine( "Factorial of 8 is {0}", val );    } } public static class FactorialCPS {    public static IEnumerable<Func<int>> Factorial( int n )    {        int q = 1, f = 1;        while( q <= n )            yield return () => f *= q++;  // Note: uses closures to capture state    } }

  • Anonymous
    July 15, 2009
    Out of interest, what are the current "top 5" features that haven't yet made it into the language yet? Yeah, I wish I knew too. (No joke!) It is way too early to speculate on that, and we don't know ourselves. The last time I mentioned an idea that we were kicking around in the hallway, I got 50+ comments criticizing the fact that we were even considering it. I don't want to get 250+ negative comments by discussing a possible feature set for an unannounced, hypothetical product. Talking about feature sets of hypothetical future versions would be extremely irresponsible at this point in the process. We haven't even shipped C# 4.0 yet. -- Eric

  • Anonymous
    July 16, 2009
    If I may ask, was anything like generalized interfaces from the paper "JavaGI: Generalized Interfaces for Java" (Stefan Wehr, Ralf Lämmel, Peter Thiemann) ever on the list?

  • Anonymous
    July 20, 2009
    Great, I also read Jon’s article, learn a lot from it

  • Anonymous
    June 17, 2010
    I'll tell you why we need it, recursive inheritance. Consider an XML document where element may (optionall) inherit from their parents and you need to parse this with a streaming approach (not DOM)....you'll need to keep track of the most specific values that might apply to a child as you recurse through the document each time you find something more specific you overwrite what you previously found until you find the leaf you're looking for and it'll have the most specific inherited properties nicely ready to yield return...that's why!