Fun with Iterators and state machines

I recently was given a piece of C# code with statements like:

yield return value;

and

yield break;

This was the beginning of my descent into the loopy world of C# 2.0 iterators. It took me awhile to wrap my head around them, and when I tried to explain them to other team members I got looks of total confusion. (I admit it, some of us don't keep the C# 2.0 specification on the night stand. :-) If you're totally familiar with iterators, you can skip this blog entry. If not, plunge in with me. Also, I'm not trying to give a comprehensive overview of C# iterators. (For that, see the MSDN article) Rather, I'm focused on the cool compiler and run time magic that makes them possible.

<<Disclaimer: This code is based on the Whidbey Beta 1 version of C#, and various details may change before release.>>

Consider the following program:

//
// Iterator1.cs - Requires C# 2.0 or later
//

using System;
using System.Collections.Generic;

class SomeContainer
{
public IEnumerable < string > MyIterator()
{
yield return "Foo";

yield return "Bar";

yield return "Baz";
}
}

class foo
{
static void  Main()
{
SomeContainer container = new SomeContainer();

foreach( string s in container.MyIterator() )
Console.WriteLine( s );
}
}

Compiling the program ( "csc iterator1.cs" ) and running it gives these results:

Foo

Bar

Baz

If you try to mentally trace the flow of control, you might notice something odd. What the heck does a "yield return" statement do? Looking at the MyIterator method, it seems like there's two possibilities:

1) The CPU executes only part of MyIterator() at any given time. At points, control comes back to the method, and it executes another chunk.

or

2) The entire MyIterator method runs to completion, and all the possible return values are put into a temporary collection.

It's easy enough to test the first hypothesis. Let's add some WriteLine calls to MyIterator and see when they appear:

//
// Iterator2.cs - Requires C# 2.0 or later
//

using System;
using System.Collections.Generic;

class SomeContainer
{
public IEnumerable < string > MyIterator()
{
Console.WriteLine( "Entering MyIterator" );

yield return "Foo";
Console.WriteLine( "After Foo" );

yield return "Bar";
Console.WriteLine( "After Bar" );

yield return "Baz";
Console.WriteLine( "After Baz" );
}
}

class foo
{
static void  Main()
{
SomeContainer container = new SomeContainer();

foreach( string s in container.MyIterator() )
Console.WriteLine( s );
}
}

Compiling the new program ( "csc iterator2.cs" ) and running it gives these results:

Entering MyIterator

Foo

After Foo

Bar

After Bar

Baz

After Baz

Hmm... Based on these results, it seems that the execution path definitely bounces back and forth between the Main() method and the MyIterator() method. What the heck is going on here?

Since I work on the Visual Studio team, I could have asked a few questions, but some times it's more fun to figure things out on your own. Luckily, with a tool like ILDASM, it's easy enough to discern what's happening.

First, let's recompile with Iterator2.cs with optimizations ( "CSC /O+ iterator2.cs" ) and see what ILDasm shows us:

___[MOD] iterator2.exe
| M A N I F E S T
|___[CLS] SomeContainer
| | .class private auto ansi beforefieldinit
| |___[CLS] <MyIterator>d__0
| | | .class nested private auto ansi sealed beforefieldinit
| | | implements class [mscorlib]System.Collections.Generic.'IEnumerable`1'<string>
| | | implements [mscorlib]System.Collections.IEnumerable
| | | implements class [mscorlib]System.Collections.Generic.'IEnumerator`1'<string>
| | | implements [mscorlib]System.Collections.IEnumerator
| | | implements [mscorlib]System.IDisposable
| | | .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) ...
| | |___[FLD] <>1__state : private int32
| | |___[FLD] <>2__current : private string
| | |___[FLD] <>4__this : public class SomeContainer
| | |___[MET] .ctor : void(int32)
| | |___[MET] MoveNext : bool()
| | | System.Collections.Generic.IEnumerable<System.String>.GetEnumerator : class [mscorlib]System.Collections.Generic.'IEnumerator`1'<string>()
| | | System.Collections.Generic.IEnumerator<System.String>.get_Current : string()
| | |___[MET] System.Collections.IEnumerable.GetEnumerator : class [mscorlib]System.Collections.IEnumerator()
| | |___[MET] System.Collections.IEnumerator.Reset : void()
| | |___[MET] System.Collections.IEnumerator.get_Current : object()
| | |___[MET] System.IDisposable.Dispose : void()
| | |___[PTY] 'System.Collections.Generic.IEnumerator<System.String>.Current' : instance string()
| | |___[PTY] System.Collections.IEnumerator.Current : instance object()
| |
| |___[MET] .ctor : void()
| | MyIterator : class [mscorlib]System.Collections.Generic.'IEnumerable`1'<string>()
|
|___[CLS] foo
| | .class private auto ansi beforefieldinit
| |___[MET] .ctor : void()
| |___[STM] Main : void()
|

For our very simple SomeContainer class, the C# compiler sure generated a lot of methods and fields. Particularly are the variables <>1__state and <>2__current. They sound suspiciously like parts of a state machine. A state machine could explain how just portions of a method execute.

Let's dig a little deeper. Expand the MoveNext() method and examine the disassembly (which I've added a few extra lines to in order to increase readability):

.method private hidebysig newslot virtual final
instance bool MoveNext() cil managed
{
.override [mscorlib]System.Collections.IEnumerator::MoveNext
.override method instance bool class [mscorlib]System.Collections.Generic.'IEnumerator`1'<string>::MoveNext()
// Code size 164 (0xa4)
.maxstack 2
.locals init (int32 V_0)
IL_0000: ldarg.0
IL_0001: ldfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: switch (
IL_0022,
IL_0047,
IL_006c,
IL_0091)
IL_001d: br IL_00a2

IL_0022: ldarg.0
IL_0023: ldc.i4.m1
IL_0024: stfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0029: ldstr "Entering MyIterator"
IL_002e: call void [mscorlib]System.Console::WriteLine(string)
IL_0033: ldarg.0
IL_0034: ldstr "Foo"
IL_0039: stfld string SomeContainer/'<MyIterator>d__0'::'<>2__current'
IL_003e: ldarg.0
IL_003f: ldc.i4.1
IL_0040: stfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0045: ldc.i4.1
IL_0046: ret

IL_0047: ldarg.0
IL_0048: ldc.i4.m1
IL_0049: stfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_004e: ldstr "After Foo"
IL_0053: call void [mscorlib]System.Console::WriteLine(string)
IL_0058: ldarg.0
IL_0059: ldstr "Bar"
IL_005e: stfld string SomeContainer/'<MyIterator>d__0'::'<>2__current'
IL_0063: ldarg.0
IL_0064: ldc.i4.2
IL_0065: stfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_006a: ldc.i4.1
IL_006b: ret

IL_006c: ldarg.0
IL_006d: ldc.i4.m1
IL_006e: stfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0073: ldstr "After Bar"
IL_0078: call void [mscorlib]System.Console::WriteLine(string)
IL_007d: ldarg.0
IL_007e: ldstr "Baz"
IL_0083: stfld string SomeContainer/'<MyIterator>d__0'::'<>2__current'
IL_0088: ldarg.0
IL_0089: ldc.i4.3
IL_008a: stfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_008f: ldc.i4.1
IL_0090: ret

IL_0091: ldarg.0
IL_0092: ldc.i4.m1
IL_0093: stfld int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0098: ldstr "After Baz"
IL_009d: call void [mscorlib]System.Console::WriteLine(string)
IL_00a2: ldc.i4.0
IL_00a3: ret
} // end of method '<MyIterator>d__0'::MoveNext

The listing breaks up into five distinct blocks. Some of the key highlights include:

· IL_0000 - IL_001D: Load the value of the <>1__state field, and branch to one of five different blocks in the code. This is consistent with what a state machine will do.

· IL_0022 - IL_0024: Store the value "-1" into <>1__state.

· IL_0029 - IL_002E: Print "Entering MyIterator"

· IL_0033 - IL_0039: Store the string value "Foo" into the <>2__current field.

· IL_003E - IL_0040: Store the value "1" into <>1__state.

· IL_0045 - IL_0046: Make the method exit with a return value of 1.

Looking at another block (starting at IL_0047) we see a similar sequence of instructions. The most key difference is at IL_0063 - IL_0065, where the <>1__state is set to '2", rather than "1". In yet another block it's set to "3". In short, the <>1__state field is a numeric value that keeps track of which state the MoveNext() method is in. Each time MoveNext() is called, control begins at the top of the code, and branches to the appropriate handler block for that state. At the end of that block, <>1__state is updated to the next state.

What I've shown here is a rather simple example of C# iterators and how they're supported by rather sophisticated run time support which takes your code and creates a state machine around it.

If you think you've got iterators down, try wrapping your head around this slightly more complicated example:

//
// Iterator3.cs - Requires C# 2.0 or later
//

using System;
using System.Collections.Generic;

class SomeContainer
{
public IEnumerable < string > MyIterator()
{
Console.WriteLine( "Entering MyIterator" );

for ( int i = 0; i < 10; i ++ )
{
yield return i.ToString();
}
}
}

class foo
{
static void  Main()
{
SomeContainer container = new SomeContainer();

foreach( string s in container.MyIterator() )
Console.WriteLine( s );
}
}

Compile it with "csc iterator3.cs" and run the executable. Pretty nifty, eh?

Next time, I'll show a very cool way of using C#'s built in state machine support to do idle time processing. For instance, the Visual Studio IDE has a variety of code that needs to execute while the IDE is idle. Using C# iterators, it's fairly easy to accomplish this.

Comments

  • Anonymous
    July 26, 2004
    This seems like a higher-level construct of a "standard" NT Fiber. Am I missing something?

  • Anonymous
    July 26, 2004
    Does the method have to return "IEnumerable" or can the "yield return" be used to return other values? I can see where the lack of an Iterator would make it hard for the runtime to know when to restart at the begining of the function but it might be a fun trick to play with.

    B.T.W. I could just check this but I don't seem to have a version of .NET that supports this construct. Is 2.0 Beta available to the public?

  • Anonymous
    July 26, 2004
    There is an article in MSDN covering Coroutines in .NET.

    http://msdn.microsoft.com/msdnmag/issues/03/09/coroutinesinnet/

  • Anonymous
    July 26, 2004
    Sanin: When I first looked at Iterators, I also noticed the similarity to Win32 Fibers. However, Fibers aren't used in the C# implementation.

    Andrew: The Whidbey beta can be downloaded: http://lab.msdn.microsoft.com/vs2005/get/default.aspx#express

  • Anonymous
    July 26, 2004
    An question - Why exceptions try/catch/finaly not allowed with yeild ?


    See also:
    Fibers, and other mad stuff that you shouldn’t use
    http://blogs.msdn.com/greggm/archive/2004/06/07/150298.aspx

  • Anonymous
    August 02, 2004
    Just thought I'd say thanks for the article. I really enjoyed it!

  • Anonymous
    August 02, 2004
    Interesting..... but I don't get it. How or why would you ever use something like this???

  • Anonymous
    August 02, 2004
    The comment has been removed

  • Anonymous
    August 02, 2004
    Thanks, for the article.

  • Anonymous
    August 02, 2004
    Why not use collections?

  • Anonymous
    August 02, 2004
    It's interesting to compare these new C# iterators with the very similar ones in Python and Ruby. There's less code involved in those languages, making the iterators easier to understand.

    To see how they compare, I translated some of the sample C# iterator code to Python and Ruby:

    http://mg.to/2004/08/03/iterator-cpr

  • Anonymous
    August 03, 2004
    Timothy and drebin: The article I linked to in the main text (http://msdn.microsoft.com/msdnmag/issues/04/05/c20/) provides a much better "big picture" view of iterators than I did in this entry.

    My short answer about the usefulness of iterators is that they make it much easier to implement your data as .NET collections that work with "foreach". In the past, implementing the code to support enumerating your data could be quite complicated. Iterators also make it very easy to support multiple ways of iterating over your data (e.g., first to last, last to first, every other item, etc...)

  • Anonymous
    August 03, 2004
    The comment has been removed

  • Anonymous
    August 05, 2004
    Congratulations, this blog has been featured on TheServerSide.NET

  • Anonymous
    August 22, 2004
    Ping Back来自:blog.csdn.net

  • Anonymous
    July 12, 2008
    PingBack from http://iodyne.wordpress.com/2008/07/13/ccr-101-part-9-iterators/

  • Anonymous
    April 23, 2009
    PingBack from http://www.adamfrisby.com/blog/2009/04/mrm-microthreading-functions/