A closer look at yield – part 2
In part 1, we took a quick tour of the yield keyword. In this post we’re going to have a look at the code that the compiler generates for us when we use yield. We’ll return to the first example from last time and insert a Console.WriteLine before the yield return statement:
private static readonly string[] StringValues = new string[] { "The", "quick", "brown", "fox",
"jumped", "over", "the", "lazy", "dog" };
static IEnumerable<string> TestIterator() {
foreach (string value in StringValues) { Console.WriteLine("In iterator:{0}", value); yield return value; }
}
If we run the following code to execute the iterator, we’d now see the output shown below:
foreach(string value in TestIterator())
{
Console.WriteLine("In foreach:{0}", value);
}
Output:
In iterator:The In foreach:The In iterator:quick In foreach:quick In iterator:brown In foreach:brown ...
It’s clear from this output that the TestIterator isn’t returning all of the values in one go and then returning control to the calling code. Conceptually, you can view it that the TestIterator method is iterating over the collection that it wants to return, temporarily returning control to the caller and then resuming where it left off. I recently fired up Reflector (https://www.aisto.com/roeder/dotnet/) to look at the generated code and was initially quite surprised. The most interesting part is the IEnumerator.MoveNext function, and here’s a roughly equivalent piece of code (tidied up for readability):
private bool MoveNext() { switch (this.state) { case 0: this.state = -1; this.state = 1; this.Values = Program.StringValues; this.index = 0; while (this.index < this.Values.Length) { this.temp = this.Values[this.index]; Console.WriteLine("In iterator:{0}", this.temp); this.current = this.temp; this.state = 2; return true; Label_0084: this.state = 1; this.index++; } this.state = -1; break; case 2: goto Label_0084; } return false; }
The state member is initialised to 0, so when MoveNext is first called it drops into the first case block. This sets up the Values and index members and then starts iterating over the array. Once it has got the first item, it sets the state to 2, stores the value in current (which is then returned via IEnumerator.Current) and returns true so that the caller can process the result. When MoveNext is called the second time, the code drops into the second case block which causes it to jump to immediately after the return true statement (the Label_0084 line). This was the ‘aha’ moment for me – conceptually it is doing exactly what I described above: iterating over the list, returning each item (and control) to the caller, and then resuming exactly where it left off. It’s so brilliant that it almost shouldn’t work! Since all of the state is stored in member variables, when it comes in to the function on subsequent calls it can safely carry on processing. And when it gets to the end of the list it breaks out of the switch block and returns false (setting the state to –1 to ensure it keeps returning false if called again).
Now imagine the code above (and the other methods required to implement IEnumerator) and compare it to the TestIterator method at the top of the post. Which could you code up more quickly? And more importantly, which is more readable to you?
Comments
- Anonymous
August 13, 2008
This was only going to be two posts, but after my last post I’d been mulling over a post that looks at