YIELD Operator Fun

Recently I was looking for a nice way to process large text files while keeping memory consumption to a minimum.

At the same time, I wanted to keep the input logic strictly separated from the filtering and processing components. I figured this is similar to (but simpler than) what LINQ does in principle. I came across the C# YIELD keyword, which seemed ideal for my intention.

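YIELD is used inside iterator blocks, that is, in the bodies of methods that return IEnumerable<T> (or IEnumerator<T>). The results are typically consumed with foreach.

It is a contextual language keyword that makes the compiler generate the enumerator code for you (a small state machine). Because items are produced one at a time, this can help you save memory.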
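To get a feel for what the compiler does behind the scenes, here is a rough hand-written counterpart of a tiny iterator. The names CountTo and CountToEnumerator are made up for this illustration, and the class the compiler really generates looks different (it uses compiler-chosen names and an explicit state field), so take this as a sketch of the idea only.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class IteratorSketch
{
    // Iterator method: the compiler rewrites this body into a hidden class.
    public static IEnumerable<int> CountTo(int max)
    {
        for (int i = 1; i <= max; i++)
            yield return i;
    }

    // Hand-written approximation of such a class (hypothetical, simplified).
    public sealed class CountToEnumerator : IEnumerator<int>
    {
        private readonly int _max;
        private int _current; // remembers where the previous MoveNext stopped

        public CountToEnumerator(int max) { _max = max; }

        public int Current { get { return _current; } }
        object IEnumerator.Current { get { return _current; } }

        // Each call advances by exactly one item; nothing is computed ahead of time.
        public bool MoveNext()
        {
            if (_current >= _max) return false;
            _current++;
            return true;
        }

        public void Reset() { throw new NotSupportedException(); }
        public void Dispose() { }
    }
}
```

Calling MoveNext on a CountToEnumerator and reading Current yields the same values as iterating over CountTo with foreach. The point is that only the current item and a little bookkeeping live in memory.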

I’ve put together an example where a large text file is read line by line and filtered so that only lines containing specific keywords will be returned in a string enumeration.

The key point is that the IO code must not contain any filter logic, and yet the lines must not all be read into memory at once.

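    // Requires: using System; using System.IO;
    using (TextReader reader = File.OpenText("AwesomeHuge.txt"))
    {
        // Nothing is read yet; both calls only set up the lazy pipeline.
        var lines = ReadLinesFromFile(reader);
        var filteredLines = FilterLines(lines, "SearchString");

        // Iterating pulls the lines through the pipeline one at a time.
        foreach (var fl in filteredLines)
            Console.WriteLine(fl);
    }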

The ReadLinesFromFile method (IO) looks straightforward:

    private IEnumerable<string> ReadLinesFromFile(TextReader reader)
    {
        while (true)
        {
            var line = reader.ReadLine();

            // ReadLine returns null at the end of the input.
            if (line == null)
                yield break;

            // Hand one line to the caller and pause here until the next one is requested.
            yield return line;
        }
    }

The FilterLines method (filter) makes use of YIELD once more so that in the end only those lines are kept that contain the search string.

    private IEnumerable<string> FilterLines(IEnumerable<string> lines, string searchString)
    {
        foreach (string line in lines)
        {
            // Only lines containing the search string are passed on.
            if (line.Contains(searchString))
            {
                Console.WriteLine("Include Line: " + line);
                yield return line;
            }
        }
    }

If you have never come across something similar, I guess the best way to understand what’s actually happening is to step through the code line by line.

The most important point is that a yielding method only gets executed once someone iterates over its IEnumerable return value. That sounds kind of recursive, but it actually is not, as the implementation details show.

That means the resulting enumeration only ever contains the data that is relevant (the lines matching the search string) and not the whole file content, as a quick glance might suggest.
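If you do not have a debugger at hand, a small experiment shows the same thing. The sketch below records the order of events in a list. DeferredDemo, Numbers, Log and Run are names I made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    public static readonly List<string> Log = new List<string>();

    // A yielding method: its body does not run when the method is called.
    public static IEnumerable<int> Numbers()
    {
        Log.Add("body started");
        yield return 1;
        Log.Add("between items");
        yield return 2;
    }

    public static string Run()
    {
        Log.Clear();

        var numbers = Numbers();   // only creates the enumerable object
        Log.Add("after call");

        foreach (var n in numbers) // the body starts running here
            Log.Add("got " + n);

        return string.Join(", ", Log);
    }
}
```

Run returns "after call, body started, got 1, between items, got 2". The call to Numbers comes back before a single line of its body has executed, and afterwards the method body and the foreach loop take turns.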
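To convince yourself without a huge file, the pipeline can be run against a few lines held in a StringReader. This is a sketch under some assumptions: the StringReader stands in for File.OpenText, the methods are static copies of the ones from this post with the console output left out, and PipelineDemo, Trace and Run are names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PipelineDemo
{
    public static readonly List<string> Trace = new List<string>();

    // IO part: knows nothing about filtering.
    static IEnumerable<string> ReadLinesFromFile(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Trace.Add("read: " + line);
            yield return line;
        }
    }

    // Filter part: knows nothing about where the lines come from.
    static IEnumerable<string> FilterLines(IEnumerable<string> lines, string searchString)
    {
        foreach (string line in lines)
            if (line.Contains(searchString))
                yield return line;
    }

    public static List<string> Run()
    {
        Trace.Clear();

        // StringReader replaces the file here (assumption, keeps the demo self-contained).
        using (TextReader reader = new StringReader("apple\nbanana\napple pie\ncherry"))
        {
            foreach (var hit in FilterLines(ReadLinesFromFile(reader), "apple"))
                Trace.Add("hit: " + hit);
        }

        return Trace;
    }
}
```

After Run, the trace reads: read: apple, hit: apple, read: banana, read: apple pie, hit: apple pie, read: cherry. Reads and hits are interleaved, so each line is handed through the filter before the next one is read, and no list of all lines is ever built.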

For further reading have a look at this article: https://msdn.microsoft.com/en-us/library/9k7k7cf0(v=VS.100).aspx