YIELD Keyword Fun
Recently I was looking for a nice way to process large text files while keeping memory consumption to a minimum.
I also wanted to keep the input logic strictly separated from the filtering and processing components. I figured this is similar to (but simpler than) what LINQ does in principle. I came across the C# YIELD keyword, which seemed ideal for my intention.
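YIELD is used inside iterator methods, that is, methods returning IEnumerable<T> (or IEnumerator<T>); the caller typically consumes the result with foreach.
It is a contextual keyword that makes the compiler generate a state machine behind the scenes, so elements are produced one at a time on demand instead of being collected in memory first. That is where the memory savings come from.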
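To make the on-demand behaviour a bit more tangible, here is a minimal sketch. The class name YieldSketch and the helper Naturals are made up for illustration; Take comes from LINQ. The sequence is endless, yet the program terminates, because each element is only computed when the consumer asks for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YieldSketch
{
    // Hypothetical helper: an endless sequence of integers.
    // Nothing is stored; each value is produced when the caller asks for it.
    public static IEnumerable<int> Naturals()
    {
        int n = 0;
        while (true)
            yield return n++;
    }

    public static void Main()
    {
        // Take(3) stops pulling after three elements, so the loop ends.
        foreach (int n in Naturals().Take(3))
            Console.WriteLine(n); // prints 0, 1, 2 on separate lines
    }
}
```

A method that filled a List<int> first could never return here; the iterator version only ever holds the current value.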
I’ve put together an example where a large text file is read line by line and filtered, so that only lines containing a specific keyword are returned as an enumeration of strings.
The key requirement is that the I/O code must not contain any filter logic, even though the lines cannot all be read into memory at once.
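    // Needs: using System; using System.IO;
    using (TextReader reader = File.OpenText("AwesomeHuge.txt"))
    {
        // Nothing is read yet: both calls only set up lazy sequences.
        var lines = ReadLinesFromFile(reader);
        var filteredLines = FilterLines(lines, "SearchString");
        // Iterating pulls one line at a time through the reader and the filter.
        foreach (var fl in filteredLines)
            Console.WriteLine(fl);
    }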
The ReadLinesFromFile method (IO) looks straightforward:
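    private IEnumerable<string> ReadLinesFromFile(TextReader reader)
    {
        while (true)
        {
            var line = reader.ReadLine();

            // ReadLine returns null at the end of the input: end the sequence.
            if (line == null)
                yield break;

            // Hand the line to the consumer and pause here until the next one is requested.
            yield return line;
        }
    }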
The FilterLines method (filter) makes use of YIELD once more, so that only lines containing the search string are passed on.
    private IEnumerable<string> FilterLines(IEnumerable<string> lines, string searchString)
    {
        foreach (string line in lines)
        {
            if (line.Contains(searchString))
            {
                // Debug output to make the order of execution visible.
                Console.WriteLine("Include Line: " + line);
                yield return line;
            }
        }
    }
If you have never come across something similar, I guess the best way to understand what’s actually happening is to step through the code line by line in the debugger.
The most important point is that the body of a yielding method only gets executed once someone iterates over its IEnumerable return value. That sounds kind of recursive, but it actually is not, as the implementation details show.
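If you don't have a debugger at hand, a few Console.WriteLine calls show the same thing. This is just a small sketch; the class DeferredSketch and the method Greetings are invented for the demonstration and are not part of the example above.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredSketch
{
    // Hypothetical iterator that logs when its body actually runs.
    public static IEnumerable<string> Greetings()
    {
        Console.WriteLine("iterator body started");
        yield return "hello";
        Console.WriteLine("resumed after first yield");
        yield return "world";
    }

    public static void Main()
    {
        var seq = Greetings();           // no code inside Greetings runs here
        Console.WriteLine("after call");

        foreach (var s in seq)           // the body runs now, piece by piece
            Console.WriteLine("got " + s);
    }
}
```

The output is "after call", "iterator body started", "got hello", "resumed after first yield", "got world". So the call itself only hands back an object, and the method body runs in steps, one step per requested element.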
That means only one line is held in memory at a time, and only the relevant lines (those containing the search string) are passed on, not the whole file content as a quick glance might suggest.
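A quick way to convince yourself is the following sketch. It uses static copies of the two methods from the post (without the debug output), a StringReader as a stand-in for the huge file, and LINQ's First() as the consumer; the class name StreamingSketch and the sample text are assumptions for the demo only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamingSketch
{
    // Static copies of the two methods from the post, without the debug output.
    public static IEnumerable<string> ReadLinesFromFile(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }

    public static IEnumerable<string> FilterLines(IEnumerable<string> lines, string searchString)
    {
        foreach (string line in lines)
            if (line.Contains(searchString))
                yield return line;
    }

    public static void Main()
    {
        // StringReader stands in for the large file (assumption for the demo).
        using (TextReader reader = new StringReader("alpha\nbeta match\ngamma"))
        {
            // First() stops iterating as soon as one line passes the filter.
            string first = FilterLines(ReadLinesFromFile(reader), "match").First();
            Console.WriteLine(first);             // beta match

            // The third line was never pulled through the pipeline,
            // so it is still waiting in the reader.
            Console.WriteLine(reader.ReadLine()); // gamma
        }
    }
}
```

Note that the sequence has to be consumed while the reader is still open, which is why the iteration happens inside the using block, just like in the example above.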
For further reading have a look at this article: https://msdn.microsoft.com/en-us/library/9k7k7cf0(v=VS.100).aspx