Lazy evaluation in C#

What is lazy evaluation? If you ask a functional programming enthusiast, you will probably get a reply similar to the one below:

“Oh well, allow me to enlighten you:

- In lazy evaluation, computation terminates whenever any reduction order terminates.

- It requires no more steps than eager evaluation.”

An English translation of the above would probably be similar to this: we will only execute code when needed and only as many times as required regardless of where in the execution pipeline the code sits.

This is easily demonstrable using the notation of a functional programming language such as Haskell. Consider the following script which defines 2 functions:

Four :: Integer -> Integer
Four x = 4

Infinity :: Integer
Infinity = 1 + Infinity

The first function above simply returns the integer 4 for any x. The second function is a recursive function (It never terminates when used in an eager evaluation mode). In the world of C#, these functions would look similar to the ones below:

static int Four(int x)

{

   return 4;

}

static int Infinity()

{

   return Infinity() + 1;

}

Imagine the following pipeline of function calls (for simplicity, I will not use the Haskell syntax in here)

int x = Four(Infinity());

In eager evaluation – CLR’s default behaviour – the above will throw a StackOverflowException exception due to the never ending recursive nature of the Infinity method.

There are 2 different evaluation orders for the above listing. The one on the left never terminates but the one of the right terminates after the first iteration:

Eager (used by CLR)

Lazy

Four(Infinity())// definition of Infinity=> Four(Infinity() + 1)// definition of Infinity=> Four(Infinity() + 1 + 1) => ...

Four(Infinity())// definition of Four=> 4

 

As you can see, lazy evaluation has some useful implications for program design. Many expressions can be thought of as pipelines therefore intermediate data structures need not exist all at once.

Let’s assume that we need to write a function which returns a list of all prime numbers:

static int GetPrimes()

{

   List<int> primes = new List<int>();

   int i = 1;

   while (true)

   {

      if (IsPrime(i)) primes.Add(i);

      i++;

   }

}

Unfortunately, the list of prime numbers is an infinite list therefore it cannot be returned all at once. Ignoring the Int32.MaxValue limit, the code above will never cease executing.

In .NET 1.0, we introduced the notion of enumerators which allowed us to somewhat create demand-driven (lazy) execution. With enumerators, the next value that is required from the list is retrieved (and possibly calculated) only when the MoveNext() method is called on the enumerator.

In C# 2.0 we introduced the yield keyword to make the programmer’s life easier by automatically creating the helper IEnumerable and IEnumrator implementer class at compile time:

static IEnumerable<int> GetPrimes()

{

   int i = 1;

   while (true)

   {

      if (IsPrime(i))

      {

         yield return i;

      }

      i++;

   }

}

The key point to notice with the above listing is that when GetPrimes() is called, this method does not calculate the first prime number until an enumerator is created from the returned object and the MoveNext is called on it. When MoveNext is called, then the first prime number is found and is accessible through the Current property of the enumerator. No other prime number is calculated until MoveNext is called again:

IEnumerable<int> ep = GetPrimes();

IEnumerator<int> enump = ep.GetEnumerator();

// claculate the first prime number
enump.MoveNext();
Console.WriteLine(enump.Current);

// claculate the second prime number
enump.MoveNext();

Console.WriteLine(enump.Current);

Result:

 

With the use of the yield keyword, we turned a never terminating method into a demand-driven function.

In C# 3.0 with the introduction of query expressions as part of the LINQ framework, we now see the real benefit of this amazingly useful compiler trick (yield keyword results in a code transformation at compile time). Assume the following query expression:

var q =

    from c in customers

    where c.City == "London"

    select c.Name;

At compile time, it is compiled into IL with the following semantic:

IEnumerable<string> q =

   customers

   .Where(CustomersInLondon)

   .Select(c => c.Name);

...

static bool CustomersInLondon(Customer c)

{

   return c.City == "London";

}

As you can see both Where and Select methods also return an instance of a type implementing IEnumerable<string> meaning that they are not executed until an enumerator is created for q and its MoveNext method is called. Indeed CustomersInLondon is executed only when MoveNext is executing.

IEnumerator<string> e = q.GetEnumerator();

e.MoveNext();

Console.WriteLine(e.Current);

As mentioned earlier, a great benefit of lazy evaluation is that intermediate data structures need not exist all at once. In the above example, we don’t need to have the list of all customers in memory if we are only interested in the first 2 customers who live in London.

Comments

  • Anonymous
    November 27, 2007
    good article

  • Anonymous
    January 11, 2008
    PLINQ is built on top of the Task Parallel Library (TPL) and promises to revolutionise the way we write

  • Anonymous
    January 11, 2008
    PLINQ is built on top of the Task Parallel Library (TPL) and promises to revolutionise the way we write