This article is also available in French.

For some time now, since the release of the Rx Framework and its take on reactive/interactive programming, some new features have been highlighted, notably in a very good article by Bart De Smet about System.Interactive and “lazy caching”.

When using LINQ, one can find two sorts of operators: the “lazy” operators, which pull elements one by one and forward them only when they are requested (Select, Where, SelectMany, First, …), and the operators I would call “Rendez-vous”, which need to enumerate all the elements of their source before they can produce a result (Count, ToArray/ToList, OrderBy, …).
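As a quick illustration of the difference, here is a minimal sketch (the range and predicates are arbitrary):

[code:c#]
// "Lazy": Where only pulls elements from the range as they are requested,
// and First() stops as soon as it finds a match.
int firstEven = Enumerable.Range(1, 1000)
                          .Where(x => x % 2 == 0)
                          .First();

// "Rendez-vous": Count() has to walk the entire sequence before returning.
int evenCount = Enumerable.Range(1, 1000)
                          .Count(x => x % 2 == 0);

[/code]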

 

“Lazy” Operators

Lazy operators are pretty useful as they offer good performance when it is not necessary to enumerate all the elements of an enumerable. They are also useful when enumerating each element of an enumerator may take a very long time and we only want the first few elements.

For instance, this:

[code:c#]

static IEnumerable<int> GetItems()
{
    for (int i = 0; i < 5; i++)
    {
        Console.WriteLine(i);
        yield return i + 1;
    }
}

static void Main()
{
   Console.WriteLine(GetItems().First());
   Console.WriteLine(GetItems().First());
}

[/code]

Will output:

[code:c#]
0
1
0
1

[/code]

Each call to First() only enumerates the first element of GetItems(): the 0 is printed inside GetItems(), and the 1 is the value returned by First().

However, these operators expose a behavior that is important to know about: each time they are enumerated, they also enumerate their source again. That can either be an advantage (enumerating a changing source multiple times) or a problem (enumerating a resource-intensive source multiple times).
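To make that behavior visible, here is a minimal sketch reusing the GetItems() method above (the Select projection is arbitrary):

[code:c#]
var query = GetItems().Select(i => i * 10);

// Each foreach enumerates GetItems() again from the beginning,
// so its Console.WriteLine calls run twice for every element.
foreach (var item in query) { }
foreach (var item in query) { }

[/code]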

 

“Rendez-vous” Operators

These operators are also interesting because they force the enumeration of all the elements of the enumerable, and in the particular case of ToArray, this creates a snapshot of the content of the enumerable at that point in time. They are useful in conjunction with lazy operators to prevent them from enumerating their source again when they are enumerated multiple times.

If we take the previous sample and update it a bit:

[code:c#]
static void Main()
{
   var items = GetItems().ToArray();

   Console.WriteLine(items.Count());
   Console.WriteLine(items.Count());
}

[/code]

We get this result:

[code:c#]
0
1
2
3
4
5
5

[/code]

Count() needs to know all the elements of its source to determine the count, but here ToArray() has already enumerated the whole of GetItems() (printing 0 through 4 once), so the two Count() calls simply read the materialized array and print 5 twice.

These operators also enumerate their source on each use, but materializing the sequence with ToArray/ToList prevents the operators applied to the result from enumerating the original source again.
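For comparison, here is a sketch of the same two calls without ToArray(), where each Count() enumerates GetItems() again:

[code:c#]
// Without ToArray(), the numbers 0 through 4 are printed twice,
// once per Count() call, before each 5.
Console.WriteLine(GetItems().Count());
Console.WriteLine(GetItems().Count());

[/code]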

The case of multiple enumerations

A concrete example of the problem posed by multiple enumerations is the creation of an enumerable partitioning operator. In this example, the enumerable passed as the source is used by two different "Where" operators, which implies that the source enumerable will be enumerated twice. Storing the whole content of the source enumerable with ToArray/ToList is possible, but that would be a potential waste of resources, mainly because we cannot know whether the output enumerable will be enumerated completely (and when it cannot be, as in the case of an infinite enumerable, ToArray is simply not applicable).
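The implementation from the linked example is not reproduced here, but a partitioning operator built on two Where calls could look roughly like this sketch (the name, signature and return type are assumptions for illustration):

[code:c#]
public static class PartitionExtensions
{
    // Hypothetical sketch: both Where calls share the same "source",
    // so the source enumerable is enumerated once per partition
    // that ends up being consumed.
    public static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        return Tuple.Create(source.Where(predicate),
                            source.Where(x => !predicate(x)));
    }
}

[/code]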

An intermediate operator between "Lazy" and "Rendez-vous" would be useful.

EnumerableEx.MemoizeAll

The EnumerableEx class brings us an extension method, MemoizeAll (built on the concept of memoization), which is exactly the middle ground we are looking for: it caches the elements of the source enumerator as they are requested. A sort of "lazy" ToArray.
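As a minimal sketch of that behavior, reusing GetItems() (and assuming the System.Interactive build contemporary with this article, where MemoizeAll is an extension method on IEnumerable&lt;T&gt;):

[code:c#]
var memoized = GetItems().MemoizeAll();

// The first enumeration pulls the first element from GetItems()
// and caches it...
Console.WriteLine(memoized.First());

// ...the second enumeration is served from the cache:
// GetItems() is not enumerated again.
Console.WriteLine(memoized.First());

[/code]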

If we take the example from Mark Needham, we could modify it like this:

[code:c#]
var evensAndOdds = Enumerable.Range(1, 10)
                             .MemoizeAll()
                             .Partition(x => x % 2 == 0);

[/code]

In this example, MemoizeAll does not bring a real benefit on the performance side, since Enumerable.Range is not a very expensive operator. But when the source of the "Partition" operator is a more expensive enumerable, such as a Linq2Sql query, the lazy caching can be very effective.

One of the comments suggests that a GroupBy-based implementation could be written, but that operator also enumerates its source completely when a group is enumerated. MemoizeAll is therefore again appropriate for better performance, but, as always, this is a tradeoff between processing and memory.
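Such a GroupBy-based variant might look roughly like the following sketch (not the commenter's actual code):

[code:c#]
var groups = Enumerable.Range(1, 10)
                       .GroupBy(x => x % 2 == 0);

// Enumerating the groups (or any single group) still walks the
// entire source sequence, so the source cost is paid in full here too.
foreach (var group in groups)
{
    Console.WriteLine("{0}: {1}", group.Key, group.Count());
}

[/code]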

By the way, Bart De Smet also discusses the elimination of the side effects linked to the multiple enumeration of enumerables by using Memoize and MemoizeAll. That is not really an issue in the previous example, but it is nonetheless a very interesting subject.

 

.NET 4.5?

On a side note, I find it regrettable that the EnumerableEx extensions did not make their way into .NET 4.0... They are very useful, and not very complex. Maybe they arrived too late in the development cycle of .NET 4.0... Perhaps in .NET 4.5 :)