Lambda Expressions and foreach Loops
This article is also available in French.
While improving the performance of a type serializer, and using a small extension I recently wrote for Umbrella, I stumbled upon an interesting little "side effect" of creating lambda expressions inside a foreach loop.
Consider this simple piece of code:
[code:c#]
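using System;
using System.Collections.Generic;
using System.Linq;

var actionList = new List<Func<int>>();
foreach (var value in Enumerable.Range(0, 10))
{
    // Each lambda captures the foreach variable "value"
    actionList.Add(() => value);
}
actionList.ForEach(func => Console.Write("{0} ", func()));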
[/code]
Which outputs:
[code:c#]
9 9 9 9 9 9 9 9 9 9
[/code]
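Which is, of course, not what one would naively expect: every delegate returns the last value. This is the behavior of the C# 3.0 and 4.0 compilers; starting with C# 5.0, each iteration of a foreach loop gets its own variable, and the same code prints 0 1 2 3 4 5 6 7 8 9.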
Lambda expressions can use (capture) the variables that are in scope where they are declared. This makes them very powerful, but to use them properly, it helps to understand how the compiler "materializes" them.
Like many other C# features, such as using, foreach, iterators or lock, lambdas are syntactic sugar meant to simplify writing code that would otherwise be quite verbose. The code these keywords expand to can, for the most part, be written by hand in C#.
Now consider this other piece of code:
[code:c#]
int a = 0;
Action action = () => Console.WriteLine(a);
action();
[/code]
The C# compiler "materializes" the lambda expression as a "display class", which stores the captured local variable "a":
[code:c#]
[CompilerGenerated]
private sealed class <>c__DisplayClass1
{
    public int a;

    public void <Main>b__0()
    {
        Console.WriteLine(this.a);
    }
}
[/code]
We can see that the identifiers of the generated class are not valid in C#, although they are valid from the CLR's point of view. We can also see that the local variable captured by the lambda has become a member field of the class that contains the lambda's code. The compiler then writes the following to create the delegate:
[code:c#]
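// "a" is no longer a separate local: it is hoisted into the display class,
// and every access to "a" goes through this field
var display = new <>c__DisplayClass1();
display.a = 0;
Action action = new Action(display.<Main>b__0);
action();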
[/code]
Here again, this is not valid C#.
So what happens in the foreach case that makes every delegate return the same value?
If we open the compiled first sample in Reflector, the C# view shows nothing special:
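To make the mechanism concrete, here is a compilable sketch of the same expansion written by hand. The names CapturedA, CaptureDemo, WithLambda and WithDisplayClass are hypothetical stand-ins for the compiler's identifiers, and Func<int> replaces Action so that the result can be returned and checked. Both methods change "a" after the delegate is created, and both return 42: the delegate reads the field when it is invoked, not when it is created.

[code:c#]
using System;

// Hypothetical hand-written stand-in for the compiler-generated
// "<>c__DisplayClass1" (the real names are only valid for the CLR).
sealed class CapturedA
{
    public int a;            // the captured local lives here

    public int Invoke()      // body of the lambda
    {
        return this.a;
    }
}

static class CaptureDemo
{
    // With a real lambda: the lambda sees the current value of "a".
    public static int WithLambda()
    {
        int a = 0;
        Func<int> read = () => a;
        a = 42;              // modified after the lambda is created
        return read();       // returns 42, not 0
    }

    // The same logic, written the way the compiler expands it.
    public static int WithDisplayClass()
    {
        var display = new CapturedA();
        display.a = 0;
        Func<int> read = new Func<int>(display.Invoke);
        display.a = 42;      // every use of "a" goes through the field
        return read();       // returns 42 as well
    }
}
[/code]

For example, Console.WriteLine(CaptureDemo.WithLambda()); prints 42.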
[code:c#]
List<Func<int>> actionList = new List<Func<int>>();
using (IEnumerator<int> CS$5$0000 = Enumerable.Range(0, 10).GetEnumerator())
{
    while (CS$5$0000.MoveNext())
    {
        int value = CS$5$0000.Current;
        actionList.Add(delegate {
            return value;
        });
    }
}
[/code]
The lambda expression is shown as an anonymous method, which compiles to the same thing as a lambda here, but this does not explain the behavior.
To understand the behavior, we have to look at the generated IL, which corresponds to the following C#-like code:
[code:c#]
List<Func<int>> actionList = new List<Func<int>>();
using (IEnumerator<int> CS$5$0000 = Enumerable.Range(0, 10).GetEnumerator())
{
    // A single display class instance, created once for the whole loop
    var myLambda = new <>c__DisplayClass4();
    while (CS$5$0000.MoveNext())
    {
        // "value" only exists as a field of that single instance
        myLambda.value = CS$5$0000.Current;
        actionList.Add(new Func<int>(myLambda.b_0));
    }
}
[/code]
The problem is now easy to see: the instance of the class containing the lambda is created only once, and that same instance receives a new value at each iteration. Since all the delegates refer to this single "DisplayClass" instance, they all return the last enumerated value.
However, if we write the code this way:
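The same effect can be reproduced without any lambda, by managing a hand-written display class directly. In this sketch, ValueClosure, SharedClosureDemo and their methods are hypothetical names: SharedInstance mimics the expansion above, with a single instance reused for every iteration, while InstancePerIteration creates one instance per iteration.

[code:c#]
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for "<>c__DisplayClass4": holds the captured
// loop variable and the body of the lambda.
sealed class ValueClosure
{
    public int value;

    public int Invoke()
    {
        return this.value;
    }
}

static class SharedClosureDemo
{
    // Mimics the expansion above: one instance, created before the loop,
    // is overwritten at every iteration.
    public static string SharedInstance()
    {
        var funcs = new List<Func<int>>();
        var closure = new ValueClosure();
        foreach (var current in Enumerable.Range(0, 10))
        {
            closure.value = current;
            funcs.Add(closure.Invoke);
        }
        return string.Join(" ", funcs.Select(f => f()));
    }

    // One instance per iteration: each delegate keeps its own value.
    public static string InstancePerIteration()
    {
        var funcs = new List<Func<int>>();
        foreach (var current in Enumerable.Range(0, 10))
        {
            var closure = new ValueClosure();
            closure.value = current;
            funcs.Add(closure.Invoke);
        }
        return string.Join(" ", funcs.Select(f => f()));
    }
}
[/code]

SharedInstance returns "9 9 9 9 9 9 9 9 9 9" and InstancePerIteration returns "0 1 2 3 4 5 6 7 8 9", whatever the compiler version, since the compiler captures no loop variable here.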
[code:c#]
foreach (var value in Enumerable.Range(0, 10))
{
    int myValue = value;
    actionList.Add(() => myValue);
}
[/code]
The behavior changes, and this time each lambda returns its own value.
From the compiler's point of view, a new instance of the container class is created only when a new instance of the captured variable is created. In a foreach statement, that is not the case: the iteration variable is treated as declared only once, then reused.
So, from the compiler's point of view, a foreach loop is expanded like this:
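Here is the workaround as a complete, compilable method, assuming the hypothetical names CopyInLoopDemo and Run, and returning the values as a string instead of writing them to the console so that they can be checked. Because myValue is declared inside the loop body, the compiler creates a new display class instance each time the body is entered.

[code:c#]
using System;
using System.Collections.Generic;
using System.Linq;

static class CopyInLoopDemo
{
    public static string Run()
    {
        var actionList = new List<Func<int>>();
        foreach (var value in Enumerable.Range(0, 10))
        {
            // A new variable for each iteration, hence a new
            // display class instance for each lambda
            int myValue = value;
            actionList.Add(() => myValue);
        }
        return string.Join(" ", actionList.Select(func => func()));
    }
}
[/code]

Run() returns "0 1 2 3 4 5 6 7 8 9".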
[code:c#]
using (var it = Enumerable.Range(0, 10).GetEnumerator())
{
    int value; // declared once, outside the loop
    while (it.MoveNext())
    {
        value = it.Current;
        actionList.Add(() => value);
    }
}
[/code]
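This expansion can be checked directly. In the following sketch (ExpansionDemo and its methods are hypothetical names), DeclaredOutside reproduces the expansion above, while DeclaredInside only moves the declaration of value into the while loop. Since both loops are written out by hand, the result does not depend on how a given compiler version expands foreach.

[code:c#]
using System;
using System.Collections.Generic;
using System.Linq;

static class ExpansionDemo
{
    // "value" declared once, outside the while loop:
    // all the lambdas capture the same variable.
    public static string DeclaredOutside()
    {
        var actionList = new List<Func<int>>();
        using (var it = Enumerable.Range(0, 10).GetEnumerator())
        {
            int value;
            while (it.MoveNext())
            {
                value = it.Current;
                actionList.Add(() => value);
            }
        }
        return string.Join(" ", actionList.Select(func => func()));
    }

    // "value" declared inside the while loop: each iteration
    // gets a fresh variable, hence a fresh capture.
    public static string DeclaredInside()
    {
        var actionList = new List<Func<int>>();
        using (var it = Enumerable.Range(0, 10).GetEnumerator())
        {
            while (it.MoveNext())
            {
                int value = it.Current;
                actionList.Add(() => value);
            }
        }
        return string.Join(" ", actionList.Select(func => func()));
    }
}
[/code]

DeclaredOutside returns "9 9 9 9 9 9 9 9 9 9"; DeclaredInside returns "0 1 2 3 4 5 6 7 8 9".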
This is probably a question of interpretation, but I wasn't exactly expecting this...
So pay attention to how local variables are used in lambda expressions: the result depends on where those variables are declared.
In a later post, I'll explain why I needed to use lambda expressions in a foreach loop.