TL;DR: Immutable data and memoization are functional programming concepts that can be applied to C#. These patterns have their strengths and weaknesses, a few of which are discussed in this article. 

 

I’ve grown not to be a great fan of data mutability.

Data mutability can introduce a lot of side effects in the code, and it can be pretty complex to go back in time to know what a specific state was before the code failed. This gets worse when multiple threads are involved, and that tends to happen a lot more these days, now that even phones have multiple cores.

Sure, we can use IntelliTrace to ease that kind of debugging, but that’s pretty much limited to issues you already know about. It means you’re reacting to issues that have already happened, not proactively preventing them from happening.

So, to address this more reliably, there’s the concept of immutability. Once a set of data is built, it cannot change anymore. This means you can pass it around, do computations with it, and use it anywhere you want: there won’t be any subtle concurrency issues caused by the data changing under your feet.
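
To make that concrete, here’s a minimal sketch of what an immutable type can look like in C# (the Point name and its members are made up for the example):

public sealed class Point
{
    // All state is assigned once in the constructor and exposed through
    // read-only members, so an instance can never change after it is built.
    private readonly int _x;
    private readonly int _y;

    public Point(int x, int y)
    {
        _x = x;
        _y = y;
    }

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    // "Changing" a value produces a new instance instead of mutating this one.
    public Point WithX(int x)
    {
        return new Point(x, _y);
    }
}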

Immutability and Performance

The thing is, taken as is, immutability is not a great friend of performance. When a computation is performed, the result is just passed to the next function that needs it; but if another function depends on that same output, for the same input, the computation is performed all over again.

There’s already a solution for this problem, known as Memoization.

The basic idea is that if the input data is immutable, and the computation has no side effects, then for a given set of inputs the output will always be the same. This, in turn, means that the result can be cached for later reuse. This property is called Referential Transparency.

Here’s a simple example of memoization:

public static Func<TArg, TResult> 
   AsMemoized<TArg, TResult>(this Func<TArg, TResult> func)
{
    // Cache of results, captured by the returned closure.
    var values = new Dictionary<TArg, TResult>();

    return (v) =>
    {
        TResult value;

        // Compute the value only once per distinct input, then reuse it.
        if (!values.TryGetValue(v, out value))
        {
            value = values[v] = func(v);
        }

        return value;
    };
}
    

And here’s a simple use for it, where you know the data won’t change:

static void Main(string[] args)
{
    // Get the JIT out of the way
    GetAssignableTypes(typeof(IDisposable));

    Func<Type, Type[]> getAssignableTypes = GetAssignableTypes;
    getAssignableTypes = getAssignableTypes.AsMemoized();

    for (int i = 0; i < 10; i++)
    {
        var sw1 = Stopwatch.StartNew();
        var r1 = getAssignableTypes(typeof(IDisposable)).Length;
        Console.WriteLine(sw1.Elapsed);
    }
}

public static Type[] GetAssignableTypes(Type source)
{
    // Deliberately expensive: scans every type in every loaded assembly.
    var r =
        from asm in AppDomain.CurrentDomain.GetAssemblies()
        from type in asm.GetTypes()
        where type.IsAssignableFrom(source)
        select type;

    return r.ToArray();
}

The first call takes about 90ms on my machine, and the rest take about 0.7µs to execute.

As is often the case, this is a tradeoff between memory and CPU. In this case, the result of the computation is stored in a dictionary that is captured in the closure created by the AsMemoized method.

So, as long as the delegate produced by AsMemoized lives, so will the cached data.
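
As a side note, the Dictionary used above is not thread-safe. If the memoized delegate can be called from several threads at once, one possible variation is to swap the storage for a ConcurrentDictionary; the AsThreadSafeMemoized name below is made up, and this is only a sketch of the idea:

// Requires System.Collections.Concurrent.
public static Func<TArg, TResult>
   AsThreadSafeMemoized<TArg, TResult>(this Func<TArg, TResult> func)
{
    // GetOrAdd may call func more than once for the same key under
    // contention, but only one result per key is ever stored and returned.
    var values = new ConcurrentDictionary<TArg, TResult>();

    return v => values.GetOrAdd(v, func);
}

The tradeoff stays the same either way: the cache lives as long as the delegate does.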

 

Memoization storage and Immutability

But there are multiple issues with this pattern of memoization, all stemming from the fact that the memoized delegate needs to be stored somewhere.

The first issue is that the memoized delegate can be stored on an instance that has a much longer lifetime. This can work if the overall memoized dataset is not too large, because with a naïve implementation like the one above, the data will not be discarded as long as the outer instance lives. The memoization can be reset by re-creating the delegate, but that is a very coarse-grained operation.
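
As a rough sketch of that first scenario, here’s a hypothetical TypeCatalog class (the name and shape are made up) that owns a memoized delegate built with the AsMemoized method shown earlier:

public class TypeCatalog
{
    private readonly Func<Type, Type[]> _source;
    private Func<Type, Type[]> _memoized;

    public TypeCatalog(Func<Type, Type[]> source)
    {
        _source = source;
        ResetCache();
    }

    public Type[] GetAssignableTypes(Type type)
    {
        // Cached results accumulate here for as long as this instance lives.
        return _memoized(type);
    }

    // The only way to discard the cached results is to throw the whole
    // delegate away and build a new one: an all-or-nothing reset.
    public void ResetCache()
    {
        _memoized = _source.AsMemoized();
    }
}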

The second one is about placing the memoized delegate alongside the data that was used to create it. The memoized content will only live as long as the data does, but there is a clear separation of concerns issue: every time a new operation needs to be performed on this data, a new delegate has to be created to take care of it.
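
Here’s a sketch of that second scenario, with a made-up TypeSet class that carries its own memoized computation; the filtering logic is only there for illustration:

// Requires System.Linq for Where/ToArray.
public class TypeSet
{
    private readonly Type[] _types;
    private readonly Func<Type, Type[]> _getAssignableTypes;

    public TypeSet(IEnumerable<Type> types)
    {
        _types = types.ToArray();

        // The cache dies with this instance, but every new operation on the
        // data means adding yet another memoized delegate to the class.
        Func<Type, Type[]> getAssignableTypes =
            source => _types.Where(t => t.IsAssignableFrom(source)).ToArray();

        _getAssignableTypes = getAssignableTypes.AsMemoized();
    }

    public Type[] GetAssignableTypes(Type source)
    {
        return _getAssignableTypes(source);
    }
}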

And finally, if there is a need for a composite computation, one that takes two or more instances as input, then deciding where to store the delegate becomes a memory-management challenge.
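
For instance, memoizing a computation over two inputs already raises the question: the cache key has to combine both instances, and whatever ends up holding the delegate keeps the inputs and all the results alive. Here’s one possible two-argument overload, in the spirit of the AsMemoized method above, using a Tuple as a composite key:

public static Func<TArg1, TArg2, TResult>
   AsMemoized<TArg1, TArg2, TResult>(this Func<TArg1, TArg2, TResult> func)
{
    // The cache is keyed on both arguments; entries stay alive as long as
    // the returned delegate does, regardless of which input is still in use.
    var values = new Dictionary<Tuple<TArg1, TArg2>, TResult>();

    return (a1, a2) =>
    {
        var key = Tuple.Create(a1, a2);

        TResult value;

        if (!values.TryGetValue(key, out value))
        {
            value = values[key] = func(a1, a2);
        }

        return value;
    };
}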

 

Next time

Next time, we'll see how memoization can be made a little friendlier with memory.