Immutable Data and Memoization in C#, Part 2
tl;dr: Memoization can be associated with the ConditionalWeakTable class, which allows the addition of memoized computation results to immutable types. This makes the memoized results live as long as the instances that was used to create it.
In the first part of this article, we discussed a bit the Memoization pattern in C#. In this second part, we will discuss how to alleviate the memory management issue for memoized computation results.
ConditionalWeakDictionary to the rescue
In .NET 4.0 – quite a while ago now – a new class was added, the ConditionalWeakTable, to help in the creation of dynamic languages based on the DLR, where there was a need to be able to attach data to existing type instances, much like a fictional extension property could be. This topic is not covered much, and since it has to do with the GC, it is often misunderstood.
The idea is pretty simple: It is a dictionary that takes an type instance as a key, and a value associated to it. The key is stored as a weak reference, meaning that the data is held as a hard-reference as long as the key lives. When the key is collected by the GC, the hard-link is removed, making the data available for collection if it’s not referenced anywhere else.
Here’s how to use it:
ConditionalWeakTable<object, object> table = new ConditionalWeakTable<object, object>(); var o1 = new object(); var o2 = new object(); var r2 = new WeakReference(o2); table.Add(o1, o2); //Release the reference so it's only kept by the table o2 = null; GC.Collect(2); Console.WriteLine("{0}/{1}", r2.IsAlive, table.GetOrCreateValue(o1)); // Release the key instance, hence releasing o2. o1 = null; GC.Collect(2); Console.WriteLine("{0}", r2.IsAlive); // // Output: // // True/System.Object // False
Implementing this class without having a privileged access to the GC is pretty tricky. You quickly end up having to perform a weak reference dictionary that you’ll need to cleanup by hand and you’ll need to handle circular references properly. This can end up messy.
Composing the Memoization with the ConditionalWeakTable
Knowing this, it is pretty easy to see that it may be possible to create a version of the AsMemoized() helper that stores its computation results somewhere else, in a memory-friendly fashion.
The result of a computation can be associated with the data that was used to create it, and kept alive as long as the first source lives.
Here’s how this can be implemented :
using Storage = System.Collections.Concurrent.ConcurrentDictionary<object, object>; public static partial class FuncExtensions { private static ConditionalWeakTable<object, Storage> _weakResults = new ConditionalWeakTable<object, Storage>(); public static Func<TParam, TResult> AsWeakMemoized<TSource, TResult, TParam>( this Func<TSource, TParam, TResult> selector, TSource source ) { return param => { // Get the dictionary that associates delegates // to a parameter, on the specified source var values = _weakResults.GetOrCreateValue(source); object res; var key = new { selector, param }; // Get the result for the combination source/selector/param bool cached = values.TryGetValue(key, out res); if (!cached) { res = selector(source, param); values[key] = res; } return (TResult)res; }; } // Since is not possible to implicitly make a Func<T,U> out // of a method group, let's use the source as a function type inference. public static TResult ApplyMemoized<TSource, TResult, TParam>( this TSource source, Func<TSource, TParam, TResult> selector, TParam param ) { return selector.AsWeakMemoized(source)(param); } }
And is used like this :
class Program { static void Main(string[] args) { var data = new Data(42); var wr = new WeakReference(data.DoStuff(42)); var wr2 = new WeakReference(data.DoStuff(42)); GC.Collect(2); Console.WriteLine(wr.IsAlive); data = null; GC.Collect(2); Console.WriteLine(wr.IsAlive); } } // An immutable type public class Data { public Data(int value) { Value = value; } ~Data() { Console.WriteLine("~Data({0})", Value); } public int Value { get; private set; } } public static class DataExtensions { public static Data DoStuff(this Data data, int someValue) { return data.ApplyMemoized(InternalDoStuff, someValue); } private static Data InternalDoStuff(Data data, int someValue) { Console.WriteLine("Computing {0}", someValue); return new Data(data.Value + someValue); } } // // Output: // Computing 42 // True // False // ~Data(84) // ~Data(42) //
The ConditionalWeakDictionary takes the instance to attach the result to, a function to memoize, and will return a new delegate that will cache the results for a combination of the source and a parameter.
Note that to reduce the verbosity of the code, since it's not possible to use a method group as a delegate, the ApplyMemoized method is used to take advantage of the few places where the delegate type inference is available.
AsWeakMemoized, the caveats
This helper is far from perfect, and it has to do with the implementation of the ConditionalWeakTable implementation.
Referential Transparency
The documentation says "Two keys are equal if passing them to the Object.ReferenceEquals method returns true.". This means that the even though two instances of the same data may be equal in their value, through IEquatable or GetHashCode/Equals, they will be considered as different. This means that through this helper, type instances are not referentially transparent.
When using AsWeakMemoized, this is an implementation detail that needs to be taken into account. The same computation may be performed twice because the computation will be memoized on a projection of an immutable instance.
I mentioned earlier that implementing a ConditionalWeakTable can be tricky, but implementing referential transparency could be a reason to do it. This would allow for the memoizer to compare values instead of references.
But it would also make memory management a bit more complex, regarding the owner of the computation result. For instance, there is the question of how long a referentially transparent result for a single value should be kept alive, even if the original reference that was used to create the resulting value has been collected.
You might want to go down that road :)
Static delegates
You'll notice that the memoization is using the delegate as a key for the storage. This means that if the provided delegate changes, either because it is a lambda that creates a closure, or because the delegate is created from an instance method, the memoization will not happen.
At the very worse, the memoization will do nothing, or maybe simply allocate a few more bytes every time for the storage of the result.
Occasional Duplicate Work
Finally, you'll notice that there are no locks in this implementation. This means that the computation for a set of inputs may occasionally be executed multiple times in case of a race condition. While this can be a performance issue on systems that do not have enough CPUs (< 4), on greater multicore CPUs, it is acceptable to have the CPUs occasionally compute the same data even if it is discarded, just for the sake of avoiding acquiring locks.
Final words
I personally think that Functional Programming concepts such as Immutability and method purity, along with techniques such as Memoization, are part of the answer to harness the power of multi-core CPUs. CPU Scaling out is not going away, so we'd better make good use of this hardware, using the appropriate development tools.