An update to @matthieumezil, Rx and the FileSystemWatcher

By jay at August 26, 2012 20:52 Tags: , , , , , , ,

@MatthieuMEZIL is a fellow MVP and friend of mine, and back in february at the MVP summit we shared our mutual interests with Rx on my side, and Roslyn on his side.

I’ve yet to blog about Roslyn, (even though a blog post is in preparation) but he started using Rx and is a bit blogging about it.

Now, Rx is a tricky beast and there are many ways to do the same thing. I’ve promised Matthieu I’d give him a few tricks on how to improve his code. Now that I do have a bit of time, here it is.

More...

Asynchronous Programming with the Reactive Extensions (while waiting for async/await)

By jay at November 25, 2011 20:43 Tags: , , , , , ,

This article was originally published on the MVP Award Blog in December 2011.

Nowadays, with applications that use more and more services that are in the cloud, or simply perform actions that take a user noticeable time to execute, it has become vital to program in an asynchronous way.

But we, as developers, feel at home when thinking sequentially. We like to send a request or execute a method, wait for the response, and then process it.

Unfortunately for us, an application just cannot wait synchronously for a call to end anymore. Reasons can be that the user expects the application to continue responding, or because the application joins the results of multiple operations, and it is necessary to perform all these operations simultaneously for good performance.

Frameworks that are heavily UI dependent (like Silverlight or Silverlight for Windows Phone) are trying the force the developer's hand into programming asynchronously by removing all synchronous APIs. This leaves the developer alone with either the Begin/End pattern, or the plain old C# events. Both patterns are not flexible, not easily composable, often lead to memory leaks, and are just plain difficult to use or worse, to read.

C# 5.0 async/await

Taking a quick look at the not so distant future, Microsoft has taken the bold approach to augment its new .NET 4.5 to include asynchronous APIs and in the case of the Windows Runtime (WinRT), restrict some APIs to be asynchronous only. These are based on the Task class, and are backed by languages to ease asynchronous programming.

In the upcoming C# 5.0 implementation, the async/await pattern is trying to handle this asynchrony problem by making asynchronous code look synchronous. It makes asynchronous programming more "familiar" to developers.

If we take this example:

    static void Main(string[] args)
    {
        // Some initialization of the DB...
        Task<int> t = GetContentFromDatabase();

        // Execute some other code when the task is done
        t.ContinueWith(r => Console.WriteLine(r.Result));

        Console.ReadLine();
    }

    public static async Task<int> GetContentFromDatabase()
    {
        int source = 22;

        // Run starts the execution on another thread
        var result = (int) await Task.Run(
            () => { 
                // Simulate DB access
                Thread.Sleep(1000);
                return 10; 
            }
        );

        return source + result * 2;
    }

The code in GetContentFromDatabaselooks synchronous, but under the hood, it's actually split in half (or more) where the await keyword is used.

The compiler is applying a technique used many times in the C# language, known as syntactic sugar. The code is expanded to a form that is less readable, but is more of a plumbing code that is painful to write – and get right – each time. The using statement, iterators and more recently LINQ are very good examples of that syntactic sugar.

Using a plain old thread pool call, the code actually looks a lot more like this, once the compiler is done:

    public static void Main()
    {
        MySyncMethod(result => Console.WriteLine(result));
        Console.ReadLine();
    }

    public static void GetContentFromDatabase (Action<int> continueWith)
    {
        // The first half of the async method (with QueueUserWorkItem)
        int source = 22;

        // The second half of the async method
        Action<int> onResult = result => {
            continueWith(source + result * 2);
        };

        ThreadPool.QueueUserWorkItem(
            _ => {
                // Simulate DB access
                Thread.Sleep(1000);

                onResult(10);
            }
        );
    }

This sample somewhat more complex, and does not properly handle exceptions. But you probably get the idea.

Asynchronous Development now

Nonetheless, you may not want or will be able to use C# 5.0 soon enough. A lot of people are still using .NET 3.5 or even .NET 2.0, and new features like async will take a while to be deployed in the field. Even when the framework has been offering it for a long time, the awesome LINQ (a C# 3.0 feature) is still being adopted and is not that widely used.

The Reactive Extensions (Rx for friends) offer a framework that is available from .NET 3.5 and functionality similar to C# 5.0, but provide a different approach to asynchronous programming, more functional. More functional is meaning fewer variables to maintain states, and a more declarative approach to programming.

But don't be scared. Functional does not mean abstract concepts that are not useful for the mainstream developer. It just means (veryroughly) that you're going to be more inclined to separate your concerns using functions instead of classes.

But let's dive into some code that is similar to the two previous examples:

    static void Main(string[] args)
    {
        IObservable<int> query = GetContentFromDatabase();

        // Subscribe to the result and display it
        query.Subscribe(r => Console.WriteLine(r));

        Console.ReadLine();
    }

    public static IObservable<int> GetContentFromDatabase()
    {
        int source = 22;

        // Start the work on another thread (using the ThreadPool)
        return Observable.Start<int>(
                   () => {
                      Thread.Sleep(1000);
                      return 10;
                   }
               )

               // Project the result when we get it
               .Select(result => source + result * 2);
    }

From the caller's perspective (the main), the GetContentFromDatabase method behaves almost the same way a .NET 4.5 Task would, and the Subscribe pretty much replaces the ContinueWith method.

But this simplistic approach works well for an example. At this point, you could still choose to use the basic ThreadPoolexample shown earlier in this article.

A word on IObservable

An IObservable is generally considered as a stream of data that can push to its subscribers zero or more values, and either an error or completion message. This “Push” based model that allows the observation of a data source without blocking a thread. This is opposed to the Pull model provided by IEnumerable, which performs a blocking observation of a data source. A very good video with Erik Meijer explains these concepts on Channel 9.

To match the .NET 4.5 Task model, an IObservable needs to provide at most one value, or an error, which is what the Observable.Start method is doing.

A more realistic example

Most of the time, scenarios include calls to multiple asynchronous methods. And if they're not called at the same time and joined, they're called one after the other.

Here is an example that does task chaining:

    public static void Main()
    {
        // Use the observable we've defined before
        var query = GetContentFromDatabase();

              // Once we get the token from the database, transform it first
        query.Select(r => "Token_" + r)

             // When we have the token, we can initiate the call to the web service
             .SelectMany(token => GetFromWebService(token))

             // Once we have the result from the web service, print it.
             .Subscribe(_ => Console.WriteLine(_));
    }

    public static IObservable<string> GetFromWebService(string token)
    {
        return Observable.Start(
            () => new WebClient().DownloadString("http://example.com/" + token)
        )
        .Select(s => Decrypt(s));
    }

The SelectMany operator is a bit strange when it comes to the semantics of an IObservable that behaves like a Task. It can then be thought of a ContinueWith operator. The GetContentFromDatabase only pushes one value, meaning that the provided lambda for the SelectMany is only called once.

Taking the Async route

A peek at WinRT and the Build conference showed a very interesting rule used by Microsoft when moving to asynchronous API throughout the framework. If an API call nominally takes more than 50ms to execute, then it's an asynchronous API call.

This rule is easily applicable to existing .NET 3.5 and later frameworks by exposing IObservable instances that provide at most one value, as a way to simulate a .NET 4.5 Task.

Architecturally speaking, this is a way to enforce that the consumers of a service layer API will be less tempted to synchronously call methods and negatively impact the perceived or actual performance of an application.

For instance, a "favorites" service implemented in an application could look like this, using Rx:

    public interface IFavoritesService
    {
        IObservable<Unit> AddFavorite(string name, string value);
        IObservable<bool> RemoveFavorite(string name);
        IObservable<string[]> GetAllFavorites();
    }

All the operations, including ones that alter content, are executed asynchronously. It is always tempting to think a select operation will take time, but we easily forget that an Addoperation could easily take the same amount of time.

A word on unit: The name comes from functional languages, and represents the void keyword, literally. A deep .NET CLR limitation prevents the use of System.Void as a generic type parameter, and to be able to provide a void return value, Unit has been introduced.

Wrap up

Much more can be achieved with Rx but for starters, using it as a way to perform asynchronous single method call seems to be a good way to learn it.

Also, a note to Rx experts, shortcuts have been taken to explain this in the most simple form, and sure there are many tips and tricks to know to use Rx effectively, particularly when it is used all across the board. The omission of the Completed event is one of them.

Finally, explaining the richness of the Reactive Extensions is a tricky task. Even the smart guys of the Rx team have a hard time doing so... I hope this quick start will help you dive into it!

When declarativeness goes away for performance

By jay at July 25, 2011 20:35 Tags: , , ,

TLDR: The creation a C# expression tree is not cached, and using it to simulate a methodof keyword is terribly slow. This article is about reconsidering the use of this technique when performance is a concern.

 

During the development of my last projects, and it's been like that for a while, I've been used to look for ways to express programs in a more declarative or functional way.

LINQ is a pretty good tool to acheive that, as well as fluent interfaces, lamdbas and all those neat language features and tricks.

 

When the language is against you

But there are some times when the language is not there to support patterns, like with the use of the IPropertyChanged interface. The language (and the CLR for that matter) does not publicly support intercepting calls to properties or methods. That can actually be done through Transparent Proxies or dynamic proxy generation, but these are not supported on Windows Phone 7. The latter somehow will on WP7.1.

Anyway, using that interface requires raising an event with a string containing the name of the property that has changed.

The use of metadata in the form of strings is unfortunately not refactoring friendly, and if you change your property name, you've got a bug on your hands.

Since there is no methodof keyword in C#, you could say that the language is against you since there is no direct way to get the name of a property or method at compile time.

This can still be done through reflection with plumbing code that uses Expression Trees to work around it, and it works pretty fine.

That way you can easily write nice wrappers like this one :

public int MyProperty 
{
    get { return GetValue(() => MyProperty); }
    set { SetValue(() => MyProperty, value); }
}

That way, you get both the type safety and the refactoring friendly use of INotifyPropertyChanged.

 

Where declarativeness does not shine

If you crack open the assembly for a property like this one with a disassembler, you get this :

public int MyProperty
{
    get
    {
        return this.GetValue<int>(
            Expression.Lambda<Func<int>>(
               Expression.Property(
                  Expression.Constant(this, typeof(MainPage)
               ),
               (MethodInfo) methodof(MainPage.get_MyProperty)
            ),
        new ParameterExpression[0]));
    }
    set
    {
        this.SetValue<int>(
           Expression.Lambda<Func<int>>(
              Expression.Property(
                 Expression.Constant(this, typeof(MainPage)),
                 (MethodInfo) methodof(MainPage.get_MyProperty)
              ),
           new ParameterExpression[0]),
        value);
    }
}

That's a lot of code !

And worse, that's not the end of it, because you'll have to traverse the expression just to get the "methodof", and then call the "Name" property to get the string. All this for a string that will never change after you've compiled your assembly.

 

When declarativeness goes away for performance

But that would not be that bad if you executed that code once, or ran it on a desktop computer (or you don't care about performance). For desktop and server applications, where the cost of executing that kind of code is (almost) neglectible, you do not care much about that.

But if you execute that code a few million times, or run it on a Windows Phone 7 on the UI thread, you've got a problem. The expression is not cached, neither in a static variable or an instance variable, depending on the context. Sure, you could store it in an expression typed variable, and cache it manually that way, but you'd lose a bit declarativeness.

To make a small comparison, it takes 13 seconds on my Samsung Focus to execute 10,000 gets of the property using expressions, and takes 0.2 milliseconds to do the same using a simple string.

Pretty easy to choose, isn't it ?

That's where you lose the declarativeness away for performance, and cringe a little bit about it when you know that you'll have to chase a bug in the future because of a lazy rename. Still, you can have debug-time only code that walks up the stack and checks that the property actually exists, but you have to execute that code to find the bug.

 

Mitigating

Hopefully, there are tools like Postsharp that help in that regard, where a post processing step does the reflection once and for all, and injects that missing string. That's direction I'd rather not take, but since there's still no out of the box solution, that can be a good fit.

 

We're so used to techniques that avoid us to write boring code, but when performance is a concern, it is necessary to reconsider all coding reflexes and think twice before using them.

Why using a timer may not be the best idea

By jay at July 25, 2011 00:00 Tags: , , , , , ,

TLDR: The Reactive Extensions provide a TestScheduler class that abstracts time and allows for a precise control of a virtual time base. Using the Rx Schedulers mecanism instead of real timers enables the creation of fast running unit tests that validate time related algorithms without the inconvients of the actual time.

 

Granted, the title is a bit provocative, but nonetheless it's still a bad idea to use timer classes like System.Threading.Timer.

Why is that ? Because classes like this one are based on the actual time, and that makes it a problem because it is non-deterministic. This means that each time you want to test a piece of code that depends on time, you'll be having a somehow different result, and particularly if you're using very long delays, like a few hours, you probably will not want to wait that long to make sure your code works as expected.

What you want actually, to avoid the side effect of "real" time passing by, is virtual time.

 

Abstracting Time with the Reactive Extensions

The reactive extensions are pretty good at abstracting time, with the IScheduler interface and TestScheduler class.

Most operators in the Rx framework have an optional scheduler they can use, like the Dispatcher or the ThreadPool schedulers. These schedulers are used to change the context of execution of the OnNext, OnCompleted and OnError events.

But for the case of time, the point is to freeze the time and make it "advance" to the point in time we need, and most importantly when we need it.

Let's say that we have a view model with a command that performs a lengthy action on the network, and that we need that action to timeout after a few seconds.

Consider this service contract :

public interface IRemoteService
{
    /// 
    /// Gets the data from the server,
    /// returns an observable call to the server 
    /// that will provide either no values, one
    /// value or an error.
    /// 
    IObservable<string> GetData(string url);
}

This implementation is exposing an observable based API, where the consumer of this contract must take into account the fact that getting data must be performed asynchronously, because it may not provide any value for a long time or even not return anything at all.

Next, a method of a view model that is using it :

public void GetServerDataCommand()
{
    IsQueryRunning = true;
    ShowError = false;

    Service.GetData(_url)
           .Timeout(TimeSpan.FromSeconds(15))
           .Finally(() => IsQueryRunning = false)
           .Subscribe(
               s => OnQueryCompleted(s),
               e => ShowError = true
           );
}

And we can test it using Moq like this :

var remoteService = new Mock<IRemoteService>();

// Returns far in the future
remoteService.Setup(s => s.GetData(It.IsAny()))
             .Returns(
                Observable.Return("the result")
                          .Delay(TimeSpan.FromDays(1))
             );

var model = new ViewModel(remoteService.Object);

// Call the fake server
model.GetServerDataCommand();
Assert.IsTrue(model.IsQueryRunning);
Assert.IsFalse(model.ShowError);

// Sleep for a while, before the timeout occurs
Thread.Sleep(TimeSpan.FromSeconds(5));
Assert.IsTrue(model.IsQueryRunning);
Assert.IsFalse(model.ShowError);

// Sleep beyond the timeout
Thread.Sleep(TimeSpan.FromSeconds(11));

// Do we have an error ?
Assert.IsFalse(model.IsQueryRunning);
Assert.IsTrue(model.ShowError);

The problem with this test is that it depends on actual time, meaning that it takes at least 16 seconds to complete. This is not acceptable in a automated tests scenario, where you want your tests to run as fast as possible.

 

Adding the IScheduler in the loop

We can introduce the use of an injected IScheduler instance into the view model, like this :

.Timeout(TimeSpan.FromSeconds(15), _scheduler)

Meaning that the both Start and Timeout will get executed on the scheduler we provide, for which the time is controlled.

We can update the test like this :

var remoteService = new Mock<IRemoteService>();
var scheduler = new TestScheduler();

// Never returns
remoteService.Setup(s => s.GetData(It.IsAny<string>()))
             .Returns(
                Observable.Return("the result", scheduler)
                          .Delay(TimeSpan.FromDays(1), scheduler)
            );

var model = new ViewModel(remoteService.Object, scheduler);

// Call the fake server
model.OnGetServerData2();
Assert.IsTrue(model.IsQueryRunning);
Assert.IsFalse(model.ShowError);

// Go before the failure point
scheduler.AdvanceTo(TimeSpan.FromSeconds(5).Ticks);
Assert.IsTrue(model.IsQueryRunning);
Assert.IsFalse(model.ShowError);

// Go beyond the failure point
scheduler.AdvanceTo(TimeSpan.FromSeconds(16).Ticks);

// Do we have an error ?
Assert.IsFalse(model.IsQueryRunning);
Assert.IsTrue(model.ShowError);

When the scheduler is created, it holds of all the scheduled operations until AdvanceTo is called. Then the scheduled actions are executed according to the virtual current time.

That way, your tests run at full speed, and you can test properly your time depedent code.

[WP7Dev] Beware of the [ThreadStatic] attribute on Silverlight for Windows Phone 7

By Admin at June 19, 2010 21:36 Tags: , , , , , , ,

Cet article est disponible en francais.

In other words, it is not supported !

And the worst in all this is that you don’t even get warned that it’s not supported... The code compiles, but the attribute has no effect at all ! Granted that you can read the msdn article about the differences between silverlight on Windows and Windows Phone, but well, you may still miss it. Maybe a custom code analysis rule could prevent this.

Still, you want to use ThreadStatic because you probably need it, somehow. But since it is not supported, you could try the Thread.GetNamedDataSlot, mind you.

Well, too bad. It’s not supported either.

That leaves us implementing or own TLS implementation, by hand...

 

Updating Umbrella for Silverlight on Windows Phone

I’m a big fan of Umbrella, and the first time I had to use Dictionary<>.TryGetValue and its magically aweful out parameter in my attempt to rewrite my Remote Control app for Windows Phone 7, I decided to port Umbrella to it. So I could use GetValueOrDefault without rewriting it, again.

I managed to get almost all the desktop unit tests to pass, except for those who emit code, use web features, use xml and binary serializers, call private methods using reflection, and so on.

There are a few parts where the code needed to be updated, because TypeDescriptor class is not available on WP7, you have to crash and burn to see if a value is convertible from one type to the other. But that’s not too bad, it works as expected.

 

Umbrella’s ThreadLocalSource

Umbrella has this nice ThreadLocalSource class that wraps the TLS behavior, and you can easily create a static variable of that type instead of the ThreadStatic static variable.

The Umbrella quick start samples make that kind of use for it :

    ISource<int> threadLocal = new ThreadLocalSource<int>(1);

    int valueOnOtherThread = 0;

    Thread thread = new Thread(() => valueOnOtherThread = threadLocal.Value);
    thread.Start();
    thread.Join();

    Assert.Equal(1, threadLocal.Value);
    Assert.Equal(0, valueOnOtherThread);

The main thread set the value to 1, and the other thread tries to get the same value from the other thread and it should be different (the default value of an int, which is 0).

 

Updating the ThreadLocalSource to avoid the use of ThreadStatic

The TLS in .NET is basically a dictionary of string/object pairs that is attached to each running threads. So, to mimic this, we just need to make a list of all threads that want to store something for themselves and wrap it nicely.

We can create a variable of this type :

    private static Tuple<WeakReference, IDictionary<string, T>>[] _tls;

That variable is intentionally an array to try to make use of memory spacial locality, and since on that platform we won’t get a lot of threads, this should be fine when we got through the array to find one. This approach is trying to be lockless, by using a retry mechanism to update the array. The WeakReference is used to avoid keeping a reference to the thread after it has been terminated.

So, to update the array, we can do as follows :

    private static IDictionary<string, T> GetValuesForThread(Thread thread)
    {
        // Find the TLS for the specified thread
        var query = from entry in _tls

                    // Only get threads that are still alive
                    let t = entry.T.Target as Thread

                    // Get the requested thread
                    where t != null && t == thread
                    select entry.U;

        var localStorage = query.FirstOrDefault();

        if (localStorage == null)
        {
            bool success = false;

            // The storage for the new Thread
            localStorage = new Dictionary<string, T>();

            while(!success)
            {
                // store the original array so we can check later if there has not
                // been anyone that has updated the array at the same time we did
                var originalTls = _tls;

                var newTls = new List<Tuple<WeakReference, IDictionary<string, T>>>();

                // Add the slots for which threads references are still alive
                newTls.AddRange(_tls.Where(t => t.T.IsAlive));

                var newSlot = new Tuple<WeakReference, IDictionary<string, T>>()
                {
                    T = new WeakReference(thread),
                    U = localStorage
                };

                newTls.Add(newSlot);

                // If no other thread has changed the array, replace it.
                success = Interlocked.CompareExchange(ref _tls, newTls.ToArray(), originalTls) != _tls;
            }
        }

        return localStorage;
    }

Instead of the array, another dictionary could be created but I’m not sure of the actual performance improvement that would provide, particularly for very small arrays.

Using a lockless approach like this one will most likely limit the contention around the use of that TLS-like class. There may be, from time to time, computations that are performed multiple times in case of race conditions on the update of the _tls array, but that is completely acceptable. Additionally, livelocks are also out of the picture on that kind of preemptive systems.

I think developing on that platform is going to be fully of little workarounds like this one... This is going to be fun !

[LINQ] Finding the next available file name

By jay at June 10, 2010 20:16 Tags: , ,

Cet article est disponible en francais.


Sometimes, the most simple examples are the best.

 

Let’s say you have a configuration file, but you want to make a copy of it before you modify it. Easy, you copy that file to “filename.bak”. But what happens there’s already that file ? Well, either you replace it, or you create an autoincremented file.

 

If you want to do the latter, you could do it using a for loop. But since you’re a happy functional programming guy, you want to make it using LINQ.

 

You then can do it like this :

    public static string CreateNewFileName(string filePath)
    {
        if (!File.Exists(filePath))
            return filePath;

        // Don't do that for each file.
        var name = Path.GetFileNameWithoutExtension(filePath);
        var extension = Path.GetExtension(filePath);

        // Now find the next available file
        var fileQuery = from index in Enumerable.Range(2, 10000)

                        // Build the file name
                        let fileName = string.Format("{0} ({1}){2}", name, index, extension)

                        // Does it exist ?
                        where !File.Exists(fileName)

                        // No ? Select it.
                        select fileName;

        // Return the first one.
        return fileQuery.First();
    }

Note the use of the let operator, which allows the reuse of what is called a “range variable”. In this case, it avoids using string.Format multiple times.

 

The case of Infinity

There’s actually one problem with this implementation, which is the arbitrary “10000”. This might be fine if you don’t intend to make more than 10000 backups of your configuration file. But if you do, to lift that limit, we could write this iterator method :

    public static IEnumerable<int> InfiniteRange(int start)
    {
         while(true)
         {
             yield return start++;
         }
    }

Which basically will return an new value each time you ask for one. To use that method you have to make sure that you have an exit condition (the file does not exist, in the previous example), or you may well be enumerating until the end of times... Actually up to int.MaxValue, for the nit-pickers, but .NET 4.0 adds System.Numerics.BigInteger to be sure to get to the end of times. You never know.

 

To use this iterator, just replace :

        var fileQuery = from index in Enumerable.Range(2, 10000)

by

        var fileQuery = from index in InfiniteRange()

And you’re done.

Reactive Framework: MemoizeAll

By jay at February 04, 2010 18:21 Tags: , , ,

Cet article est disponible en francais.

For some time now, with the release of the Rx Framework and the reactive/interactive programming, some new features were highlighted through a very good article of Bart De Smet dealing with System.Interactive and the “lazy-caching”.

When using LINQ, one can find two sorts of operators: The “lazy” operators that take elements one by one and forward them when they a requested (Select, Where, SelectMany, First, …), and the operators that I would call “Rendez-vous” for which the entirety of the elements of the enumerators need to be enumerated (Count, ToArray/ToList, OrderBy, …) to produce a result.

 

“Lazy” Operators

Lazy operators are pretty useful as they offer a good performance when it is not required to enumerate all the elements of an enumerable. This can also be useful when it may take a very long time to enumerate each element of an enumerator, and that we only want to get the first few elements.

For instance this :
         

static IEnumerable<int> GetItems()
{
    for (int i = 0; i < 5; i++)
    {
        Console.WriteLine(i);
        yield return i + 1;
    }
}

static void Main()
{
   Console.WriteLine(GetItems().First());
   Console.WriteLine(GetItems().First());
}

Will output :


0
1
0
1

Only the first element of the enumerator will be enumerated from GetItems().

However, these operators expose a behavior that is important to know about: Each time they are enumerated, they also enumerate their source again. That could either be a advantage (Enumerating multiple times a changing source) or a problem (enumerating multiple times a resource intensive source).

 

“Rendez-vous” Operators

These operators are also interesting because they force the enumeration of all the elements of the enumerable, and in the particular case of ToArray, this allows the creation of an immutable version of the content of the enumerable. These are useful in conjunction with lazy operators to prevent them to enumerate their source again, when enumerated multiple times.

If we the previous sample, and update it a bit:


static void Main()
{
   var items = GetItems().ToArray();

   Console.WriteLine(items.Count());
   Console.WriteLine(items.Count());
}

We get this result :


0
1
2
3
4
5
5

Because Count() needs to know all the elements of the source enumerator to determine the count.

These operators also enumerate their source with each use, but using ToArray/ToList prevents their result to enumerate the source again.

The case of multiple enumerations

A concrete example of the problem posed by the multiple enumerations is the creation of an enumerable partitionning operator. In this example, we can see that the enumerable passed as the source is used by to "Where" different operators, which implies that the source enumerable will be enumerated twice. Storing the whole content of the source enumerable by means of a ToArray/ToList is possible, but that would be a possible waste of resource, mainly because we can't know if the output enumerable will be enumerated completely (If that is possible, as in the case of an infinite enumerable, ToArray is not applicable).

An intermediate operator between "Lazy" and "Rendez-vous" would be useful.

EnumerableEx.MemoizeAll

The EnumerableEx class brings us an extension, MemoizeAll (built from the Memoization concept), that is just the middle ground we're looking for, and will cache elements from the source enumerator when they are requested. A sort of "lazy" ToArray.

If we take the example of Mark Needham, we would modify it like this :


var evensAndOdds = Enumerable.Range(1, 10)
                             .MemoizeAll()
                             .Partition(x => x % 2 == 0);

In this example, the MemoizeAll does not have a real benefit on the performance side, since Enumerable.Range is not a very expensive operator. But in the case where the source of the "Partition" operator would be a most expensive enumerable, like a Linq2Sql query, the lazy caching could be very effective.

One of the comments suggests that a GroupBy based implementation could be written, but this operator also evaluates the source operator when a group is enumerated. The MemoizeAll is then again appropriate for better performance, but as always, this is a tradeoff between processing and memory.

By the way, Bart de Smeth discusses the part of the elimination of side effects linked the multiple enumeration of enumerables by using Memoize and MemoizeAll, which is not really an issue in the previous example, but is nonetheless a very interesting subject.

 

.NET 4.5 ?

On a side note, I find regrettable that the EnumerableEx extensions did not make their way in .NET 4.0... They are very useful, and not very complex. They may have arrived too late in the development cycle of .NET 4.0... Maybe in .NET 4.5 :)

Using Multiple Where Clauses in a LINQ Query

By Jay at December 06, 2008 15:32 Tags: , , ,
Cet article est disponible en francais.
 
After writing my previous article where I needed to intercept exceptions in a LINQ Query, I found out that it is possible to specify multiple where clauses in a LINQ Query.

Here is the query :


var q = from file in Directory.GetFiles(@"C:\Windows\Microsoft.NET\Framework\v2.0.50727", "*.dll")
        let asm = file.TryWith(f => Assembly.LoadFile(f))
        where asm != null
        let types = asm.TryWith(a => a.GetTypes(), (Exception e) => new Type[0])
        where types.Any()
        select new { asm, types };


This query is able to find assemblies for which it is possible to list types. The point of using multiple Where clauses is to avoir evaluating chunks of a query if previous chunks can prevent it. By the way, the TryWith around the Assembly.GetTypes() is there to intercept exceptions raised when loading types, in case dependencies would not be available at the moment of the enumeration.

A useful LINQ trick to remember !

IEnumerable<T>.Any() vs. IEnumerable<T>.Count() != 0

By Jay at May 10, 2008 11:10 Tags: , ,

This post is also available in french here. 

After reading this post from Eric Lippert, and reminding me that in the samples for this post, I'm using IEnumerable<T>.Count() where I'm actually not using the return value, therefore I'm enumerating through the whole collection for almost nothing.

I could have used IEnumerable<T>.Any(), which after looking at the IL code, just starts the enumeration, and stops just after one element if there is one and returns true, or if there isn't, returns false.

This is more effective, even without PLinq :)

A look at Linq to objects and the "let" keyword

By Jerome at April 26, 2008 13:08 Tags: , ,

Cet article est aussi disponible en français ici. 

I've had some time lately to use LINQ a bit more intensively and in particular to use the let keyword.

I had to process a lot of XML files placed in multiple folders, and I wanted to filter them using a regex. First, here's how to get all files from a directory tree :



  var files = from dir in Directory.GetDirectories(rootPath, SearchOption.AllDirectories)
              from file in Directory.GetFiles("", "*.*")
              select new { Path = dir, File = Path.GetFileName(file) };

At this point, I could have omitted the Directory.GetDirectories call because GetFiles can also search recursively. But since the GetFiles method only returns a string array and not an enumerator, it means that all my files would have been returned in one array, which is not memory effective. I'd rather have an iterator based implementation of GetDirectories and GetFiles for that matter, but the finest grained enumeration can only be done this way...

Anyway, having all my files, I now wanted to filter the collection with a specific Regex, as my legitimate files need to observe a specific pattern. So, I updated my query to this :



    Regex match = new Regex(@"(?<value>\d{4}).xml");
    var files2 = from dir in Directory.GetDirectories(args[0], "*", SearchOption.AllDirectories)
                 from file in Directory.GetFiles(dir, "*.xml")
                 let r = match.Match(Path.GetFileName(file))
                 where r.Success
                 select new {
                    Path = dir,
                    File = Path.GetFileName(file),
                    Value = r.Groups["value"].Value
                 };


This time, I've introduced the let keyword. This keyword is very interesting because it allows the creation of a query-local variable that can contain either collections or single objects. The content of this variable can be used in the where clause, as the source of another "from" query, or in the select statement.

In my case, I just wanted to have the result of the Regex match, so I'm just calling Regex.Match to validate the file name, and I'm placing the content of a Regex group in my resulting anonymous type.

Now, with all my files filtered I found that some XML files were not valid because they were not containing a specific node. So I filtered them again using this query :



    var files2 = from dir in Directory.GetDirectories(args[0], "*", SearchOption.AllDirectories)
                 from file in Directory.GetFiles(dir, "*.xml")
                 let r = match.Match(Path.GetFileName(file))
                 let c = XElement.Load(file).XPathSelectElements("//dummy")
                 where r.Success && c.Count() != 0
                 select new {
                    Path = dir,
                    File = Path.GetFileName(file),
                    Value = r.Groups["value"].Value
                 };

I've added a new let clause to add the loading of the file, and make sure there is a node named "dummy" somewhere in the xml document. By the way, if you're looking XPath in XLinq, just look over there.

You may wonder by looking at this query when the load is actually evaluated... Well, it is only evaluated when the c.Count() call is performed, which is only after the regex has matched the file ! This way, I'm not trying to load all the files returned by GetFiles. You need to always remember that queries are evaluated only when enumerated.

In conclusion, Linq is a very interesting piece of technology, definitely NOT reserved to querying databases. What I like the most is that one can write code almost without any loop, therefore reducing side effects.

If you haven't looking at Linq yet, just give it a try, you'll probably like it :)

About me

My name is Jerome Laban, I am a Software Architect, C# MVP and .NET enthustiast from Montréal, QC. You will find my blog on this site, where I'm adding my thoughts on current events, or the things I'm working on, such as the Remote Control for Windows Phone.