This article is also available in French here.

I've had some time lately to use LINQ a bit more intensively and in particular to use the let keyword.

I had to process a lot of XML files spread across multiple folders, and I wanted to filter them using a regex. First, here's how to get all the files from a directory tree:

[code:c#]

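  // Note: GetDirectories returns only subdirectories, so files sitting directly in rootPath are not listed.
  var files = from dir in Directory.GetDirectories(rootPath, "*", SearchOption.AllDirectories)
              from file in Directory.GetFiles(dir, "*.*")
              select new { Path = dir, File = Path.GetFileName(file) };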

[/code]

At this point, I could have omitted the Directory.GetDirectories call because GetFiles can also search recursively. But since GetFiles returns a string array and not an enumerator, all my files would have been returned in one array, which is not memory efficient. I'd rather have iterator-based implementations of GetDirectories and GetFiles for that matter, but the finest-grained enumeration can only be done this way...
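For what it's worth, .NET Framework 4 and later provide Directory.EnumerateFiles and Directory.EnumerateDirectories, which return a lazy IEnumerable of strings instead of an array. Here is a minimal sketch, assuming such a runtime and a compiler that accepts top-level statements (C# 9 or later). The temporary folder and the Folder and Name property names are made up for the illustration:

[code:c#]

  using System;
  using System.IO;
  using System.Linq;

  // Hypothetical sample tree, created only so the sketch runs on its own.
  string rootPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
  Directory.CreateDirectory(Path.Combine(rootPath, "sub"));
  File.WriteAllText(Path.Combine(rootPath, "0001.xml"), "<root/>");
  File.WriteAllText(Path.Combine(rootPath, "sub", "0002.xml"), "<root/>");

  // EnumerateFiles hands back paths one at a time instead of building one big array.
  var files = from file in Directory.EnumerateFiles(rootPath, "*.xml", SearchOption.AllDirectories)
              select new { Folder = Path.GetDirectoryName(file), Name = Path.GetFileName(file) };

  int count = files.Count();   // the query is enumerated here
  Console.WriteLine(count);    // 2

  Directory.Delete(rootPath, true);   // clean up the sample tree

[/code]

With this overload a single call covers the whole tree, including files sitting directly in the root folder.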

Anyway, having all my files, I now wanted to filter the collection with a specific Regex, as my legitimate files need to follow a specific pattern. So, I updated my query to this:

[code:c#]

    // The dot is escaped so that it only matches a literal "." before the extension.
    Regex match = new Regex(@"(?<value>\d{4})\.xml");
    var files2 = from dir in Directory.GetDirectories(args[0], "*", SearchOption.AllDirectories)
                 from file in Directory.GetFiles(dir, "*.xml")
                 let r = match.Match(Path.GetFileName(file))
                 where r.Success
                 select new {
                    Path = dir,
                    File = Path.GetFileName(file),
                    Value = r.Groups["value"].Value
                 };

[/code]
This time, I've introduced the let keyword. This keyword is very interesting because it allows the creation of a query-local variable that can contain either a collection or a single object. The content of this variable can be used in a where clause, as the source of another "from" clause, or in the select clause.
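To make those three uses concrete, here is a small in-memory sketch (top-level statements, so C# 9 or later is assumed). The sample strings and the names words, count, Word and WordsInLine are mine, chosen only for the example:

[code:c#]

  using System;
  using System.Linq;

  string[] lines = { "alpha beta", "gamma", "delta epsilon zeta" };

  var query = from line in lines
              let words = line.Split(' ')   // a collection held in a query-local variable
              let count = words.Length      // a single value
              where count > 1               // used in a where clause
              from word in words            // used as the source of another from
              select new { Word = word, WordsInLine = count };   // used in the select

  foreach (var item in query)
      Console.WriteLine(item.Word + " (" + item.WordsInLine + ")");

  // Prints:
  // alpha (2)
  // beta (2)
  // delta (3)
  // epsilon (3)
  // zeta (3)

[/code]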

In my case, I just wanted to have the result of the Regex match, so I'm just calling Regex.Match to validate the file name, and I'm placing the content of a Regex group in my resulting anonymous type.
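Outside of any query, the named group works like this. It's only a sketch: the file names are invented, and the pattern escapes the dot before the extension, which I assume was the intent:

[code:c#]

  using System;
  using System.Text.RegularExpressions;

  Regex pattern = new Regex(@"(?<value>\d{4})\.xml");
  string[] names = { "0042.xml", "notes.xml", "report-2008.xml" };   // hypothetical file names

  foreach (string name in names)
  {
      Match m = pattern.Match(name);
      Console.WriteLine(m.Success
          ? name + " -> " + m.Groups["value"].Value
          : name + " -> no match");
  }

  // Prints:
  // 0042.xml -> 0042
  // notes.xml -> no match
  // report-2008.xml -> 2008

[/code]

Note that the pattern is not anchored at the start, so any name that has four digits right before ".xml" matches.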

Now, with all my files filtered, I found that some XML files were not valid because they did not contain a specific node. So I filtered them again using this query:

[code:c#]

    var files2 = from dir in Directory.GetDirectories(args[0], "*", SearchOption.AllDirectories)
                 from file in Directory.GetFiles(dir, "*.xml")
                 let r = match.Match(Path.GetFileName(file))
                 where r.Success   // filter on the name before touching the file
                 let c = XElement.Load(file).XPathSelectElements("//dummy")
                 where c.Any()
                 select new {
                    Path = dir,
                    File = Path.GetFileName(file),
                    Value = r.Groups["value"].Value
                 };

[/code]

I've added a second let clause that loads the file and makes sure there is a node named "dummy" somewhere in the XML document. By the way, if you're looking for XPath in XLinq, just look over there.

You may wonder, looking at this query, when the load is actually performed. A let expression is evaluated for each element that reaches it, so the order of the clauses matters: because where r.Success comes before the second let, XElement.Load only runs for files whose name matched the regex. Had I put both let clauses ahead of a single where, every file returned by GetFiles would have been loaded. And none of this runs before the query is enumerated: you need to always remember that queries are evaluated only when enumerated.
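A quick way to check when a let expression runs is to swap the expensive call for a stand-in that records its invocations. Here is a sketch on in-memory data (top-level statements, so C# 9 or later is assumed). Load and loaded are hypothetical names, with Load playing the role of XElement.Load:

[code:c#]

  using System;
  using System.Collections.Generic;
  using System.Linq;

  var loaded = new List<string>();

  // Stand-in for an expensive call; it records each name it is invoked with.
  int Load(string name) { loaded.Add(name); return name.Length; }

  string[] names = { "0001.xml", "skip.xml", "0002.xml" };

  var query = from name in names
              where char.IsDigit(name[0])   // cheap filter first
              let size = Load(name)         // runs only for names that passed the filter
              where size > 0
              select name;

  Console.WriteLine(loaded.Count);                // 0: the query is defined but nothing has run
  var result = query.ToList();                    // enumeration happens here
  Console.WriteLine(string.Join(", ", loaded));   // 0001.xml, 0002.xml

[/code]

A let expression runs for each element that reaches it, so moving the let above the first where would make Load run for all three names, skip.xml included.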

In conclusion, LINQ is a very interesting piece of technology, definitely NOT reserved for querying databases. What I like the most is that one can write code almost without any loops, therefore reducing side effects.

If you haven't looked at LINQ yet, just give it a try, you'll probably like it :)