How do Reactive Framework, PLINQ, TPL and Parallel Extensions relate to each other?


At least since the release of .NET 4.0, Microsoft seems to have put a lot of effort into supporting parallel and asynchronous programming, and a lot of APIs and libraries have emerged around it. In particular, the following fancy names are constantly mentioned everywhere lately: Reactive Framework, PLINQ (Parallel LINQ), TPL (Task Parallel Library), and Parallel Extensions.

Now they all seem to be Microsoft products and they all seem to target asynchronous or parallel programming scenarios for .NET. But it is not quite clear what each of them actually is and how they are related to each other. Some might actually be the same thing.

In a few words, can anyone set the record straight on what is what?


Solution

  • PLINQ (Parallel LINQ) is simply a new way to write regular LINQ queries so that they run in parallel. In other words, the Framework automatically takes care of running your query across multiple threads (and therefore multiple CPU cores) so that it finishes faster.

    For example, let's say that you have a bunch of strings and you want to get all the ones that start with the letter "A". You could write your query like this:

    var words = new[] { "Apple", "Banana", "Coconut", "Anvil" };
    var myWords = words.Where(s => s.StartsWith("A"));
    

    And this works fine. If you had 50,000 words to search, though, you might want to take advantage of the fact that each test is independent, and split this across multiple cores:

    var myWords = words.AsParallel().Where(s => s.StartsWith("A"));
    

    That's all you have to do to turn a regular query into a parallel one that runs on multiple cores. Pretty neat.
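
    For what it's worth, here is a minimal, self-contained sketch of the same query as a console program. The PlinqDemo class and the StartingWith helper are names made up for this illustration; AsParallel, AsOrdered, and Where are the actual PLINQ operators from System.Linq. One thing worth knowing: a parallel query does not promise to return results in source order unless you ask for it with AsOrdered.

    ```csharp
    using System;
    using System.Linq;

    public static class PlinqDemo
    {
        // Hypothetical helper: returns the words that start with the given prefix,
        // evaluating the predicate in parallel.
        public static string[] StartingWith(string[] words, string prefix)
        {
            return words
                .AsParallel()
                .AsOrdered()   // keep results in source order; drop this if order doesn't matter
                .Where(s => s.StartsWith(prefix, StringComparison.Ordinal))
                .ToArray();    // forces the query to execute
        }

        public static void Main()
        {
            var words = new[] { "Apple", "Banana", "Coconut", "Anvil" };
            Console.WriteLine(string.Join(", ", StartingWith(words, "A")));
            // prints "Apple, Anvil"
        }
    }
    ```

    With only four words the parallel version will likely be slower than the sequential one because of the partitioning overhead; the payoff only shows up with large inputs or expensive predicates.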


  • The TPL (Task Parallel Library) is sort of the complement to PLINQ, and together (along with a handful of supporting thread-safe collections and synchronization types) they make up Parallel Extensions. Whereas PLINQ is largely based on a functional style of programming with no side-effects, side-effects are precisely what the TPL is for. If you want to actually do work in parallel, as opposed to just searching/selecting things in parallel, you use the TPL.

    The part of the TPL you'll run into first is the Parallel class, which exposes overloads of For, ForEach, and Invoke (underneath, everything is built on the Task type). Invoke is a bit like queuing up tasks in the ThreadPool, but a bit simpler to use. IMO, the more interesting bits are For and ForEach. So for example, let's say you have a whole bunch of files you want to compress. You could write the regular sequential version:

    string[] fileNames = (...);
    foreach (string fileName in fileNames)
    {
        byte[] data = File.ReadAllBytes(fileName);
        byte[] compressedData = Compress(data);
        string outputFileName = Path.ChangeExtension(fileName, ".zip");
        File.WriteAllBytes(outputFileName, compressedData);
    }
    

    Again, each iteration of this compression is completely independent of any other. We can speed this up by doing several of them at once:

    Parallel.ForEach(fileNames, fileName =>
    {
        byte[] data = File.ReadAllBytes(fileName);
        byte[] compressedData = Compress(data);
        string outputFileName = Path.ChangeExtension(fileName, ".zip");
        File.WriteAllBytes(outputFileName, compressedData);
    });
    

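    And again, that's all it takes to parallelize this operation. Now when we run our CompressFiles method (or whatever we decide to call it), it will use multiple CPU cores and can finish in a fraction of the time. How much faster depends on how many cores you have and on how much of the work is disk I/O rather than compression.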

    The advantage of this over just chucking it all in the ThreadPool is that the call itself is synchronous: Parallel.ForEach doesn't return until every iteration has finished. If you used the ThreadPool instead (or just plain Thread instances), you'd have to come up with a way of finding out when all of the tasks are finished, and while this isn't terribly complicated, it's something that a lot of people tend to get wrong or at least have trouble with. When you use the Parallel class, you don't really have to think about it; the multi-threading aspect is hidden from you and handled behind the scenes.
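
    To make the "finding out when everything is finished" point concrete, here is a rough sketch that does the same trivial job both ways. CompletionDemo and its two methods are hypothetical names for this illustration, and summing squares is just a stand-in for real work; ThreadPool.QueueUserWorkItem, CountdownEvent, Interlocked.Add, and Parallel.ForEach are real framework APIs.

    ```csharp
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    public static class CompletionDemo
    {
        // Manual version: queue work on the ThreadPool and track completion ourselves.
        public static int SquareSumWithThreadPool(int[] items)
        {
            int total = 0;
            using (var done = new CountdownEvent(items.Length))
            {
                foreach (int item in items)
                {
                    int captured = item; // private copy for the closure
                    ThreadPool.QueueUserWorkItem(_ =>
                    {
                        Interlocked.Add(ref total, captured * captured);
                        done.Signal(); // forget this and Wait() below never returns
                    });
                }
                done.Wait(); // block until every work item has signalled
            }
            return total;
        }

        // Parallel version: the call only returns once all iterations are done.
        public static int SquareSumWithParallel(int[] items)
        {
            int total = 0;
            Parallel.ForEach(items, item =>
            {
                // Shared state still needs to be updated in a thread-safe way.
                Interlocked.Add(ref total, item * item);
            });
            return total;
        }

        public static void Main()
        {
            int[] items = { 1, 2, 3, 4 };
            Console.WriteLine(SquareSumWithThreadPool(items)); // prints 30
            Console.WriteLine(SquareSumWithParallel(items));   // prints 30
        }
    }
    ```

    Note that the Parallel class takes care of waiting for you, but not of thread safety inside the loop body: both versions still need Interlocked (or a lock) when they touch the shared total.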


  • Reactive Extensions (Rx) are really a different beast altogether. It's a different way of thinking about event handling. There's really a lot of material to cover on this, but to make a long story short, instead of wiring up event handlers to events, Rx lets you treat streams of events as... well, sequences. The interface is IObservable<T>, the push-based counterpart of IEnumerable<T>: rather than you pulling items out of a collection, the source pushes items to you as they occur. You get to filter, project, and combine events with LINQ-style operators, instead of handling each one in a separate callback and having to keep saving state all the time in order to detect a series of events happening in a particular order.

    One of the coolest examples I've found of Rx is here. Skip down to the "Linq to IObservable" section where he implements a drag-and-drop handler, which is normally a pain in WPF, in just 4 lines of code. Rx gives you composition of events, something you don't really have with regular event handlers, and code snippets like these are also straightforward to refactor into behaviour classes that you can slot in anywhere.
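
    If you just want a feel for the "events as sequences" idea without installing anything, here is a small sketch that uses only the IObservable<T> and IObserver<T> interfaces that ship in the base class library. Everything else in it (SimpleSubject, ActionObserver, the hand-rolled Where extension, RxSketchDemo) is made up for this illustration and is not how Rx itself is implemented; the real library provides far more complete versions of these pieces.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical minimal event source that pushes values to its subscribers.
    public sealed class SimpleSubject<T> : IObservable<T>
    {
        private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

        public IDisposable Subscribe(IObserver<T> observer)
        {
            observers.Add(observer);
            return new Unsubscriber(() => observers.Remove(observer));
        }

        public void Publish(T value)
        {
            // Iterate over a copy so observers may unsubscribe while being notified.
            foreach (var o in observers.ToArray()) o.OnNext(value);
        }

        private sealed class Unsubscriber : IDisposable
        {
            private readonly Action dispose;
            public Unsubscriber(Action dispose) { this.dispose = dispose; }
            public void Dispose() { dispose(); }
        }
    }

    // Hypothetical observer that only cares about OnNext.
    public sealed class ActionObserver<T> : IObserver<T>
    {
        private readonly Action<T> onNext;
        public ActionObserver(Action<T> onNext) { this.onNext = onNext; }
        public void OnNext(T value) { onNext(value); }
        public void OnError(Exception error) { }
        public void OnCompleted() { }
    }

    public static class ObservableSketch
    {
        // Hand-rolled stand-in for a filtering operator.
        // Simplification: errors and completion are not forwarded.
        public static IObservable<T> Where<T>(this IObservable<T> source, Func<T, bool> predicate)
        {
            return new WhereObservable<T>(source, predicate);
        }

        private sealed class WhereObservable<T> : IObservable<T>
        {
            private readonly IObservable<T> source;
            private readonly Func<T, bool> predicate;

            public WhereObservable(IObservable<T> source, Func<T, bool> predicate)
            {
                this.source = source;
                this.predicate = predicate;
            }

            public IDisposable Subscribe(IObserver<T> observer)
            {
                return source.Subscribe(new ActionObserver<T>(v =>
                {
                    if (predicate(v)) observer.OnNext(v);
                }));
            }
        }
    }

    public static class RxSketchDemo
    {
        public static List<int> Run()
        {
            var events = new SimpleSubject<int>();
            var seen = new List<int>();

            // Compose first (keep even values only), then subscribe.
            using (events.Where(x => x % 2 == 0).Subscribe(new ActionObserver<int>(seen.Add)))
            {
                events.Publish(1);
                events.Publish(2);
                events.Publish(4);
            }

            events.Publish(6); // not recorded: the subscription was disposed
            return seen;
        }

        public static void Main()
        {
            Console.WriteLine(string.Join(", ", Run())); // prints "2, 4"
        }
    }
    ```

    The point to take away is that the filtered stream is itself an IObservable<T>, so it can be passed around, filtered again, or combined with other streams before anybody subscribes. That is the composition you don't get from plain event handlers.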


    And that's it. These are some of the cooler features that are available in .NET 4.0. There are several more, of course, but these were the ones you asked about!