Tags: .net, vb.net, parallel.foreach, parallel-extensions

Is it possible to parallelize an Iterator that uses Yield?


Consider this code, which extends the .NET Array type:

Public Module ArrayExtensions
    <System.Runtime.CompilerServices.Extension>
    Public Iterator Function ToEnumerable(Of T)(target As Array) As IEnumerable(Of T)
        For Each item In target
            Yield DirectCast(item, T)
        Next
    End Function
End Module

I use it to get an IEnumerable that the Min() and Max() extension methods will accept. The arrays often run to several million elements in three dimensions; e.g. an array T(,,) is common.

EDIT: Specifically, this function comes into play with a line of code that looks like this:

    Return loadedData(rType).dataArray.ToEnumerable(Of Single).Min

where dataArray is (in this case) a value in the ConcurrentDictionary loadedData and is of type Single(,,).

Without the ToEnumerable as currently written, there is no IEnumerable(Of T) implementation for the Max() extension method to hook to.

What would it take to "parallelize" this function? No form of Parallel.For that I've tried seems to work, because the array stored in loadedData is not recognized as an IEnumerable(Of T) type. (Is this because a Single(,,) is processed as a value type, perhaps?)

(No answer has to use VB. C# is also fine!)


Solution

  • Since you already have IEnumerable<T>, you can use AsParallel() on that (e.g. dataArray.ToEnumerable().AsParallel().Min()). But the IEnumerable interface is inherently serial: you can parallelize processing its elements, but not iterating it. This means that for very simple operations like Min(), such parallelization doesn't make much sense.
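
    For illustration, a minimal sketch of that first approach, assuming a C# equivalent of the question's parameterless ToEnumerable(Of T) extension (defined on Array, without from/to parameters) and a float[,,] named data:

    // Sketch only: parallelizes processing of the elements, but the underlying
    // enumerator is still consumed serially. Requires: using System.Linq;
    float parallelMin = data.ToEnumerable<float>().AsParallel().Min();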

    What could make sense here would be to parallelize the iteration too. This is possible because you can access individual items of the array through its indexer.

    I tried to do this using a custom partitioner, but the results were worse than the serial version. The problem is that the overhead of each iteration has to be as small as possible and that's hard to do directly using partitioners.

    Instead, what you could do is to partition only the first dimension of your array (assuming you can be sure it's at least as big as your number of CPUs) and then use a version of ToEnumerable() that returns only part of the first dimension. Something like:

    // Enumerates only the items whose first-dimension index lies in [from, to).
    private static IEnumerable<T> ToEnumerable<T>(this T[,,] array, int from, int to)
    {
        for (int i = from; i < to; i++)
        {
            for (int j = 0; j < array.GetLength(1); j++)
            {
                for (int k = 0; k < array.GetLength(2); k++)
                {
                    yield return array[i, j, k];
                }
            }
        }
    }
    
    // Split the first dimension into ranges, compute each range's minimum in
    // parallel, then take the minimum of the per-range results.
    Partitioner.Create(0, data.GetLength(0))
               .AsParallel()
               .Select(range => data.ToEnumerable(range.Item1, range.Item2).Min())
               .Min()
    

    This is about twice as fast as the serial version on my computer. But it still pays the enumerator overhead, which is quite significant in this case: the following plain-loop version is about twice as fast as the parallel code above:

    // Baseline for comparison: plain nested loops with no enumerator overhead.
    var length0 = data.GetLength(0);
    var length1 = data.GetLength(1);
    var length2 = data.GetLength(2);
    
    float min = float.MaxValue;
    
    for (int i = 0; i < length0; i++)
    {
        for (int j = 0; j < length1; j++)
        {
            for (int k = 0; k < length2; k++)
            {
                float value = data[i, j, k];
                if (value < min)
                    min = value;
            }
        }
    }
    
    return min;
    

    And now we can parallelize this code, which gives roughly a four-times speedup (as before, we partition the first dimension and iterate serially within each partition):

    // Each partitioned range computes its own minimum serially; the per-range
    // results are collected and reduced at the end.
    var results = new ConcurrentQueue<float>();
    var length1 = data.GetLength(1);
    var length2 = data.GetLength(2);
    
    Parallel.ForEach(
        Partitioner.Create(0, data.GetLength(0)), range =>
        {
            float min = float.MaxValue;
    
            for (int i = range.Item1; i < range.Item2; i++)
            {
                for (int j = 0; j < length1; j++)
                {
                    for (int k = 0; k < length2; k++)
                    {
                        float value = data[i, j, k];
                        if (value < min)
                            min = value;
                    }
                }
            }
    
            results.Enqueue(min);
        });
    
    return results.Min();
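
    A variation on the same idea (not part of the original answer) avoids the ConcurrentQueue by using the thread-local overload of Parallel.ForEach, so each thread keeps a single running minimum and results are merged once per thread. A sketch, assuming the same data, length1 and length2 as above (minLock and globalMin are names introduced only for this sketch):

    // Sketch: localInit/localFinally overload of Parallel.ForEach.
    // Requires: using System.Collections.Concurrent; using System.Threading.Tasks;
    object minLock = new object();
    float globalMin = float.MaxValue;

    Parallel.ForEach(
        Partitioner.Create(0, data.GetLength(0)),
        () => float.MaxValue,                        // per-thread initial minimum
        (range, loopState, localMin) =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                for (int j = 0; j < length1; j++)
                {
                    for (int k = 0; k < length2; k++)
                    {
                        float value = data[i, j, k];
                        if (value < localMin)
                            localMin = value;
                    }
                }
            }

            return localMin;
        },
        localMin =>
        {
            lock (minLock)                           // merge each thread's result once
            {
                if (localMin < globalMin)
                    globalMin = localMin;
            }
        });

    return globalMin;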
    

    But wait! There's more. Multidimensional arrays are quite slow in .NET, so from a performance standpoint it can make sense to use a jagged array (float[][][] instead of float[,,]), even if a multidimensional array is the more natural fit. Using this, we can get about 50% more speedup:

    // Each top-level element is a float[][]; compute each slice's minimum in
    // parallel and take the minimum of those results.
    return dataJagged.AsParallel().Min(
        level1 =>
        {
            float min = float.MaxValue;
    
            foreach (var level2 in level1)
            {
                for (int k = 0; k < level2.Length; k++)
                {
                    float value = level2[k];
                    if (value < min)
                        min = value;
                }
            }
    
            return min;
        });
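
    The snippet above assumes the data is already in a jagged array. If it currently lives in a float[,,], converting it is a one-time up-front cost; a possible sketch (the helper name ToJagged is made up for illustration):

    // Sketch: copies a float[,,] into an equivalent float[][][].
    // ToJagged is a hypothetical helper, not part of the original answer.
    private static float[][][] ToJagged(float[,,] source)
    {
        int length0 = source.GetLength(0);
        int length1 = source.GetLength(1);
        int length2 = source.GetLength(2);

        var result = new float[length0][][];
        for (int i = 0; i < length0; i++)
        {
            result[i] = new float[length1][];
            for (int j = 0; j < length1; j++)
            {
                result[i][j] = new float[length2];
                for (int k = 0; k < length2; k++)
                    result[i][j][k] = source[i, j, k];
            }
        }

        return result;
    }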
    

    To sum up, here is a table of timings on my computer for the various approaches: