Tags: matlab, for-loop, parallel-processing, parfor, matlabpool

Does anyone know how to speed up this simple code running in parallel (with parfor)?


I do time-consuming simulations involving the following (simplified) code:

K=10^5; % large number
L=1000; % smaller number

a=rand(K,L);
b=rand(K,L);
c=rand(L,L);
d=zeros(K,L,L);

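parfor m=1:L-1
    e=zeros(K,L); % temporary variable, recreated on every iteration
    for n=m:L-1
        % column n+1 depends on column n, so the inner loop has to stay serial
        e(:,n+1)=e(:,n)+(n==m)+e(:,n).*a(:,n).*b(:,n)+a(:,1:n)*c(n,1:n)';
    end
    d(:,:,m)=e; % sliced output: one K-by-L page per value of m
end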

Since each worker needs the full matrices a, b, and c, there is a large parallel overhead.

The overhead is smaller if I send each worker only the part of b it needs (the inner loop starts at m), but I don't think that makes the code much faster.

Because of this overhead, parfor is slower than the serial for-loop. As the number of parfor iterations grows (with increasing L), the sizes of a, b, and c grow as well, and so does the overhead. I therefore do not expect the parfor loop to be faster even for large values of L. Or does anyone see it differently?
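To put rough numbers on the overhead described above, here is a back-of-the-envelope sketch. It assumes 8-byte doubles and the sizes from the question, and it assumes the usual parfor classification: a, b, and c are broadcast variables (copied to every worker), while d is a sliced output (each K-by-L page travels back to the client). The names bytesA, bytesPage, and so on are illustrative only.

```matlab
% Sizes from the question
K = 1e5;
L = 1000;
bytesPerDouble = 8;                          % assumption: double precision

% Broadcast inputs: copied to every worker
bytesA = K * L * bytesPerDouble;
bytesB = K * L * bytesPerDouble;
bytesC = L * L * bytesPerDouble;
bytesBroadcast = bytesA + bytesB + bytesC;

% Sliced output: one K-by-L page of d is returned per iteration
bytesPage = K * L * bytesPerDouble;
bytesD = bytesPage * L;                      % full K-by-L-by-L array

fprintf('broadcast per worker:   %.2f GB\n', bytesBroadcast / 1e9);
fprintf('returned per iteration: %.2f GB\n', bytesPage / 1e9);
fprintf('full array d:           %.0f GB\n', bytesD / 1e9);
```

At these sizes the inputs come to about 1.6 GB per worker, each iteration returns about 0.8 GB, and d as a whole would need about 800 GB, so the returned pages may weigh at least as much as the broadcast inputs.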


Solution

  • The terms a.*b and a(:,1:n)*c(n,1:n)' depend on neither e nor m, so computing them once before the loops may give a performance gain:

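    tc = tril(c);      % lower triangle of c, so row n keeps only c(n,1:n)
    ac = a * tc.';     % ac(:,n) equals a(:,1:n)*c(n,1:n)' for every n
    ab = a .* b;       % elementwise product, computed once
    for m=1:L-1
        e = zeros(K,L);
        for n=m:L-1
            e(:, n + 1) = e(:, n) + (n==m) + e(:, n) .* ab(:, n) + ac(:, n);
        end
        d(:,:,m) = e;  % d is preallocated as zeros(K,L,L), as in the question
    end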
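
A minimal sketch for checking that the precomputed version reproduces the original recursion. The small sizes K = 50 and L = 8 and the names d1, d2, and maxRelErr are assumptions chosen so the check runs quickly; they are not from the question. The two results are compared with a tolerance because the matrix product may sum terms in a different order.

```matlab
% Small illustrative sizes (not the sizes from the question)
K = 50;
L = 8;
a = rand(K, L);
b = rand(K, L);
c = rand(L, L);

% Reference: the recursion exactly as written in the question (serial)
d1 = zeros(K, L, L);
for m = 1:L-1
    e = zeros(K, L);
    for n = m:L-1
        e(:, n+1) = e(:, n) + (n==m) + e(:, n).*a(:, n).*b(:, n) + a(:, 1:n)*c(n, 1:n)';
    end
    d1(:, :, m) = e;
end

% Precomputed version from the answer
tc = tril(c);
ac = a * tc.';
ab = a .* b;
d2 = zeros(K, L, L);
for m = 1:L-1
    e = zeros(K, L);
    for n = m:L-1
        e(:, n+1) = e(:, n) + (n==m) + e(:, n).*ab(:, n) + ac(:, n);
    end
    d2(:, :, m) = e;
end

% Largest deviation relative to the largest entry of the reference
maxRelErr = max(abs(d1(:) - d2(:))) / max(abs(d1(:)));
fprintf('max relative error: %g\n', maxRelErr);
```

If the deviation stays at rounding level, the precomputed loop can be timed against the original with tic and toc at realistic sizes.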