Consider the file prova.mat, created in MATLAB as follows:
for w=1:100
    for p=1:9
        A{p}=randn(100,1);
    end
    baseA_.A=A;
    baseA.(sprintf('A%d', w)) = baseA_;   % same as eval(['baseA.A' num2str(w) '= baseA_;'])
end
save('prova.mat', '-v7.3', 'baseA')
To give an idea of the actual dimensions of my data, the 1x9 cell in A1 is composed of the following 9 arrays: 904x5, 913x5, 1722x5, 4136x5, 9180x5, 3174x5, 5970x5, 4455x5, 340068x5. The other Aj's have a similar composition.
Now consider the following code:
clear all
load prova
tic
parfor w=1:100
    indA=sprintf('A%d', w);
    Aarr=baseA.(indA).A;
    Boot=[];
    for p=1:9
        C=randn(100,1).*Aarr{p};
        Boot=[Boot; C];
    end
    D{w}=Boot;
end
toc
If I run the parfor loop with 4 local workers on my MacBook Pro, it takes 1.2 s. Replacing parfor with for, it takes 0.01 s.
With my actual data, the difference is 31 s versus 7 s (the creation of the matrix C is also more complicated).
If I have understood correctly, the problem is that MATLAB has to send baseA to each local worker, and this takes time and memory.
Could you suggest a solution that makes parfor more convenient than for? I thought that saving all cells in baseA was a way to save time by loading everything once at the beginning, but maybe I'm wrong.
Many functions have implicit multithreading built in, so when your loop mostly calls these functions, a parfor loop is no more efficient than a serial for loop: all cores are already being used. In this case parfor will actually be a detriment, since it adds its allocation overhead while being no more parallel than the function you are calling.
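A minimal sketch to check whether a given function is implicitly multithreaded on your machine: time it with the default number of computational threads and again with maxNumCompThreads limited to 1. The matrix size (magic(500)) and the variable names are my own choices, and whether eig speeds up noticeably depends on your MATLAB version and hardware.

```matlab
% Compare eig with implicit multithreading vs. a single computational thread.
M = magic(500);                       % test matrix; size chosen arbitrarily

nThreads = maxNumCompThreads;         % current maximum number of computational threads
tMulti = timeit(@() eig(M));          % eig may use all of these threads internally

oldN = maxNumCompThreads(1);          % limit to one thread; returns the previous value
tSingle = timeit(@() eig(M));
maxNumCompThreads(oldN);              % restore the previous setting

fprintf('eig with %d thread(s): %.3f s\n', nThreads, tMulti);
fprintf('eig with 1 thread:    %.3f s\n', tSingle);
```

If tSingle is clearly larger than tMulti, the function already uses several cores on its own, and wrapping it in parfor gains little.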
When you are not using one of the implicitly multithreaded functions, parfor is basically recommended in two cases: lots of iterations in your loop (e.g., 1e10), or each iteration taking a very long time (e.g., eig(magic(1e4))). In the second case you might want to consider using spmd (slower than parfor in my experience). The reason parfor is slower than a for loop for short ranges or fast iterations is the overhead needed to manage all workers correctly, as opposed to just doing the calculation.
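To see that overhead directly, here is a minimal sketch (my own, not part of the original test) with many trivially cheap iterations. Open the pool with gcp beforehand, otherwise the pool start-up time ends up in tParfor. The loop size n = 1e5 is an arbitrary assumption, and the actual timings depend on your hardware and pool size.

```matlab
% Many very cheap iterations: the work per iteration is tiny,
% so the cost of distributing iterations to workers dominates.
n = 1e5;

x = zeros(1, n);
tic
for k = 1:n
    x(k) = sqrt(k);            % trivial work per iteration
end
tFor = toc;

y = zeros(1, n);               % sliced output variable for parfor
tic
parfor k = 1:n
    y(k) = sqrt(k);            % same work, plus scheduling and data transfer
end
tParfor = toc;

fprintf('for: %.4f s, parfor: %.4f s\n', tFor, tParfor);
```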
Check this question for information on splitting data between separate workers.
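Applied to the question's code, one way to split the data is sketched below. In the original loop, baseA is indexed by a dynamic field name rather than by the loop variable, so parfor classifies it as a broadcast variable and sends the whole struct to every worker. Indexing a cell array with the loop variable (Acell{w}) makes it a sliced variable instead, so each worker receives only the blocks it processes. The conversion loop and the name Acell are my own; the demo data are rebuilt in memory (in practice you would load prova.mat), and the randn(100,1) product only matches the question's 100x1 demo arrays, not the real 5-column ones.

```matlab
% Rebuild demo data shaped like the question's baseA (in practice: load prova).
baseA = struct();
for w = 1:100
    A = cell(1, 9);
    for p = 1:9
        A{p} = randn(100, 1);
    end
    baseA.(sprintf('A%d', w)) = struct('A', {A});   % field A holds the 1x9 cell
end

% Convert the struct of structs into a 1x100 cell array, once, on the client.
nW = 100;
Acell = cell(1, nW);
for w = 1:nW
    Acell{w} = baseA.(sprintf('A%d', w)).A;
end
clear baseA                    % the struct is no longer needed

D = cell(1, nW);
parfor w = 1:nW
    Aarr = Acell{w};           % sliced input: each worker gets only its Acell{w}
    C = cell(numel(Aarr), 1);
    for p = 1:numel(Aarr)
        C{p} = randn(100, 1) .* Aarr{p};   % same computation as the question's demo
    end
    D{w} = vertcat(C{:});      % concatenate once instead of growing Boot
end
```

If the conversion itself is slow, you could save Acell to the .mat file instead of baseA, so the sliceable layout is loaded directly.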
Consider the following example to see the behaviour of for as opposed to that of parfor. First open the parallel pool if you have not already done so:
gcp; % Opens a parallel pool using your current settings
Then execute a large loop:
n = 1000; % Iteration number
EigenValues = cell(n,1); % Prepare to store the data
Time = zeros(n,1);
for ii = 1:n
    tic
    EigenValues{ii,1} = eig(magic(1e3)); % Might want to lower the magic if it takes too long
    Time(ii,1) = toc; % Collect time after each iteration
end
figure; % Create a plot of results
plot(1:n,Time)
title 'Time per iteration'
ylabel 'Time [s]'
xlabel 'Iteration number [-]';
Then do the same with parfor instead of for. You will notice that the average time per iteration goes up (from 0.27 s to 0.39 s in my case). Realise, however, that parfor used all available workers, so the total time (sum(Time)) has to be divided by the number of cores in your computer. In my case the total time therefore went down from around 270 s to 49 s, since I have an octa-core processor.
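A sketch of the parfor variant, assuming the Parallel Computing Toolbox and a pool opened with gcp as above. Inside parfor I use t0 = tic with toc(t0) so every iteration has its own timer, and I added a wall-clock timer around the whole loop so the "divide by the number of cores" estimate can be checked against a measurement. The names pool, t0, tStart and tWall are my own additions.

```matlab
pool = gcp;                        % current pool (opens one if needed)
n = 1000;                          % iteration number, as in the serial test
EigenValues = cell(n, 1);          % sliced output
Time = zeros(n, 1);                % sliced output

tStart = tic;                      % wall-clock timer for the whole loop
parfor ii = 1:n
    t0 = tic;                      % per-iteration timer on the worker
    EigenValues{ii, 1} = eig(magic(1e3));  % lower the magic if it takes too long
    Time(ii, 1) = toc(t0);
end
tWall = toc(tStart);

fprintf('Mean time per iteration: %.3f s\n', mean(Time));
fprintf('sum(Time) / workers:     %.1f s\n', sum(Time) / pool.NumWorkers);
fprintf('Measured wall-clock:     %.1f s\n', tWall);
```

The measured wall-clock time will typically be somewhat above sum(Time) divided by the number of workers, since the latter ignores the cost of scheduling iterations and returning results.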
So, whilst the time for each separate iteration goes up using parfor with respect to using for, the total time goes down considerably.
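Written out, the estimate above is as follows, with n the number of iterations, \(\bar t\) the mean time per iteration, N the number of workers (8 here) and S the resulting speedup; these symbols are introduced only for illustration, and the numbers are the ones quoted in the previous paragraph.

```latex
\begin{aligned}
T_{\text{for}}    &\approx n\,\bar t_{\text{for}} = 1000 \times 0.27\ \text{s} = 270\ \text{s},\\
T_{\text{parfor}} &\approx \frac{n\,\bar t_{\text{parfor}}}{N} = \frac{1000 \times 0.39\ \text{s}}{8} \approx 49\ \text{s},\\
S &= \frac{T_{\text{for}}}{T_{\text{parfor}}} \approx \frac{270}{49} \approx 5.5 .
\end{aligned}
```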
This picture shows the results of the test as I just ran it on my home PC. I used n=1000 and eig(magic(500)); my computer has an Intel Core i5-750 processor (2.66 GHz, four cores) and runs MATLAB R2012a. As you can see, the time per iteration of the parallel test hovers around 0.29 s with a lot of spread, whilst the serial code is quite steady around 0.24 s. The total time, however, went down from 234 s to 72 s, a speedup of 3.25 times. The reason that this is not exactly 4 is the overhead of parfor, visible as the extra time each iteration takes. This overhead is due to MATLAB having to check what each core is doing, making sure that each loop iteration is performed exactly once, and putting the data into the correct storage location.