I have an embarrassingly parallel job that requires no communication between the workers. I'm trying to use the dfeval function, but the overhead seems to be enormous. To get started, I'm trying to run the example from the documentation.
>> matlabpool open
Starting matlabpool using the 'local' configuration ... connected to 8 labs.
>> sched = findResource('scheduler','type','local')
sched =
Local Scheduler Information
===========================
Type : local
ClusterOsType : pc
ClusterSize : 8
DataLocation : C:\Users\~\AppData\Roaming\MathWorks\MATLAB\local_scheduler_data\R2010a
HasSharedFilesystem : true
- Assigned Jobs
Number Pending : 0
Number Queued : 0
Number Running : 1
Number Finished : 8
- Local Specific Properties
ClusterMatlabRoot : C:\Program Files\MATLAB\R2010a
>> matlabpool close force local
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.
>> sched = findResource('scheduler','type','local')
sched =
Local Scheduler Information
===========================
Type : local
ClusterOsType : pc
ClusterSize : 8
DataLocation : C:\Users\~\AppData\Roaming\MathWorks\MATLAB\local_scheduler_data\R2010a
HasSharedFilesystem : true
- Assigned Jobs
Number Pending : 0
Number Queued : 0
Number Running : 0
Number Finished : 8
- Local Specific Properties
ClusterMatlabRoot : C:\Program Files\MATLAB\R2010a
>> tic;y = dfeval(@rand,{1 2 3},'Configuration', 'local');toc
Elapsed time is 4.442944 seconds.
Running subsequent times produces similar timings. So my questions are:
The DFEVAL function is a wrapper around submitting a job with one or more tasks to a given scheduler, in your case the 'local' scheduler. With the 'local' scheduler, each new task runs in a fresh MATLAB worker session, which is why you see the roughly 4.5-second overhead: that's the time taken to launch the worker, work out what to do, do it, and then quit.
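To make the "wrapper" point concrete, here is a rough sketch of the job and task calls that a DFEVAL call like yours corresponds to, written against the R2010a-era API. It is an approximation for illustration, not the actual DFEVAL source, and the variable names are my own.

```matlab
% Approximate manual equivalent of:
%   y = dfeval(@rand, {1 2 3}, 'Configuration', 'local');
% Assumes the R2010a-era Parallel Computing Toolbox API.
sched = findResource('scheduler', 'type', 'local');

job = createJob(sched);

% One task per input; each task asks for 1 output argument.
createTask(job, @rand, 1, {1});
createTask(job, @rand, 1, {2});
createTask(job, @rand, 1, {3});

tic;
submit(job);                      % each task starts its own worker session
waitForState(job, 'finished');    % block until every task has completed
y = getAllOutputArguments(job);   % cell array, one row per task
toc

destroy(job);                     % remove the job's data from DataLocation
```

Timing the submit/wait section should show the same startup cost you measured, since the work itself (a call to rand) is negligible.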
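The reason that you need the number of running jobs to be zero is that the local scheduler can only run a limited number of workers at once (8 in your case, per ClusterSize). While a MATLABPOOL is open it counts as a running job and holds those workers, so the tasks created by DFEVAL have nothing to run on until it is closed.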
In general, PARFOR with MATLABPOOL is an easier combination to use than DFEVAL. Also, when you open a MATLABPOOL, the workers are launched and ready, so the overhead of PARFOR is much less (but still not zero as the body of the loop needs to be sent to the workers).
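As a minimal sketch of that approach, the DFEVAL example from the question could be written with PARFOR roughly as follows. The variable names are mine, and this assumes the same R2010a-era toolbox; in R2013b and later, MATLABPOOL was replaced by PARPOOL.

```matlab
% Pay the worker startup cost once, up front.
matlabpool open

sizes = {1 2 3};            % same inputs as the dfeval example
y = cell(size(sizes));      % preallocate the sliced output variable

tic;
parfor i = 1:numel(sizes)
    % Runs on a worker that is already started and waiting.
    y{i} = rand(sizes{i});
end
toc

matlabpool close
```

Only the timed PARFOR section repeats on each run; the pool stays open between loops, so later loops skip the worker launch entirely.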