I have a set of dependent Slurm jobs that submit successfully. The jobs are set up as:
a   b
 \ /
  c
  |
  d
  |
  e
I need to submit this set of jobs 1000s of times, each time parametrized slightly differently. If I were submitting a large batch of jobs without dependencies, I would use a job array to be kind to other users and the scheduler. What is the best practice for submitting job arrays of dependent jobs?
Possible wrinkle: each job (a through e) is parameterized slightly differently for SBATCH (nodes, tasks-per-node, etc.).
The --dependency option of sbatch accepts aftercorr to link each task of a job array to the corresponding task (the one with the same array task ID) in another array.
The sequence would be (untested; --parsable makes sbatch print only the job ID, so the captured variables can be used in the dependency strings):
ArrayAID=$(sbatch --parsable --array=1-1000 A.sh)
ArrayBID=$(sbatch --parsable --array=1-1000 B.sh)
ArrayCID=$(sbatch --parsable --array=1-1000 --dependency=aftercorr:$ArrayAID:$ArrayBID C.sh)
ArrayDID=$(sbatch --parsable --array=1-1000 --dependency=aftercorr:$ArrayCID D.sh)
ArrayEID=$(sbatch --parsable --array=1-1000 --dependency=aftercorr:$ArrayDID E.sh)
The ith task in array C will wait for the ith task in each of arrays A and B to complete before starting, and similarly down the chain for D and E.
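As for the wrinkle: each script (A.sh through E.sh) keeps its own #SBATCH header with its own node/task geometry, and each array task can fetch its instance-specific parameters through $SLURM_ARRAY_TASK_ID. A minimal sketch of what A.sh could look like, where the parameter file params_A.txt and the program run_step_a are hypothetical:

#!/bin/bash
#SBATCH --job-name=A
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4

# Line i of the (hypothetical) file params_A.txt holds the parameters
# for workflow instance i, i.e. for array task i.
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params_A.txt)

srun ./run_step_a $PARAMS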
Slurm will most probably schedule the jobs in array A before those in the other arrays, but that depends on the characteristics of the jobs and the load of the cluster. You can use the --nice option to alter the ordering and guide it the way you want: either having all of array A finish as soon as possible, or having entire workflows finish as soon as possible, in which case you would give job E a higher priority than job D, itself a higher priority than job C. (Keep in mind that larger --nice values mean lower priority, and that negative adjustments require privileges.)
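For the whole-workflows-first variant, a sketch of the same submission sequence with illustrative --nice values (the numbers are assumptions; only their ordering matters):

ArrayAID=$(sbatch --parsable --nice=500 --array=1-1000 A.sh)
ArrayBID=$(sbatch --parsable --nice=500 --array=1-1000 B.sh)
ArrayCID=$(sbatch --parsable --nice=400 --array=1-1000 --dependency=aftercorr:$ArrayAID:$ArrayBID C.sh)
ArrayDID=$(sbatch --parsable --nice=300 --array=1-1000 --dependency=aftercorr:$ArrayCID D.sh)
ArrayEID=$(sbatch --parsable --nice=200 --array=1-1000 --dependency=aftercorr:$ArrayDID E.sh)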