Is there a way to run scripts with dependencies via GNU parallel?
I wish to run the following scripts:
aa_00.sh # run time ~6 hr
aa_01.sh # dependent on aa_00.sh; ~6 hr
aa_02.sh # dependent on aa_00.sh; ~6 hr
aa_03.sh # dependent on aa_00.sh; ~6 hr
bb_00.sh # run time ~2 hr
bb_01.sh # dependent on bb_00.sh; ~2 hr
bb_02.sh # dependent on bb_00.sh; ~2 hr
bb_03.sh # dependent on bb_00.sh; ~2 hr
Scripts aa_01.sh, aa_02.sh, and aa_03.sh must not run until script aa_00.sh completes. Scripts aa_01.sh, aa_02.sh, and aa_03.sh are completely independent of each other and can run in parallel.
Similarly, scripts bb_01.sh, bb_02.sh, and bb_03.sh must not run until script bb_00.sh completes. Scripts bb_01.sh, bb_02.sh, and bb_03.sh are completely independent of each other and can run in parallel.
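Before reaching for any tool, the dependency structure itself can be expressed in plain shell: gate each chain's dependents on its _00 script with &&, and run the dependents with &. This is a conceptual sketch only (it imposes no 4-slot limit and assigns no GPUs, which is exactly what the rest of the question is about); a logging stub stands in for the real scripts so it is runnable.

```shell
#!/bin/sh
# Sketch: two independent chains, each sequenced internally.
LOG=$(mktemp)
run() { echo "ran $1" >> "$LOG"; }   # stand-in for executing ./aa_00.sh etc.

(
  run aa_00.sh &&                    # gate: aa dependents wait for aa_00
  { run aa_01.sh & run aa_02.sh & run aa_03.sh & wait; }
) &
(
  run bb_00.sh &&                    # gate: bb dependents wait for bb_00
  { run bb_01.sh & run bb_02.sh & run bb_03.sh & wait; }
) &
wait                                 # block until both chains finish
cat "$LOG"
```

Within each chain the _00 line is guaranteed to appear before its dependents; across chains the interleaving is arbitrary.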
I have 4 CPUs [*].
[*] Actually, I am using GPUs so I am using:
'eval CUDA_VISIBLE_DEVICES={%} {}'
# I removed the "({%} - 1)" notation just for simplicity here
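For context on the elided notation: {%} is GNU parallel's job-slot number, which counts from 1, while CUDA devices are numbered from 0, so the removed "({%} - 1)" just shifts the index. The same arithmetic in plain shell:

```shell
# {%} would expand to the 1-based job slot; CUDA devices are 0-based.
slot=4                      # e.g. the 4th of -j4 slots
gpu=$((slot - 1))           # maps slot 1..4 to device 0..3
echo "CUDA_VISIBLE_DEVICES=$gpu"   # prints CUDA_VISIBLE_DEVICES=3
```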
Is there a way to run these 8 scripts efficiently such that the dependencies on aa_00.sh and bb_00.sh are respected?
One idea I had was to release the subsequent aa_{1,2,3}.sh scripts via parallel at the completion of aa_00.sh, and likewise to release the subsequent bb_{1,2,3}.sh scripts via parallel at the completion of bb_00.sh. But because two different runs of parallel are used, the bb_* scripts don't know that aa_* scripts are running (and vice versa):
cat commands_aa.txt
aa_01.sh
aa_02.sh
aa_03.sh
CUDA_VISIBLE_DEVICES=0 aa_00.sh
parallel -j4 -a commands_aa.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'
cat commands_bb.txt
bb_01.sh
bb_02.sh
bb_03.sh
CUDA_VISIBLE_DEVICES=1 bb_00.sh
parallel -j4 -a commands_bb.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'
Conceptually, I'd like to add inputs to an already running parallel command. I tried overwriting the -a commands.txt file while parallel was already running, but that did not achieve what I wanted (I would have been shocked if it had worked).
In actuality, I have more than just aa and bb scripts; I have as many as 8 or 10 (i.e., aa, bb, ..., hh, ii, ...). And I have more than 3 scripts that run after the _00 script; I have 12 in total: _00 plus _01, ..., _11. All of them have the dependency on their respective _00 script.
I was also looking at the Python library luigi. luigi can handle dependencies, but I don't think it can handle parallelization. I also looked at the Python module joblib.Parallel(). Perhaps I need to combine luigi and joblib.Parallel().
Thank you.
Additional Thoughts
One idea: have each _00 script add its dependents upon its completion, appending them to the list of jobs that parallel is already working on. Something like this (conceptually):
commands.txt contains:
aa_00.sh
bb_00.sh
Launch parallel:
parallel -j4 -a commands.txt 'eval CUDA_VISIBLE_DEVICES={%} {}'
CUDA_VISIBLE_DEVICES=1 <-- aa_00.sh
CUDA_VISIBLE_DEVICES=2 <-- bb_00.sh
When bb_00.sh completes, it appends its dependencies to the bottom of commands.txt, like so:
commands.txt updated:
aa_00.sh # still running on GPU 1
bb_00.sh # this completed on GPU 2
bb_01.sh # these new scripts are
bb_02.sh # appended to
bb_03.sh # commands.txt
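The appending step itself is easy to sketch: the last thing bb_00.sh does is push its dependents onto the shared queue file. Here a temp file stands in for commands.txt; note this is only half the story, since (as found above) modifying the -a file mid-run does not feed an already running parallel.

```shell
#!/bin/sh
queue=$(mktemp)                                # stand-in for commands.txt
printf '%s\n' aa_00.sh bb_00.sh > "$queue"     # initial contents
# ... bb_00.sh's real work would run here; on completion, append dependents:
printf '%s\n' bb_01.sh bb_02.sh bb_03.sh >> "$queue"
cat "$queue"                                   # 5 lines, bb_03.sh last
```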
Somehow, parallel is magically okay with these new lines of input, and the new scripts are queued to GPUs 3, 4, and 2.
CUDA_VISIBLE_DEVICES=3
<-- bb_01.sh
CUDA_VISIBLE_DEVICES=4
<-- bb_02.sh
CUDA_VISIBLE_DEVICES=2
<-- bb_03.sh
CUDA_VISIBLE_DEVICES=1
<-- still running aa_00.sh
bb_01.sh completes on GPU 3; it has no dependencies, so nothing is appended to commands.txt.
The joblog would look something like:
aa_00.sh GPU=1 running
bb_00.sh GPU=2 completed
bb_01.sh GPU=3 completed
bb_02.sh GPU=4 running
bb_03.sh GPU=2 running
Eventually aa_00.sh completes, so it appends its dependencies to the bottom of commands.txt.
commands.txt updated:
aa_00.sh # completed on GPU 1
bb_00.sh # completed on GPU 2
bb_01.sh # completed on GPU 3
bb_02.sh # running on GPU 4
bb_03.sh # running on GPU 2
aa_01.sh # these new scripts are
aa_02.sh # appended to
aa_03.sh # commands.txt
Again, parallel is magically okay with these new lines of input, so it dishes out the new scripts to available GPUs.
CUDA_VISIBLE_DEVICES=3
<-- aa_01.sh
CUDA_VISIBLE_DEVICES=1
<-- aa_02.sh
Suppose bb_02.sh completes next, freeing up GPU 4.
CUDA_VISIBLE_DEVICES=4
<-- aa_03.sh
Now the joblog looks something like:
aa_00.sh GPU=1 completed
bb_00.sh GPU=2 completed
bb_01.sh GPU=3 completed
bb_02.sh GPU=4 completed
bb_03.sh GPU=2 completed
aa_01.sh GPU=3 running
aa_02.sh GPU=1 running
aa_03.sh GPU=4 running
(I may have mixed up the numbering and surely the timing isn't correct (aa runs 3x longer than bb), but hopefully I explained the ordering correctly.)
It's the "magical" part of parallel that I'm unsure of.
So something like:
true >jobqueue; tail -n+0 -f jobqueue | parallel -j4 'eval CUDA_VISIBLE_DEVICES={%} {}'
echo "aa_00.sh; (echo aa_01.sh; echo aa_02.sh; echo aa_03.sh) >> jobqueue" >> jobqueue
echo "bb_00.sh; (echo bb_01.sh; echo bb_02.sh; echo bb_03.sh) >> jobqueue" >> jobqueue
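That pipeline works because tail -f keeps the queue file open and streams anything appended later, so parallel sees a job list that never "ends" and can grow. A self-contained toy (no GNU parallel needed, temp-file names are assumptions) demonstrating just that property:

```shell
#!/bin/sh
# Show that a line appended AFTER tail -f starts still reaches the reader.
tmp=$(mktemp -d); q="$tmp/q"; out="$tmp/out"
: > "$q"                          # empty queue file
tail -n+0 -f "$q" >> "$out" &     # follower: streams queue into $out
tailpid=$!
echo "job1" >> "$q"
sleep 1
echo "job2" >> "$q"               # appended after the follower started
sleep 1
kill "$tailpid"                   # tail -f never exits on its own
cat "$out"
```

In the real setup the follower's output is piped into parallel instead of a file; the jobs themselves (the `(echo ...) >> jobqueue` suffixes above) do the appending.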
We are clearly in territory where there must be better tools: GNU Parallel does not have a dependency graph like make has.
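For comparison, a hedged sketch of the same dependency graph as a Makefile (the touch stamp files are an assumption, used to record completion). Running `make -j4` then enforces both the DAG and the four-job limit, though make has no analogue of parallel's {%} for picking a GPU per slot.

```make
# Hypothetical Makefile; each stamp file marks its script as done.
all: aa_01 aa_02 aa_03 bb_01 bb_02 bb_03

aa_00 aa_01 aa_02 aa_03 bb_00 bb_01 bb_02 bb_03:
	./$@.sh && touch $@

# Prerequisite-only rules: dependents wait for their _00 stamp.
aa_01 aa_02 aa_03: aa_00
bb_01 bb_02 bb_03: bb_00
```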