I am trying to write a bash script that does some stuff, then starts a job array using sbatch, and, when all jobs in the array have finished successfully, starts another job using sbatch. To have everything in one file, I use heredocs for the SLURM scripts.
Submitting the job array works fine, and the job ID that sbatch returns with the --parsable flag is correct. When I try to submit the last job, which depends on the successful completion of all jobs in the array, I get the error

sbatch: error: Batch job submission failed: Job dependency problem
An uncluttered version of my bash script:
#!/bin/bash
# do some stuff to get the directory into which the python script
# running in the array jobs should write its results.
# every job in the array will make its own subdirectory
resdir="the/result/dir/"
# run a python script in a job array
# this part is working fine
jid= sbatch --parsable << EOF
#!/bin/bash
# ...
# configure the SBATCH stuff
# ...
#SBATCH --array=0-9
#
# do the conda stuff
#
# run the test
python main.py --chunk \$SLURM_ARRAY_TASK_ID --resdir $resdir
EOF
echo $jid # this echoes the correct job ID
# process results after the job array is done
# this part gives the error
# sbatch: error: Batch job submission failed: Job dependency problem
sbatch --dependency=afterok:$jid << EOF
#!/bin/bash
#
# configure the SBATCH stuff
#
# do the conda stuff
#
python process_results.py --dir $resdir
EOF
The problem was a missing $( and ). Replacing the jid= sbatch... part with
jid=$(sbatch --parsable << EOF
#!/bin/bash
# ...
# configure the SBATCH stuff
# ...
#SBATCH --array=0-9
#
# do the conda stuff
#
# run the test
python main.py --chunk \$SLURM_ARRAY_TASK_ID --resdir $resdir
EOF
)
does the trick.
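
Why this works: in jid= sbatch --parsable ..., bash parses jid= as a per-command (and here empty) environment assignment for the sbatch command, so jid is never set in the script itself; the ID I saw was sbatch printing it to stdout. $jid therefore expands to nothing, and --dependency=afterok: with an empty job ID is exactly what makes SLURM report "Job dependency problem". The $( ... ) command substitution captures sbatch's stdout into jid instead.

For completeness, here is a minimal sketch of the whole pipeline with the fix applied. The set -euo pipefail line and the emptiness check are my additions as a safety net, not something SLURM requires:

#!/bin/bash
set -euo pipefail

resdir="the/result/dir/"

# Command substitution captures the job ID printed by --parsable.
jid=$(sbatch --parsable << EOF
#!/bin/bash
#SBATCH --array=0-9
python main.py --chunk \$SLURM_ARRAY_TASK_ID --resdir $resdir
EOF
)

# Guard: an empty ID would reproduce the "Job dependency problem" error.
if [[ -z "$jid" ]]; then
    echo "sbatch did not return a job ID" >&2
    exit 1
fi

sbatch --dependency=afterok:"$jid" << EOF
#!/bin/bash
python process_results.py --dir $resdir
EOF

One caveat: on federated/multi-cluster setups, --parsable prints jobid;cluster rather than just the ID, in which case jid=${jid%%;*} strips the suffix before it is used in the dependency.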