I need to submit a Slurm array job that will run the same script 18000 times (once per independent gene), and I want to do this in a way that won't cause problems for my Uni's cluster.
Currently, the MaxArraySize set by the admins is 2048, so I was going to set my array ranges manually, like this:
First array script:
#SBATCH --array=2-2000%300
N.B.: this starts at 2 because I want to skip the first line of the file that the array indexes into.
Next script:
#SBATCH --array=2001-4000%300
and so on...
But Slurm rejects array index values above 2048.
Is there another way of doing this, which doesn't involve just a for loop submitting scripts for individual genes?
(All I can think of are for loops, but then I lose Slurm's throttle option [%300] for avoiding clogging the scheduler.)
You can submit several jobs, all with
#SBATCH --array=1-2000%300
and, in the script, build the row index based on SLURM_ARRAY_TASK_ID
rather than using it directly. In the first job:
ROWINDEX=$((SLURM_ARRAY_TASK_ID+1))
In the second job:
ROWINDEX=$((SLURM_ARRAY_TASK_ID+2001))
and so on.
Then use ROWINDEX to select the line you want from your input file.
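With 18000 genes and 2000 tasks per array, that is nine submissions of essentially the same script. As a minimal sketch of one way to avoid keeping nine near-identical copies, you can pass the offset in at submission time instead of hard-coding it; here genes.txt and process_gene.sh are placeholder names for your gene list and per-gene analysis:

#!/bin/bash
#SBATCH --array=1-2000%300
#SBATCH --job-name=genes

# OFFSET is set at submission time and defaults to 1 (the first chunk).
# It shifts SLURM_ARRAY_TASK_ID so each chunk maps to a distinct block
# of input lines while skipping the header on line 1.
OFFSET=${OFFSET:-1}
ROWINDEX=$((SLURM_ARRAY_TASK_ID + OFFSET))

# Extract line ROWINDEX from the gene list (placeholder file name).
GENE=$(sed -n "${ROWINDEX}p" genes.txt)

# Run the per-gene analysis on that gene (placeholder command).
./process_gene.sh "$GENE"

Each chunk is then submitted with its own offset, e.g. sbatch --export=ALL,OFFSET=1 job.sh for the first chunk and sbatch --export=ALL,OFFSET=2001 job.sh for the second, and so on, keeping the %300 throttle throughout.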