Tags: bash, slurm

Submitting a Slurm array job with indices above MaxArraySize?


I need to submit a Slurm job array that will run the same script 18000 times (once per independent gene), and I wanted to do this in a way that won't cause problems for my university's cluster.

Currently, the MaxArraySize set by the admins is 2048, so I was going to split the submission manually into several array scripts:

First array script:

#SBATCH --array=2-2000%300 

N.B.: This starts at 2 because I want to skip the first line (a header) of the file that the array reads through.

Next script:

#SBATCH --array=2001-4000%300

and so on...

But Slurm rejects array indices above the MaxArraySize limit of 2048, so every script after the first fails at submission.

Is there another way of doing this that doesn't involve a for loop submitting a separate job per gene?

(A for loop is all I can think of, but then I lose Slurm's throttle option [%300], which I need to avoid clogging the scheduler.)


Solution

  • You can submit several jobs, each with

    #SBATCH --array=1-2000%300 
    

    and, in the script, compute the row index from SLURM_ARRAY_TASK_ID plus a per-job offset rather than using the task ID directly. In the first job:

    ROWINDEX=$((SLURM_ARRAY_TASK_ID+1))
    

    In the second job:

    ROWINDEX=$((SLURM_ARRAY_TASK_ID+2001))
    

    and so on.

    Then use ROWINDEX to select the corresponding line of your input file. Each submission stays within MaxArraySize, and each keeps its own %300 throttle.
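A minimal sketch of one such job script, assuming a hypothetical input file `genes.txt` (header on line 1, one gene per line after it). Instead of hard-coding the offset in each copy of the script, you could pass it at submission time via `--export`; the file name, the `OFFSET` variable, and the `job.sh` name are illustrative, not from the original answer:

```shell
#!/bin/bash
#SBATCH --array=1-2000%300

# Submit the same script repeatedly with a different offset, e.g.:
#   sbatch --export=ALL,OFFSET=1    job.sh   # first batch:  rows 2-2001
#   sbatch --export=ALL,OFFSET=2001 job.sh   # second batch: rows 2002-4001
# and so on, until all 18000 genes are covered.

# Shift the task ID by the offset so the header row (line 1) is never hit.
ROWINDEX=$((SLURM_ARRAY_TASK_ID + OFFSET))

# sed -n "Np" prints only line N of the file.
GENE=$(sed -n "${ROWINDEX}p" genes.txt)

echo "Task ${SLURM_ARRAY_TASK_ID} -> row ${ROWINDEX}: ${GENE}"
```

Because each `sbatch` call is an ordinary array of at most 2000 tasks, every batch respects MaxArraySize while keeping its own %300 concurrency limit.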