
SLURM not interpreting array for infile names


I'm trying to submit my first array job to SLURM and I don't understand its behaviour.

I have 240 files like this:

$ pwd
/nobackup/mdzm87/Diet_Pipeline/Sentinel/05_Ghana_BLAST/fwh2
$ ls G*
G1000_fwh2_unique.fasta
G1006_fwh2_unique.fasta
G1014_fwh2_unique.fasta
G1016_fwh2_unique.fasta
G1018_fwh2_unique.fasta
...

I'm trying to get SLURM to run these as an array job, with 50 files processed at a time until all 240 are complete.

This is my script:

#!/bin/bash

#SBATCH -p shared
#SBATCH -c 50
#SBATCH --array=1000,1006,1014,1016,1018,1026,1027,1028,1038,1042,1043,1044,1048,1053,1058,1062,1064,1068,1072,1089,1093,1094,1095,1104,1106,1112,1127,1128,1130,1136,1140,1147,1161,1166,1168,1174,1178,1183,1185,1186,1187,1193,1195,1197,1199,1202,1204,1205,1207,1216,1220,1229,1232,1234,1252,1262,1271,1275,1280,1284,1290,1294,1307,1308,1311,1316,1321,1322,1325,1328,1331,1343,1344,1347,1360,1366,1374,1376,1385,1386,1403,1405,1413,1414,1421,1422,1424,1433,1435,1436,1439,1440,1441,1443,1447,1468,1474,1484,1487,1489,1491,1496,1525,1531,1532,1543,1551,1563,1568,1573,1574,1581,1584,1586,1588,1597,301,332,339,346,352,353,355,356,358,370,371,373,376,380,383,393,409,413,414,415,416,417,418,424,428,432,441,453,456,457,462,465,467,478,484,488,49,491,492,496,497,502,507,508,511,515,516,519,520,527,553,556,557,574,576,591,594,596,607,618,630,640,642,647,648,649,653,654,656,676,683,691,706,723,725,745,747,753,754,758,759,769,775,782,791,792,795,802,803,804,811,821,839,849,851,852,862,880,894,896,904,907,909,911,920,928,929,947,949,951,952,956,960,962,966,975,978,980,981,983,986,990,997,999
#SBATCH -a %50
#SBATCH --mem=2G
#SBATCH -t 2-23:58:00

blastn -db Combined \
    -query /nobackup/mdzm87/Diet_Pipeline/Sentinel/05_Ghana_BLAST/fwh2/G${SLURM_ARRAY_TASK_ID}_fwh2_unique.fasta \
    -out /nobackup/mdzm87/Diet_Pipeline/Sentinel/05_Ghana_BLAST/fwh2/BLAST_G${SLURM_ARRAY_TASK_ID}.out \
    -outfmt 11 -num_alignments 2 -num_threads 1

However, when I submit the script, the job dies and the slurm.out file says this:

Command line argument error: Argument "query". File is not accessible:  `/nobackup/mdzm87/Diet_Pipeline/Sentinel/05_Ghana_BLAST/fwh2/G0_fwh2_unique.fasta'

I've triple-checked the paths, and the permissions seem fine.

I can see that it hasn't interpreted my array as I intended: it is looking for a file called G0_fwh2_unique.fasta, which doesn't exist.

I must be specifying the array wrong, but I can't figure out what else to do. Any insights would be greatly appreciated!
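The job script builds the query path by pasting $SLURM_ARRAY_TASK_ID between "G" and "_fwh2_unique.fasta", so the G0 in the error means the task ran with an array index of 0. A minimal sketch of a fail-fast check that could sit at the top of such a job script, assuming bash, so that a wrong index produces a clear message rather than a BLAST argument error. The names DATA_DIR, query_path and check_query are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Input directory from the question (DATA_DIR is an illustrative name).
DATA_DIR=/nobackup/mdzm87/Diet_Pipeline/Sentinel/05_Ghana_BLAST/fwh2

# Hypothetical helper: the query path exactly as the job script builds it.
query_path() {
    printf '%s/G%s_fwh2_unique.fasta\n' "$DATA_DIR" "$1"
}

# Hypothetical guard: print the query path if it is readable; otherwise
# report the problem on stderr and return non-zero.
check_query() {
    local id=$1 q
    if [[ -z $id ]]; then
        echo "SLURM_ARRAY_TASK_ID is empty or unset" >&2
        return 1
    fi
    q=$(query_path "$id")
    if [[ ! -r $q ]]; then
        echo "task $id: query not readable: $q" >&2
        return 1
    fi
    printf '%s\n' "$q"
}
```

In the job script this might be used as query=$(check_query "$SLURM_ARRAY_TASK_ID") || exit 1, followed by blastn ... -query "$query".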


Solution

  • Your script specifies the array parameter twice: --array and -a.

    The second one, -a, overrides the array indices set by the first.

    Instead, specify the concurrency limit at the end of your index list:

    #SBATCH --array=1000,1006,1014,1016,1018,1026,1027,1028,1038,1042,1043,1044,1048,1053,1058,1062,1064,1068,1072,1089,1093,1094,1095,1104,1106,1112,1127,1128,1130,1136,1140,1147,1161,1166,1168,1174,1178,1183,1185,1186,1187,1193,1195,1197,1199,1202,1204,1205,1207,1216,1220,1229,1232,1234,1252,1262,1271,1275,1280,1284,1290,1294,1307,1308,1311,1316,1321,1322,1325,1328,1331,1343,1344,1347,1360,1366,1374,1376,1385,1386,1403,1405,1413,1414,1421,1422,1424,1433,1435,1436,1439,1440,1441,1443,1447,1468,1474,1484,1487,1489,1491,1496,1525,1531,1532,1543,1551,1563,1568,1573,1574,1581,1584,1586,1588,1597,301,332,339,346,352,353,355,356,358,370,371,373,376,380,383,393,409,413,414,415,416,417,418,424,428,432,441,453,456,457,462,465,467,478,484,488,49,491,492,496,497,502,507,508,511,515,516,519,520,527,553,556,557,574,576,591,594,596,607,618,630,640,642,647,648,649,653,654,656,676,683,691,706,723,725,745,747,753,754,758,759,769,775,782,791,792,795,802,803,804,811,821,839,849,851,852,862,880,894,896,904,907,909,911,920,928,929,947,949,951,952,956,960,962,966,975,978,980,981,983,986,990,997,999%50

    Also, your current setup requests 50 CPUs for every array task. Since your blastn command runs with a single thread (-num_threads 1), set -c to 1:

    #SBATCH -c 1
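Rather than typing all 240 indices by hand, the index list (with the %50 throttle appended) can be derived from the filenames. A minimal sketch, assuming bash and that every input is named G<number>_fwh2_unique.fasta; the function name array_spec and the script name blast_array.sh are hypothetical. Options given on the sbatch command line take precedence over #SBATCH directives in the script, so the generated string can be passed as --array at submission time. One caveat worth checking: Slurm caps the highest array index at MaxArraySize - 1 (the default MaxArraySize is 1001), so indices such as 1597 may be rejected on some clusters; scontrol show config | grep MaxArraySize shows the local value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a --array spec such as "49,301,1000%50"
# built from the numeric IDs in G<ID>_fwh2_unique.fasta filenames.
#   $1: directory to scan (default: current directory)
#   $2: max number of simultaneously running tasks (default: 50)
array_spec() {
    local dir=${1:-.} limit=${2:-50}
    local f id joined
    local -a ids=()
    for f in "$dir"/G*_fwh2_unique.fasta; do
        [[ -e $f ]] || continue          # glob matched nothing
        id=${f##*/G}                     # drop directory and leading "G"
        id=${id%_fwh2_unique.fasta}      # drop the fixed suffix
        [[ $id =~ ^[0-9]+$ ]] && ids+=("$id")
    done
    if (( ${#ids[@]} == 0 )); then
        echo "no matching files in $dir" >&2
        return 1
    fi
    # Sort numerically, join with commas, then append the %limit throttle.
    joined=$(printf '%s\n' "${ids[@]}" | sort -n | paste -sd, -)
    printf '%s%%%s\n' "$joined" "$limit"
}
```

Submitted, for example, as: sbatch --array="$(array_spec . 50)" blast_array.sh. In the same spirit, passing -num_threads "${SLURM_CPUS_PER_TASK:-1}" to blastn would keep its thread count in step with whatever -c is set to.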