I have a really odd error that I'm completely stumped on, and I'm hoping for some help. I'm trying to submit a job via qsub using the batch script pasted below. When the scheduler starts the job, one of the nodes throws an error saying that I need to load openmpi in order to use mpirun. But if I take a copy of this script with the scheduler directives removed, log in to one of the compute nodes, and run it there directly, it works fine. So the compute nodes clearly know what these modules are and can use them. Does anyone have any idea why running the script through the scheduler would leave the compute nodes unable to use mpirun, when effectively the same script works fine when run on the nodes themselves? I realize there isn't much to go on here, but even some insight into the logic I might be missing would be appreciated.
Here's the batch script:
#/bin/bash
NSLOTS=256
vasp_ver=vasp_std
export OMP_NUM_THREADS=1
export OMP_PLACES=cores
export OMP_PROC_BIND=close
module purge
module use /project/cmdlab/software/modules
module load intel/2021.1
module load vasp/6.3.2
#$ -P cmdlab
#$ -N vasp_test
#$ -l h_rt=48:00:00
#$ -pe mpi_32_tasks_per_node 256
mpirun -np ${NSLOTS} --map-by ppr:16:socket:PE=1 --bind-to core -v ${vasp_ver}
And here's the error thrown when it runs:
Please first load the openmpi module with the following command:
> module load openmpi
and then invoke mpirun with:
> mpirun [argument1 ...]
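For anyone debugging something similar: a quick way to see what the batch shell actually knows is to drop a few diagnostic lines near the top of the job script (a minimal sketch; the exact checks will vary by cluster). If module is undefined, or the shell is not a login shell, then the module load lines can't have done anything:

# Hypothetical diagnostics, placed near the top of the job script
type module || echo "module function is not defined in this shell"
shopt -q login_shell && echo "running as a login shell" || echo "NOT a login shell"
which mpirun || echo "no mpirun in PATH"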
I figured it out: I forgot the ! and the -l in the first line. It should have been

#!/bin/bash -l

The -l makes bash start as a login shell, presumably because the profile scripts it sources are what initialize the module command on this cluster. Without it, the module load lines do nothing in the non-interactive batch shell, so mpirun resolves to the system's default wrapper, which prints the openmpi message above.
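For completeness, here is roughly what the corrected script looks like, with the fixed shebang and the scheduler directives grouped at the top (the conventional layout); everything else is unchanged from the original:

#!/bin/bash -l
#$ -P cmdlab
#$ -N vasp_test
#$ -l h_rt=48:00:00
#$ -pe mpi_32_tasks_per_node 256

NSLOTS=256
vasp_ver=vasp_std

export OMP_NUM_THREADS=1
export OMP_PLACES=cores
export OMP_PROC_BIND=close

module purge
module use /project/cmdlab/software/modules
module load intel/2021.1
module load vasp/6.3.2

mpirun -np ${NSLOTS} --map-by ppr:16:socket:PE=1 --bind-to core -v ${vasp_ver}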