Tags: mpi, cluster-computing, sungridengine, rocks

MPI job on Rocks cluster (SGE scheduler) doesn't run over multiple nodes


I'm trying to run a parallel MPI job using the Sun Grid Engine scheduler on a Rocks v5.4.3 cluster. The cluster has a queue named "all.q" with 22 compute nodes: 21 have 8 CPUs each and one has 4 CPUs. When a parallel job runs, however, all of the tasks it creates are confined to a single node.

For example, if I request 16 CPUs (tasks) in a job submission script and submit the job with qsub, the job starts successfully, but all 16 tasks are started on a single node (the first assigned node) instead of being distributed among the nodes the scheduler assigned to the job.

The job submission script for this test case is as follows:

#!/bin/bash
#$ -N test
#$ -cwd
#$ -pe mpi 16
#$ -S /bin/bash
#$ -q all.q
#$ -e $JOB_NAME.e$JOB_ID
#$ -o $JOB_NAME.o$JOB_ID

lammps=/home/Brian/lammps/lmp_openmpi

/opt/intel/openmpi-1.4.4/bin/mpirun -machinefile $TMPDIR/machines \
-np $NSLOTS $lammps -in in.melt > job.log
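
For debugging, it can also help to dump the machinefile that SGE generates for the job before the mpirun line runs, to confirm which hosts and how many slots were actually granted. A minimal, purely illustrative addition to the script above (it only uses $TMPDIR/machines and $NSLOTS, which the script already relies on):

# Hypothetical debugging lines, placed just before the mpirun call:
echo "Machinefile generated by the mpi PE:"
cat $TMPDIR/machines
echo "NSLOTS = $NSLOTS"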

The scheduler's output file shows that the job's tasks are assigned to the following nodes:

compute-1-14
compute-1-14
compute-1-14
compute-1-14
compute-1-14
compute-1-14
compute-1-14
compute-1-14
compute-1-16
compute-1-16
compute-1-16
compute-1-16
compute-1-16
compute-1-16
compute-1-16
compute-1-16
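
To separate what SGE allocated from where Open MPI actually launches ranks, one simple test is to resubmit the same script with the LAMMPS command replaced by plain hostname; if mpirun honors the machinefile, the output should contain both node names. A minimal sketch of the replacement line, using the same paths and flags as above:

/opt/intel/openmpi-1.4.4/bin/mpirun -machinefile $TMPDIR/machines \
-np $NSLOTS hostname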

However, if I ssh into compute-1-14, run top, and filter for the lmp_openmpi processes, I get the following:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21762 Brian 25 0 253m 87m 5396 R 99.1 0.5 2:19.60 lmp_openmpi 
21761 Brian 25 0 253m 87m 5508 R 73.3 0.5 1:50.14 lmp_openmpi 
21759 Brian 25 0 253m 87m 5804 R 71.3 0.5 1:55.38 lmp_openmpi 
21760 Brian 25 0 253m 87m 5512 R 71.3 0.5 1:36.27 lmp_openmpi 
21765 Brian 25 0 253m 87m 5324 R 61.4 0.5 1:53.11 lmp_openmpi 
21763 Brian 25 0 253m 87m 5496 R 59.5 0.5 1:53.14 lmp_openmpi 
21770 Brian 25 0 253m 87m 5308 R 59.5 0.5 1:45.21 lmp_openmpi 
21767 Brian 25 0 253m 87m 5504 R 57.5 0.5 1:58.65 lmp_openmpi 
21772 Brian 25 0 253m 87m 5304 R 43.6 0.5 1:52.24 lmp_openmpi 
21771 Brian 25 0 253m 87m 5268 R 39.6 0.5 1:51.23 lmp_openmpi 
21773 Brian 25 0 253m 87m 5252 R 39.6 0.5 1:52.02 lmp_openmpi 
21774 Brian 25 0 253m 87m 5228 R 39.6 0.5 1:47.85 lmp_openmpi 
21766 Brian 25 0 253m 87m 5332 R 29.7 0.5 1:51.18 lmp_openmpi 
21764 Brian 25 0 253m 87m 5356 R 27.7 0.5 2:09.05 lmp_openmpi 
21768 Brian 25 0 253m 87m 5356 R 21.8 0.5 1:35.28 lmp_openmpi 
21769 Brian 25 0 253m 87m 5324 R  7.9 0.5 1:50.63 lmp_openmpi 

That is all 16 processes running on compute-1-14, while top on compute-1-16 shows no lmp_openmpi processes at all.
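
Rather than logging in to each node and running top by hand, the lmp_openmpi processes on every allocated host can be counted in one pass. This is only a rough sketch, assuming passwordless ssh between nodes (standard on Rocks) and that it is run from inside the job so that $TMPDIR/machines exists:

for host in $(sort -u $TMPDIR/machines); do
    echo -n "$host: "
    ssh $host pgrep -c -u Brian lmp_openmpi   # number of LAMMPS ranks on that host
done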

I'm not sure how thoroughly I explained the issue, so if any more info is needed please let me know. I'm also a newbie with Rocks and SGE, so hopefully my example is sufficiently clear. If not, I will modify. Thanks to all in advance.


Solution

  • Problem: an issue with the Open MPI build on our cluster.

    Solution: installing a newer toolchain, the Intel compilers v16.0.3 and Intel MPI v5.1.3, which solved the multi-node problem.
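
As a sanity check after switching MPI stacks, the same kind of job can be resubmitted with the application replaced by hostname to confirm that ranks now land on more than one node. A minimal sketch, assuming the Intel MPI mpirun is on the PATH and picks up the SGE allocation (the job name hosttest is illustrative only; some installations need an extra hostfile or bootstrap setting):

#!/bin/bash
#$ -N hosttest
#$ -cwd
#$ -S /bin/bash
#$ -pe mpi 16
#$ -q all.q

# With a working multi-node setup, this should print
# more than one distinct compute node name.
mpirun -np $NSLOTS hostname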