julia mpi openmpi pbs torque

“unable to find the specified executable file” when trying to run Julia with mpirun


I am trying to run my Julia code on multiple nodes of a cluster that uses Moab as the scheduler and Torque as the resource manager. In an interactive session in which I requested 3 nodes, I load the julia and openmpi modules and run:

mpirun -np 72 --hostfile $PBS_NODEFILE -display-allocation julia --project=.  "./estimation/test.jl"

mpirun does recognize my 3 nodes, since it displays:


======================   ALLOCATED NODES   ======================
        comp-bc-0383: slots=24 max_slots=0 slots_inuse=0 state=UP
        comp-bc-0378: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
        comp-bc-0372: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================

However, it then returns this error message:

--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 48; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
      line parameter option (remember that mpirun interprets the first
      unrecognized command line token as the executable).

Node:       comp-bc-0372
Executable: /opt/aci/sw/julia/1.5.3_gcc-4.8.5-ips/bin/julia
--------------------------------------------------------------------------

What could be causing this? Is it because mpirun has trouble accessing julia on the other nodes? (I suspect so, because the code runs with -np X as long as X <= 24, which is the number of slots on one node; as soon as X >= 25, it fails.)


Solution

  • Here is a good manual on how to work with modules and mpirun: UsingMPIstacksWithModules

    To summarize what the manual says:

    Modules are nothing more than a structured way to manage your environment variables, so whatever hurdles apply to modules apply equally to environment variables.

    What you need to do is export the relevant environment variables to the other nodes by adding -x PATH -x LD_LIBRARY_PATH to your mpirun command. To see whether this worked, you can then run:

    mpirun -np 72 --hostfile $PBS_NODEFILE -display-allocation -x PATH -x LD_LIBRARY_PATH which julia
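
To narrow down which node cannot see Julia, it can help to print, per host, where the executable resolves. The following is only a sketch: report_exe is a made-up helper name, and the commented mpirun line assumes an Open MPI version that accepts --map-by ppr:1:node to start one process per node (older releases used -npernode 1 for this).

```shell
# Print this host's name and where (or whether) an executable is found on PATH.
# report_exe is a hypothetical helper for this sketch; it defaults to julia.
report_exe() {
    exe="${1:-julia}"
    if found="$(command -v "$exe" 2>/dev/null)"; then
        echo "$(uname -n): $exe -> $found"
    else
        echo "$(uname -n): $exe NOT FOUND on PATH"
    fi
}

# Check the node you are logged in to first.
report_exe julia

# Then run the same kind of check once per allocated node (inside the job):
# mpirun --hostfile "$PBS_NODEFILE" --map-by ppr:1:node -x PATH -x LD_LIBRARY_PATH \
#     sh -c 'echo "$(uname -n): $(command -v julia || echo julia NOT FOUND)"'
```

A node that reports NOT FOUND even with -x PATH suggests the Julia installation directory itself is not visible from that node, which exporting variables alone would not fix.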
    

    Also consider giving the full path of the file you want to run, that is /path/to/estimation/test.jl instead of ./estimation/test.jl, since the working directory is not necessarily the same on every node. (In general, full paths are the safer choice.) For the same reason, you can use /path/to/julia (the output of which julia) instead of just julia; that way you should not need to export the environment variables at all.
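
The full-path advice can be sketched as a small dry-run helper that resolves everything on the node where mpirun is started and only prints the resulting command. The name build_cmd is hypothetical, /path/to/julia is a placeholder used when julia is not on PATH, and passing an absolute directory to --project (instead of the "." from the question) is an assumption made here for the same working-directory reason.

```shell
# Build an mpirun command line that uses only absolute paths (dry run: prints it).
# build_cmd is a hypothetical helper; arguments: script path, process count.
build_cmd() {
    script_rel="$1"                 # e.g. ./estimation/test.jl
    nprocs="${2:-72}"               # defaults to the 72 ranks from the question
    project_dir="$(pwd -P)"         # absolute path of the current directory
    julia_bin="$(command -v julia || echo /path/to/julia)"

    # Leave absolute script paths alone; anchor relative ones at project_dir.
    case "$script_rel" in
        /*) script_abs="$script_rel" ;;
        *)  script_abs="$project_dir/${script_rel#./}" ;;
    esac

    echo "mpirun -np $nprocs --hostfile \$PBS_NODEFILE $julia_bin --project=$project_dir $script_abs"
}

# Print the command for the script from the question.
build_cmd ./estimation/test.jl 72
```

Printing the command first makes it easy to confirm that every path is absolute before launching the real job. Full paths only help if those paths exist on every node, for example on a shared filesystem.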