c++ openmpi boost-mpi

Unable to run OpenMPI across more than two machines


When attempting to run the first example in the boost::mpi tutorial, I was unable to run across more than two machines. Specifically, this seemed to run fine:

mpirun -hostfile hostnames -np 4 boost1

with each hostname in hostnames as <node_name> slots=2 max_slots=2. However, when I increase the number of processes to 5, it just hangs. Reducing slots/max_slots to 1 gives the same result as soon as the job spans more than two machines. On the nodes, this shows up in the job list:

<user> Ss orted --daemonize -mca ess env -mca orte_ess_jobid 388497408 \
-mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 -hnp-uri \
388497408.0;tcp://<node_ip>:48823

Additionally, when I kill it, I get this message:

node2- daemon did not report back when launched
node3- daemon did not report back when launched

The cluster is set up with the mpi and boost libs accessible on an NFS mounted drive. Am I running into a deadlock with NFS? Or, is something else going on?

Update: To be clear, the boost program I am running is

#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <iostream>
namespace mpi = boost::mpi;

int main(int argc, char* argv[]) 
{
  mpi::environment env(argc, argv);
  mpi::communicator world;
  std::cout << "I am process " << world.rank() << " of " << world.size()
        << "." << std::endl;
  return 0;
}

Following @Dirk Eddelbuettel's recommendations, I compiled and ran the mpi example hello_c.c, as follows:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world, I am %d of %d\n", rank, size);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();

    return 0;
}

It runs fine on a single machine with multiple processes, including when I ssh into any of the nodes and run it there. Each compute node is identical, with the working and mpi/boost directories mounted from a remote machine via NFS. When running the boost program from the fileserver (identical to a node except that boost/mpi are local), I am able to run on two remote nodes. For "hello world", however, running the command mpirun -H node1,node2 -np 12 ./hello gives

[<node name>][[2771,1],<process #>] \
[btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] \
connect() to <node-ip> failed: No route to host (113)

although all of the "Hello, world" lines are printed, and then it hangs at the end. However, the behavior differs when launching from one compute node onto another.

Both "Hello world" and the boost code just hang with mpirun -H node1 -np 12 ./hello when run from node2, and vice versa. (Hang in the same sense as above: orted is running on the remote machine, but not communicating back.)

The fact that the behavior differs between running on the fileserver, where the mpi libs are local, and running on a compute node suggests that I may be running into an NFS deadlock. Is this a reasonable conclusion? Assuming that this is the case, how do I configure mpi to allow me to link it statically? Additionally, I don't know what to make of the error I get when running from the fileserver. Any thoughts?


Solution

  • The answer turned out to be simple: Open MPI authenticates via ssh and then opens TCP/IP sockets between the nodes. The firewalls on the compute nodes were set up to accept only ssh connections from each other, not arbitrary connections. So, after updating iptables, hello world runs like a champ across all of the nodes.

    Edit: It should be pointed out that the fileserver's firewall allowed arbitrary connections, so that was why an mpi program run on it would behave differently than just running on the compute nodes.
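
    For anyone hitting the same symptoms, a quick way to separate a firewall problem from an MPI or NFS problem is to try a plain TCP connect between two nodes, outside of mpirun. The following is only a minimal sketch using POSIX sockets, not anything from Open MPI itself; the function names probe_tcp and describe_probe are made up for illustration. It assumes a Linux-like system, and note that the connect is blocking, so a firewall that silently drops packets will make it wait for the system timeout rather than fail quickly.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of Open MPI): try a plain TCP connect.
// Returns 0 on success, -1 if the host/port cannot be resolved,
// otherwise the errno reported by the last failed socket()/connect().
int probe_tcp(const std::string& host, const std::string& port) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &results) != 0) {
        return -1;
    }

    int last_error = -1;
    for (addrinfo* p = results; p != nullptr; p = p->ai_next) {
        int fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) {
            last_error = errno;
            continue;
        }
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) {
            close(fd);
            freeaddrinfo(results);
            return 0;  // the port is reachable and something is listening
        }
        last_error = errno;
        close(fd);
    }
    freeaddrinfo(results);
    return last_error;
}

// Turn the result of probe_tcp into a human-readable message.
std::string describe_probe(int rc) {
    if (rc == 0) return "connected";
    if (rc < 0) return "could not resolve host or port";
    return std::strerror(rc);
}
```

    Calling describe_probe(probe_tcp("node2", "48823")) from node1 (the host name and port here are just placeholders) and getting "No route to host" for a machine you can ssh into is a strong hint that a firewall is rejecting the connection. "Connection refused" usually means the port is reachable but nothing is listening on it, which is what you would expect on an arbitrary high port once the firewall is out of the way.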