
RSH connection refused while running MPI program


I'm trying to run MPI programs on 8 machines, but I get the following error:

connect to address 127.0.0.1 port 544: Connection refused
Trying krb4 rsh...
connect to address 127.0.0.1 port 544: Connection refused
trying normal rsh (/usr/bin/rsh)
lagrid02: Connection refused

When I run it with the machinefile option, I get the error "lagrid03: No route to host", where lagrid03 is the neighbouring node connected to the master node.

How can I fix this?


Solution

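  • Regarding your first error: is rsh running on all the machines? "Connection refused" typically means the remote host answered, but nothing is listening on the port rsh tried. Before you can start jobs on other machines, you need either rsh or password-less ssh set up on every node (and, if you use ssh, you must tell your MPI job launcher to use it).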

    The second error means that, with the current network configuration, there is no route to lagrid03. I guess you have an /etc/hosts entry with the IP address of lagrid03, but no network interface configured on that network. For a more detailed answer, you'll need to post details of your network configuration.
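To tell these two failure modes apart on every node, one option is a small probe run from the master node. This is a minimal sketch, assuming Python 3, the standard TCP ports 514 for rsh and 22 for ssh, and a machinefile with one host per line (MPICH-style "host:n" or Open MPI-style "host slots=n"); the function names are hypothetical. A "refused" result points at a missing or disabled rsh/ssh daemon, while "no route" points at /etc/hosts or interface/routing configuration.

```python
import errno
import socket
import sys

# Standard well-known TCP ports (assumption: the cluster uses the defaults).
RSH_PORT = 514  # rsh ("shell" service)
SSH_PORT = 22   # ssh


def classify_errno(code):
    """Turn a connect() errno into a short, human-readable diagnosis."""
    if code == errno.ECONNREFUSED:
        return "refused: host answered, but no daemon is listening on that port"
    if code in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "no route: check /etc/hosts and interface/routing configuration"
    if code == errno.ETIMEDOUT:
        return "timed out: host down or traffic filtered"
    return "error: " + errno.errorcode.get(code, str(code))


def probe(host, port, timeout=3.0):
    """Try a TCP connection to host:port and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror as exc:  # name lookup failed
        return "unresolved: " + str(exc)
    except socket.timeout:          # our own timeout expired
        return "timed out: host down or traffic filtered"
    except OSError as exc:          # refused, unreachable, etc.
        return classify_errno(exc.errno)


def parse_machinefile(lines):
    """Extract host names from machinefile lines (hypothetical helper).

    Accepts "host", "host:4" (MPICH style) and "host slots=4" (Open MPI
    style); blank lines and '#' comments are skipped.
    """
    hosts = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0].split(":")[0])
    return hosts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for host in parse_machinefile(f):
            for name, port in (("rsh", RSH_PORT), ("ssh", SSH_PORT)):
                print(f"{host:12} {name:3} {probe(host, port)}")
```

Usage would be something like python3 probe_hosts.py machines.txt (both file names hypothetical). A host that reports "refused" on both ports needs rshd or sshd enabled; a host that reports "no route" needs its address and the master's network setup checked before MPI can reach it.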