Tags: timeout, mpi, upc

MPI error due to Timeout in making connection to a remote process


I'm trying to run a NAS-UPC benchmark to study its profile. UPC uses MPI to communicate with remote processes.

When I run the benchmark with 64 processes, I get the following error:

upcrun -n 64 bt.C.64
"Timeout in making connection to remote process on <<machine name>>" 

Can anybody tell me why this error occurs?


Solution

  • This probably means that upcrun is failing to spawn the remote processes. upcrun delegates spawning to a per-conduit mechanism, which may involve your batch scheduler (if you have one). My guess is that you're relying on ssh-style remote access and that it's failing, probably because you don't have keys, an ssh agent, or host-based trust set up. Can you ssh to your remote nodes without a password? Is the environment on the remote nodes sane (paths, etc.)?

    Running "upcrun -v" may illuminate the problem, even without resorting to the man page ;)
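To check the two things asked about above (password-less ssh and a sane remote environment), a small POSIX shell sketch like the one below may help. It assumes the job launcher really does use ssh, which is only a guess here. The function names `check_host`, `check_all`, and `remote_has` are hypothetical helpers, not part of upcrun or any UPC/MPI tool. The only real interfaces used are the standard OpenSSH options `BatchMode` (fail instead of prompting for a password) and `ConnectTimeout`.

```shell
#!/bin/sh
# Sketch only: assumes processes are spawned over ssh.
# check_host/check_all/remote_has are hypothetical helper names.

# check_host HOST: succeed only if ssh can log in to HOST without prompting.
check_host() {
    # BatchMode=yes makes ssh fail rather than ask for a password or passphrase;
    # ConnectTimeout limits how long we wait on an unreachable node.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true </dev/null >/dev/null 2>&1
}

# check_all HOST...: print one status line per host; return nonzero if any failed.
check_all() {
    failed=0
    for host in "$@"; do
        if check_host "$host"; then
            echo "ok:     $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}

# remote_has HOST CMD: succeed if CMD is found on the PATH of a
# non-interactive ssh session on HOST, which may differ from a login shell's PATH.
remote_has() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" "command -v $2" </dev/null >/dev/null 2>&1
}
```

For example, source the file and run `check_all node01 node02` with your own (here hypothetical) node names, then `remote_has node01 mpirun` for whatever command you expect the remote side to need. Any `FAILED` line points at missing keys, agent, or host-based trust for that node.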