Tags: mpi, chapel, gasnet

Chapel - Problems with Multilocale Configuration of the GASNet MPI Substrate


I have a forall loop that uses distributed iterators in Chapel, and I'm trying to run it on a cluster.
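
My real code isn't essential here; a minimal sketch of the pattern (using the standard BlockDist module, not my actual kernel) prints which locale each iteration runs on:

use BlockDist;

config const n = 8;
const Space = {1..n};
const D = Space dmapped Block(boundingBox=Space);  // block-distributed domain

forall i in D do                                   // iterations are distributed across locales
  writeln("iteration ", i, " executed on locale ", here.id);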

The code runs perfectly when using the UDP conduit.

Now I'm trying to use the portable MPI substrate as the underlying communication layer - with no success.

Here is my configuration:

export CHPL_TASKS=qthreads

export CHPL_COMM=gasnet

export CHPL_COMM_SUBSTRATE=mpi

export CHPL_LAUNCHER=gasnetrun_mpi

With this configuration alone, only one node was used. Looking at the GASNet documentation, I added:

export GASNET_NODEFILE="$(pwd)"/nodes

export MPIRUN_CMD='mpirun -np %N -machinefile %H %C'

(these details are missing from the official documentation).
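
For completeness, the nodes file that GASNET_NODEFILE points to simply lists one hostname per compute node, e.g. (placeholder names, not my real cluster):

node-1
node-2
node-3
node-4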

Ok, now I can run Chapel code using MPI. BUT:

1) Each node has 32 cores. If I run hello6 -nl x with x < 33, all x locales end up running on the first node.

1.1) I would like to run hello6 -nl 4 so that each of the four nodes says hello from its own locale x, address x.address (see the hello6-style sketch below).

2) It looks like Chapel uses $OAR_NODEFILE (or maybe another file) to create the Locales array, because OAR_NODEFILE has one entry per core for each node.

3) However, even if I manually change both $GASNET_NODEFILE and $OAR_NODEFILE, the Locales array still contains one entry per core for each node.

4) On the cluster I have access to, I normally run MPI codes like this: mpirun -machinefile $OAR_NODEFILE ~/program. However, GASNet requires the MPIRUN_CMD template syntax shown in the last variable exported above.
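
If I understand the GASNet mpi-conduit documentation correctly, the launcher expands the placeholders in MPIRUN_CMD itself: %N is the number of processes, %P the program, %A its arguments, %C is shorthand for '%P %A', and %H appears to be replaced by the file named in GASNET_NODEFILE. So for hello6 -nl 4 the template above should expand to roughly:

mpirun -np 4 -machinefile "$(pwd)"/nodes <program> <arguments>

with the program and its arguments filled in by the Chapel launcher.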

Can anyone help me configure the runtime so that my code executes on multiple locales?
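
For reference, the minimal check I would like to pass with -nl 4 is a hello6-style loop over the built-in Locales array (a sketch; here.name should report the node each locale runs on):

coforall loc in Locales do
  on loc do
    writeln("Hello from locale ", here.id, " of ", numLocales, ", running on node ", here.name);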

Best regards,

Tiago Carneiro.


Solution

  • Assuming you're using the Chapel 1.18 release and Open MPI (let me know if that's not true). There was a bug in Chapel 1.18 and earlier where, when using Open MPI, all Chapel instances were packed onto a single node first. This has been fixed on master (https://github.com/chapel-lang/chapel/pull/11546) and the fix will be included in the 1.19 release.

    You could try building from git master, or you might be able to set MPIRUN_CMD="mpirun --bind-to none --map-by ppr:1:node -np %N %P %A" as a workaround; --map-by ppr:1:node asks Open MPI to place one process per node, and --bind-to none disables core binding.
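
    Putting it all together, a full configuration sketch based on the settings from your question plus this workaround (untested on your cluster, so treat it as a starting point):

    export CHPL_TASKS=qthreads
    export CHPL_COMM=gasnet
    export CHPL_COMM_SUBSTRATE=mpi
    export CHPL_LAUNCHER=gasnetrun_mpi
    export GASNET_NODEFILE="$(pwd)"/nodes
    export MPIRUN_CMD="mpirun --bind-to none --map-by ppr:1:node -np %N %P %A"

    With this in place, hello6 -nl 4 should launch one locale per node.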