chapel infiniband gasnet

How to configure Chapel/GASNet to run multilocale codes on an MXM InfiniBand network with a partition key?


I'm trying to run a multilocale Chapel code on a cluster that has an MXM InfiniBand network (40 Gbps, model: Mellanox Technologies MT26428).

I followed both the Chapel and GASNet documentation and set

export CHPL_COMM_SUBSTRATE=ibv

export CHPL_LAUNCHER=gasnetrun_ibv

export GASNET_IBV_SPAWNER=mpi

instead of using CHPL_COMM_SUBSTRATE=mxm, since mxm is deprecated.

I can build Chapel with the ibv substrate, but I cannot run on multiple locales: I receive a huge number of timeout errors.

At first, I thought the problem was the PKEY, so I added --mca btl_openib_pkey "0x8100" to MPIRUN_CMD, but without success.
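Before changing launcher options, it can help to confirm which partition keys a node's ports actually expose. The following is a minimal sketch, assuming a Linux node where the InfiniBand drivers publish the P_Key table under /sys/class/infiniband; the SYSFS_IB variable is a hypothetical override added only so the function can be tried without hardware.

```shell
# List the non-empty partition keys for every InfiniBand device and port.
# Assumption: the P_Key table is exposed at
#   /sys/class/infiniband/<device>/ports/<port>/pkeys/<index>
# SYSFS_IB is a hypothetical override, not a standard variable.
list_pkeys() {
    root="${SYSFS_IB:-/sys/class/infiniband}"
    for f in "$root"/*/ports/*/pkeys/*; do
        [ -r "$f" ] || continue              # no device, or unreadable entry
        key=$(cat "$f")
        [ "$key" = "0x0000" ] && continue    # skip unused table slots
        echo "$f $key"
    done
}

list_pkeys
```

If 0x8100 appears at some index, the node is a member of that partition (the top bit, 0x8000, marks full membership). If only the default key is listed, the node itself is not in the partition.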

I also tried to use the deprecated mxm configuration:

CHPL_LAUNCHER=gasnetrun_mxm

export CHPL_LAUNCHER=gasnetrun_ibv

export GASNET_MXM_SPAWNER=mpi

However, I cannot build Chapel with this configuration. This is the error message:

"User requested --enable-mxm, but I don't know how to build mxm programs for your system."

By the way, using GASNet on top of MPI, UDP, and InfiniBand without a partition key works just fine.

Does anybody know how to use Chapel on a cluster equipped with an MXM InfiniBand network and a partition key (PKEY)?

Best Regards,

Tiago Carneiro.


Solution

  • Tiago,

    As the author and maintainer of GASNet's ibv-conduit (support for libibverbs), I can tell you that we have never had support for a non-default PKey. The message "*** FATAL ERROR: failed to connect (snd) status=12" is consistent with use of the wrong PKey.

    Based on your question here, I have made an attempt to provide support for a user-specified PKey. You can find my prototype as a pull request in the GASNet git repository at Bitbucket: https://bitbucket.org/berkeleylab/gasnet/pull-requests/248 (or https://bitbucket.org/PHHargrove/gasnet-public/commits/ibv-pkey/raw to get just a raw patch). You should be able to apply the one commit in that PR in the third-party/gasnet/gasnet-src directory of the Chapel source. I don't have a partitioned IB network to test on, so you would be helping me out if you could verify that this resolves your problem.

    Regarding "User requested --enable-mxm, but I don't know how to build mxm programs for your system", I suspect that GASNet's configure probe was unable to find the necessary headers or libraries. Details of the failure should be in a config.log file below third-party/gasnet/build. If your mxm headers and libs are installed in a location other than /opt/mellanox/mxm, then you can set the environment variable MXM_HOME when building Chapel to inform GASNet's configure script of the actual location. However, I am not aware of any PKey support in libmxm, so this might be a dead end.

    -Paul
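
The mxm build diagnosis described in the answer can be sketched in shell as follows. This is only an illustration under stated assumptions: find_mxm_config_logs is a hypothetical helper name, CHPL_HOME is assumed to point at the Chapel source tree, and the MXM_HOME value shown is a placeholder rather than a real install location.

```shell
# Print every config.log under the GASNet build tree that mentions mxm,
# so the failed configure probe can be inspected.
# Assumption: CHPL_HOME is the root of the Chapel source tree; a directory
# can also be passed explicitly as the first argument.
find_mxm_config_logs() {
    build_dir="${1:-${CHPL_HOME:-.}/third-party/gasnet/build}"
    [ -d "$build_dir" ] || return 0          # nothing built yet
    find "$build_dir" -name config.log -exec grep -l -i mxm {} + || true
}

# Placeholder path: set this to the actual mxm install prefix before
# rebuilding Chapel if it is not /opt/mellanox/mxm.
# export MXM_HOME=/path/to/your/mxm

find_mxm_config_logs
```

Searching the listed files for "mxm" should show which header or library the configure probe could not find.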