I'm trying to run multilocale Chapel code on a cluster that has an MXM Infiniband network (40 Gbps, model: Mellanox Technologies MT26428).
I followed both the Chapel and GASNet documentation, and I set
export CHPL_COMM_SUBSTRATE=ibv
export CHPL_LAUNCHER=gasnetrun_ibv
export GASNET_IBV_SPAWNER=mpi
instead of using CHPL_COMM_SUBSTRATE=mxm, since mxm is deprecated.
I can build Chapel using the ibv substrate, but I cannot run on multiple locales: I receive a huge number of timeout errors.
At first, I thought the problem was the PKEY, so I added "--mca btl_openib_pkey "0x8100"" to the MPIRUN_CMD, but without success.
I also tried to use the deprecated mxm configuration:
CHPL_LAUNCHER=gasnetrun_mxm
export CHPL_LAUNCHER=gasnetrun_ibv
export GASNET_MXM_SPAWNER=mpi
However, I cannot build Chapel with that configuration. This is the error message:
"User requested --enable-mxm, but I don't know how to build mxm programs for your system."
By the way, using GASNet on top of MPI, UDP, and Infiniband without a Partition Key works just fine.
Does anybody know how to use Chapel on a cluster equipped with an MXM Infiniband network and a Partition Key (PKEY)?
Best Regards,
Tiago Carneiro.
Tiago,
As the author and maintainer of GASNet's ibv-conduit (support for libibverbs), I can tell you that we have never had support for a non-default PKey. The message "*** FATAL ERROR: failed to connect (snd) status=12" is consistent with use of the wrong PKey.
Based on your question here, I have made an attempt to provide support for a user-specified PKey. You can find my prototype as a pull-request in the GASNet git repository at Bitbucket: https://bitbucket.org/berkeleylab/gasnet/pull-requests/248 (or https://bitbucket.org/PHHargrove/gasnet-public/commits/ibv-pkey/raw to get just a raw patch). You should be able to apply the one commit in that PR in the third-party/gasnet/gasnet-src directory of the Chapel source. I don't have a partitioned IB network to test on, so you would be helping me out if you can verify that this resolves your problem.
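Applying the raw patch could look roughly like the sketch below. The helper name apply_patch_checked is hypothetical, and -p1 assumes the patch uses git-style a/ and b/ path prefixes; adjust the strip level if the dry run fails. The patch file must be given as an absolute path because the helper changes directory first.

```shell
# Hypothetical helper: apply a patch only if a dry run succeeds.
# $1 = source directory, $2 = absolute path to the patch file.
# Assumption: -p1 matches the patch's path prefixes (git-style a/ and b/).
apply_patch_checked() {
    ( cd "$1" \
        && patch -p1 --dry-run < "$2" > /dev/null \
        && patch -p1 < "$2" )
}

# Usage, assuming CHPL_HOME points at the Chapel source tree:
#   curl -L -o /tmp/ibv-pkey.patch \
#     https://bitbucket.org/PHHargrove/gasnet-public/commits/ibv-pkey/raw
#   apply_patch_checked "$CHPL_HOME/third-party/gasnet/gasnet-src" /tmp/ibv-pkey.patch
#   cd "$CHPL_HOME" && make    # rebuild Chapel so the patched GASNet is used
```

The dry run keeps the source tree untouched if the patch does not apply cleanly, which makes it easier to retry with a different strip level.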
Regarding "User requested --enable-mxm, but I don't know how to build mxm programs for your system", I suspect that GASNet's configure probe was unable to find the necessary headers or libraries. Details of the failure should be in a config.log file below third-party/gasnet/build. If your mxm headers and libs are installed in a location other than /opt/mellanox/mxm, then you can set the environment variable MXM_HOME when building Chapel to inform GASNet's configure script of the actual location. However, I am not aware of any PKey support in libmxm, so this might be a dead end.
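As a rough sketch of the steps above: the helper name find_gasnet_config_logs is hypothetical, CHPL_HOME is assumed to point at the Chapel source tree, and /path/to/mxm is a placeholder for wherever mxm is actually installed.

```shell
# Hypothetical helper: list every config.log under GASNet's build tree.
# $1 = directory to search (defaults to the build directory under CHPL_HOME).
find_gasnet_config_logs() {
    find "${1:-$CHPL_HOME/third-party/gasnet/build}" -name config.log
}

# Usage:
#   find_gasnet_config_logs                      # locate the log(s)
#   grep -n -i mxm "$(find_gasnet_config_logs | head -n 1)"
#
# If mxm lives outside /opt/mellanox/mxm, set MXM_HOME before rebuilding:
#   export MXM_HOME=/path/to/mxm                 # placeholder path
#   cd "$CHPL_HOME" && make
```

Searching the log for "mxm" should show which header or library check failed during the configure probe.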
-Paul