distributed-computing chapel gasnet

Problems when using Chapel 1.19 along with GASNet PSM (OmniPath) substrate


After switching to Chapel 1.19 while still using the Omni-Path (PSM) substrate of GASNet, I randomly receive the following error: ERROR calling: gasnet_barrier_try(id, 0).

I know that the Omnipath implementation of GASNet is no longer supported by the current version of Chapel. However, I would like to use some features available only in version 1.19, and the cluster I use runs over an Omnipath network.

In order to use the PSM substrate (OmniPath), I proceed as suggested by Chapel's Gitter community:

export CHPL_GASNET_ALLOW_BAD_SUBSTRATE=true

wget https://gasnet.lbl.gov/download/GASNet-1.32.0.tar.gz

tar xzf GASNet-1.32.0.tar.gz

rm -rf $CHPL_HOME/third-party/gasnet/gasnet-src

mv GASNet-1.32.0 $CHPL_HOME/third-party/gasnet/gasnet-src

Then I set up the other variables:

export CHPL_COMM='gasnet'

export CHPL_LAUNCHER='gasnetrun_psm'

export CHPL_COMM_SUBSTRATE='psm'

export CHPL_GASNET_SEGMENT='everything'

export CHPL_TARGET_CPU='native'

export GASNET_PSM_SPAWNER='ssh'

export HFI_NO_CPUAFFINITY=1

Next, I build the runtime, etc.

However, when I run experiments, I randomly receive the following error:

ERROR calling: gasnet_barrier_try(id, 0)
 at: comm-gasnet.c:1020
 error: GASNET_ERR_BARRIER_MISMATCH (Barrier id's mismatched)

This error terminates the program.

I cannot find the reason for this error in the GASNet documentation; I could only find a little information in GASNet's source code.

Do you know what causes this problem?

Thank you all.


Solution

  • I realize this is an old question, but for the record: the current version of Chapel (1.28.0) embeds a version of GASNet (GASNet-EX 2022.3.0 as of this writing) whose ofi-conduit provides high-quality support for Intel Omni-Path. It is selected with CHPL_COMM=gasnet CHPL_COMM_SUBSTRATE=ofi.

    In particular, there should no longer be any reason to clobber Chapel's embedded version of GASNet-EX with an ancient/outdated GASNet-1 to get Omni-Path support, as suggested in the original question.

    For more details see Chapel's detailed Omni-Path instructions.
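
As a rough illustration of the settings named above, a shell environment for the ofi substrate might look like the sketch below. Only CHPL_COMM and CHPL_COMM_SUBSTRATE come from this answer. Clearing the CHPL_GASNET_ALLOW_BAD_SUBSTRATE override from the question, and the rebuild step mentioned in the comments, are assumptions on my part; launcher and spawner settings are left out deliberately, since they should be taken from Chapel's Omni-Path instructions.

```shell
# Minimal sketch: select GASNet's ofi-conduit in a Chapel 1.28.0 (or newer) tree.
# Assumes the embedded GASNet-EX under $CHPL_HOME/third-party has NOT been replaced.

# The two settings named in the answer.
export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ofi

# Assumption: the override from the question is no longer needed, so clear it.
unset CHPL_GASNET_ALLOW_BAD_SUBSTRATE

# Show what will be used by the next build.
echo "CHPL_COMM=$CHPL_COMM CHPL_COMM_SUBSTRATE=$CHPL_COMM_SUBSTRATE"

# Assumption: the Chapel runtime must then be rebuilt for this configuration
# (for example by running make in $CHPL_HOME) before recompiling programs.
```

Programs compiled against the old psm configuration would need to be recompiled after the runtime is rebuilt.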