After Changing to version 1.19, but using Omnipath implementation, I'm randomly receiving the following error: ERROR calling: gasnet_barrier_try(id, 0)
.
I know that the Omnipath implementation of GASNet is no longer supported by the current version of Chapel. However, I would like to use some features available only in version 1.19
, and the cluster I use runs over an Omnipath network.
In order to use the PSM
substrate (OmniPath), I proceed as suggested by Chapel's Gitter community:
export CHPL_GASNET_ALLOW_BAD_SUBSTRATE=true
wget https://gasnet.lbl.gov/download/GASNet-1.32.0.tar.gz
tar xzf GASNet-1.32.0.tar.gz
rm -rf $CHPL_HOME/third-party/gasnet/gasnet-src
mv GASNet-1.32.0 $CHPL_HOME/third-party/gasnet/gasnet-src
Then, I setup other variables:
export CHPL_COMM='gasnet'
export CHPL_LAUNCHER='gasnetrun_psm'
export CHPL_COMM_SUBSTRATE='psm'
export CHPL_GASNET_SEGMENT='everything'
export CHPL_TARGET_CPU='native'
export GASNET_PSM_SPAWNER='ssh'
export HFI_NO_CPUAFFINITY=1
Next, I build the runtime, etc.
However, when I run experiments, I randomly receive the following error:
ERROR calling: gasnet_barrier_try(id, 0)
at: comm-gasnet.c:1020
error: GASNET_ERR_BARRIER_MISMATCH (Barrier id's mismatched)
Which finishes the execution of the program.
I cannot find in GASNet documentation the reason for this error. I could only find a bit of information on GASNet's code.
Do you know what's the cause of this problem?
Thank you all.
I realize this is an old question, but for the record the current version of Chapel (1.28.0) now embeds a version of GASNet (GASNet-EX 2022.3.0 as of this writing) that provides CHPL_COMM=gasnet CHPL_COMM_SUBSTRATE=ofi
(aka GASNet ofi-conduit) that provides high-quality support for Intel Omni-Path.
In particular, there should no longer be any reason to clobber Chapel's embedded version of GASNet-EX with an ancient/outdated GASNet-1 to get Omni-Path support, as suggested in the original question.
For more details see Chapel's detailed Omni-Path instructions.