Tags: fortran, mpi, intel-fortran, intel-mpi

Seg fault in Fortran MPI_COMM_CREATE_GROUP when using a group not directly created from MPI_COMM_WORLD


I'm having a segmentation fault that I cannot really understand in a simple code that just: initializes MPI, duplicates the global communicator via MPI_COMM_DUP, creates a group with half of the processes of the global communicator via MPI_GROUP_INCL, and finally creates a new communicator from that group via MPI_COMM_CREATE_GROUP.

Specifically, I use this last call instead of just MPI_COMM_CREATE because it is collective only over the group of processes contained in group, while MPI_COMM_CREATE is collective over every process in comm. The code is the following:

program mpi_comm_create_grp
  use mpi
  IMPLICIT NONE

  INTEGER :: mpi_size,  mpi_err_code
  INTEGER :: my_comm_dup, mpi_new_comm, mpi_group_world, mpi_new_group
  INTEGER :: rank_index
  INTEGER, DIMENSION(:), ALLOCATABLE :: rank_vec

  CALL mpi_init(mpi_err_code)
  CALL mpi_comm_size(mpi_comm_world, mpi_size, mpi_err_code)

  !! allocate and fill the vector for the new group
  allocate(rank_vec(mpi_size/2))
  rank_vec(:) = (/ (rank_index, rank_index = 0, mpi_size/2 - 1) /)

  !! create the group directly from the comm_world: this way works
  ! CALL mpi_comm_group(mpi_comm_world, mpi_group_world, mpi_err_code)

  !! duplicating the comm_world and creating the group from the dup: this way fails
  CALL mpi_comm_dup(mpi_comm_world, my_comm_dup, mpi_err_code)
  !! creating the group of all processes from the duplicated comm_world
  CALL mpi_comm_group(my_comm_dup, mpi_group_world, mpi_err_code)

  !! create a new group with just half of processes in comm_world
  CALL mpi_group_incl(mpi_group_world, mpi_size/2, rank_vec, mpi_new_group, mpi_err_code)

  !! create a new comm from the comm_world using the new group created
  CALL mpi_comm_create_group(mpi_comm_world, mpi_new_group, 0, mpi_new_comm, mpi_err_code)

  !! deallocate and finalize mpi
  if(ALLOCATED(rank_vec)) DEALLOCATE(rank_vec)
  CALL mpi_finalize(mpi_err_code)
end program mpi_comm_create_grp

If, instead of duplicating COMM_WORLD, I directly create the group from the global communicator (the commented line), everything works just fine.

The parallel debugger I'm using traces the seg fault back to a call to MPI_GROUP_TRANSLATE_RANKS, but, as far as I know, MPI_COMM_DUP duplicates all the attributes of the copied communicator, rank numbering included.
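
For completeness, here is a minimal standalone sketch of that check (the program and variable names are mine, just for illustration): translating every rank of the dup's group into the world group with MPI_GROUP_TRANSLATE_RANKS should yield the identity mapping, precisely because MPI_COMM_DUP preserves the rank order.

program check_dup_ranks
  use mpi
  IMPLICIT NONE

  INTEGER :: world_size, ierr, rank_index
  INTEGER :: comm_dup, group_world, group_dup
  INTEGER, DIMENSION(:), ALLOCATABLE :: ranks_in, ranks_out

  CALL mpi_init(ierr)
  CALL mpi_comm_size(mpi_comm_world, world_size, ierr)

  !! one group taken directly from comm_world, one from its duplicate
  CALL mpi_comm_group(mpi_comm_world, group_world, ierr)
  CALL mpi_comm_dup(mpi_comm_world, comm_dup, ierr)
  CALL mpi_comm_group(comm_dup, group_dup, ierr)

  allocate(ranks_in(world_size), ranks_out(world_size))
  ranks_in(:) = (/ (rank_index, rank_index = 0, world_size - 1) /)

  !! translate every rank of the dup's group into the world group:
  !! MPI_COMM_DUP keeps the rank order, so this must be the identity
  CALL mpi_group_translate_ranks(group_dup, world_size, ranks_in, &
                                 group_world, ranks_out, ierr)

  if (any(ranks_out /= ranks_in)) print *, 'rank numbering differs!'

  CALL mpi_group_free(group_dup, ierr)
  CALL mpi_group_free(group_world, ierr)
  CALL mpi_comm_free(comm_dup, ierr)
  deallocate(ranks_in, ranks_out)
  CALL mpi_finalize(ierr)
end program check_dup_ranks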

I am using ifort version 18.0.5, but I also tried 17.0.4 and 19.0.2 with no better results.


Solution

  • As suggested in the comments, I wrote to the Open MPI user list, and they replied:

    That is perfectly valid. The MPI processes that make up the group are all part of comm world. I would file a bug with Intel MPI.

    So I posted a question on the Intel forum. It is a bug that they solved in the latest version of the library, 19.3.
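
    In the meantime, for the older versions, a possible fallback (a sketch of my own, not something from the Intel thread) is to go back to MPI_COMM_CREATE, which, as noted above, is collective over every process in the communicator and so avoids the buggy call entirely; in the program above, the MPI_COMM_CREATE_GROUP line would become:

      !! collective over ALL processes of mpi_comm_world;
      !! processes outside mpi_new_group receive MPI_COMM_NULL
      CALL mpi_comm_create(mpi_comm_world, mpi_new_group, mpi_new_comm, mpi_err_code)

      !! only members of the new group get a valid communicator back
      if (mpi_new_comm /= MPI_COMM_NULL) then
        !! ... work on the sub-communicator here ...
        CALL mpi_comm_free(mpi_new_comm, mpi_err_code)
      end if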