Tags: parallel-processing, fortran, consistency, fortran-coarrays

When and where are writes to coarrays visible in Fortran?


I have this program and would expect every image to print 1 and 2 for a[1] and a[2] when it is run with 2 images. However, one image prints 1 1 and the other prints 1 2.

program main
    implicit none
    double precision, allocatable :: a[:]

    allocate(a[*])

    a = this_image()
    sync all
    write(*, *) this_image(), a[1], a[2]

    deallocate(a)
end program main

It is compiled with gfortran -fcoarray=lib minimal.f90 -lcaf_mpich and run with mpirun.mpich -n 2 ./a.out

I am using gfortran 12.2.0, and OpenCoarrays version 2.10.1 with MPICH 4.0.2

Exact output is

           1   1.0000000000000000        1.0000000000000000
           2   1.0000000000000000        2.0000000000000000
[1689752680.508753] [thomas-laptop:14602:0]       tag_match.c:62   UCX  WARN  unexpected tag-receive descriptor 0x55b38d7fb8c0 was not matched
[1689752680.509388] [thomas-laptop:14601:0]       tag_match.c:62   UCX  WARN  unexpected tag-receive descriptor 0x564edbce58c0 was not matched

Solution

  • Your reasoning about the expected output is correct, and the incorrect result appears to come from a problem in the toolchain rather than from the program. Because this is a reasonably minimal case, though, it is instructive to look at the formal statement of the result.

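    The program of the question consists of two segments: segment 1, the statements executed before the sync all, and segment 2, the statements executed after it.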

    A write to a variable is visible to a read from that variable if the write "precedes" the read. In the case of the question, the write (a=this_image()) precedes the read (write(*,*) this_image(), a[1], a[2]) on each image.

    "Preceding" is precisely defined in terms of segment orders, which in this case we can state as: segment 1 on each image precedes segment 2 on every image (with appropriate "after" ordering).

    Each image defines its own copy of a in segment 1; each reference to a (on any image) happens in segment 2. Segment 2 on each image is ordered after segment 1 on every image, so the definition of a on each image is "safe": it is guaranteed to be visible to every reference.
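
    The segment structure described above can be made explicit with a self-checking variant of the question's program. This is only a sketch: the program name segments, the loop variable i, and the error message are made up for illustration, and the comments use this answer's segment numbering. On a conforming toolchain every image should pass the check, for any number of images.

    ```fortran
    program segments
        implicit none
        double precision, allocatable :: a[:]
        integer :: i   ! hypothetical loop index, not in the original program

        allocate(a[*])

        ! Segment 1: each image defines only its own copy of a.
        a = this_image()

        ! Image control statement: ends segment 1 and begins segment 2 on every image.
        sync all

        ! Segment 2: every image's segment 1 precedes this point,
        ! so reading a[i] from any image i must yield the value i.
        do i = 1, num_images()
            if (a[i] /= real(i, kind(a))) error stop "stale value read from a remote image"
        end do

        write(*, *) "image", this_image(), "saw the expected values"

        deallocate(a)
    end program segments
    ```

    If a run built as in the question ends with the error stop, that points at the toolchain rather than at the program, since the ordering argument above guarantees the check succeeds.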

    (If we remove the sync all, we have just one segment, and the definition of a[1] on image 1 doesn't formally precede the reference to a[1] on image 2, even though it may happen to do so in practice. This meaning of "precede" is the one we consider in relation to data races in other contexts.)