A strange phenomenon occurs in the following Coarray code
program strange
implicit none
integer :: counter = 0
logical :: co_missionAccomplished[*]
co_missionAccomplished = .false.
sync all
do
if (this_image()==1) then
counter = counter+1
if (counter==2) co_missionAccomplished = .true.
sync images(*)
else
sync images(1)
end if
if (co_missionAccomplished[1]) exit
cycle
end do
write(*,*) "missionAccomplished on image ", this_image()
end program strange
This program never ends, as it appears that there is a deadlock for any counter threshold beyond 1 inside the loop. The code is compiled with Intel Fortran 2018 Windows OS, with the following flags:
ifort /debug /Qcoarray=shared /standard-semantics /traceback /gen-interfaces /check /fpe:0 normal.f90 -o run.exe
The same code, using DO WHILE construct, also appears to suffer from the same phenomenon:
program strange
implicit none
integer :: counter = 0
logical :: co_missionAccomplished[*]
co_missionAccomplished = .true.
sync all
do while(co_missionAccomplished[1])
if (this_image()==1) then
counter = counter+1
if (counter==2) co_missionAccomplished = .false.
sync images(*)
else
sync images(1)
end if
end do
write(*,*) "missionAccomplished on image ", this_image()
end program strange
This seems now too trivial to be a compiler bug, so I am probably missing something important about do-loops in parallel. any help is appreciated.
UPDATE:
Adding a SYNC ALL statement before the CYCLE statement in the DO-CYCLE-EXIT example program above resolves the deadlock. Also, a SYNC ALL statement right after DO WHILE statement, as the first line of the block resolves the deadlock. So apparently, all the images must be synced to avoid a deadlock before each cycle of the loops in either case above.
Regarding "This seems now too trivial to be a compiler bug", you may be very surprised about how seemingly trivial things can be treated incorrectly by a compiler. Few things relating to coarrays are trivial.
Consider the following program which is related:
implicit none
integer i[*]
do i=1,1
sync all
print '(I1)', i[1]
end do
end
I get the initially surprising output
1
2
when run with two images under ifort 2018.1.
Let's look at what's going on.
In my loop, i[1]
first has value 1
when the images are synchronized. However, by the time the second image accesses the value, it's been changed by the first image ending its iteration.
We solve that little problem by putting an extra synchronization statement before the end do
.
How is this program related to the one of the question? It's the same lack of synchronization between testing a value on a remote image and that image updating it.
Between the synchronization and other images testing the value of co_missionAccomplished[1]
, the first image may dash around and update counter
, then co_missionAccomplished
. Some images may see the exit state in their first iteration.