I'm new to openacc. I'm trying to use it to accelerate a particle code. However, I don't understand why when updating an array (eta
in the program below) on the host, it gives different results depending on the location of '!$acc update self
'. Here is a code that re-produce this problem:
program approximateFun
use funs
use paras_mod
integer :: Nx
real(dp) :: dx
real(dp), dimension(:), allocatable :: x, eta
real(dp), dimension(5) :: xp, fAtxp
!$acc declare create(Nx)
!$acc declare create(x)
!$acc declare create(eta)
!$acc declare create(dx)
!$acc declare create(fAtxp)
!$acc declare create(xp)
Nx = 16
!$acc update device(Nx)
xp = (/3.9, 4.1, 4.5, 5.0, 5.6/)
!$acc update device(xp)
allocate(x(1 : Nx))
allocate(eta(1 : Nx))
eta = 0.0d0
dx = 2 * pi / (Nx - 1)
!$acc update device(dx)
do i = 1, Nx
x(i) = (i - 1.0d0) * dx
end do
!$acc update device(x)
call calc_etaVec(x, Nx, eta)
!$acc update self(eta) ! gives the correct results
!$acc parallel loop present(dx, xp, eta, fAtxp)
do i = 1, 5
call calcFunAtx(xp(i), dx, eta, fAtxp(i))
end do
!$acc update self (fAtxp)
!!$acc update self(eta) !---> gives wrong result
write(6, *) 'eta', eta
do i = 1, 5
write(6, *) 'xp, fAtxp', xp(i), fAtxp(i)
end do
deallocate(x)
deallocate(eta)
end program approximateFun
The previous program uses the following modules
MODULE funs
use paras_mod
implicit none
CONTAINS
subroutine calc_etaVec(x, nx, eta)
integer, intent(in) :: nx
real(dp), dimension(:), intent(in) :: x
real(dp), dimension(:), intent(out) :: eta
integer :: i
!$acc parallel loop present(x, eta)
do i = 1, nx
eta(i) = sin(x(i))
end do
end subroutine
subroutine calcFunAtx(xp, dx, eta, fAtx)
real(dp), intent(in) :: xp, dx
real(dp), dimension(:), intent(in) :: eta
real(dp), intent(out) :: fAtx
integer :: idx
!$acc routine seq
idx = 1 + floor(xp / dx)
fAtx = eta(idx)
end subroutine calcFunAtx
END MODULE
and
module paras_mod
implicit none
save
INTEGER, PARAMETER :: dp = selected_real_kind(14,300)
REAL(dp), PARAMETER :: PI=4.0d0*atan(1.0d0)
end module paras_mod
When using !$acc update self(eta)
directly after call calc_etaVec(x, Nx, eta)
, eta
is updated correctly. But when used after the loop, only the first five elements are correct, while the remaining are zeros. What are the reasons behind that?
thanks
The output when !$acc update self(eta)
is used directly after call calc_etaVec(x, Nx, eta)
is
0.000000000000000 0.4067366430758002
0.7431448254773941 0.9510565162951535 0.9945218953682734
0.8660254037844388 0.5877852522924732 0.2079116908177597
-0.2079116908177591 -0.5877852522924730 -0.8660254037844384
-0.9945218953682732 -0.9510565162951536 -0.7431448254773946
-0.4067366430758009 -2.4492935982947064E-016
which is correct. However, when used after the loop, the output is
0.000000000000000 0.4067366430758002
0.7431448254773941 0.9510565162951535 0.9945218953682734
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000
This one was perplexing till I determined that its a compiler error. I've filed a problem report, TPR #32673, and sent it to engineering for review.
When setting the environment variable NV_ACC_NOTIFY=2, which shows the data movement, I see that the compiler is only copying 40 bytes, versus the correct 128. However, if I remove "eta" from the preceding present clause, then it's correct.
#ifdef WORKS
!$acc parallel loop present(dx, fAtxp)
#else
!$acc parallel loop present(dx, fAtxp, eta)
#endif
do i = 1, 5
call calcFunAtx(xp(i), dx, eta, fAtxp(i))
end do
Also, this only occurs when using an allocatable in a declare create directive. If you switched to using data regions, the issue doesn't occur. Probably why we didn't see it before (looks like the bugs been there since mid-2020). Using the declare directive in anything but a module is rare.