I have the following problem. I have a main subroutine, let us call it main_function (for 3D B-splines). It takes several tensors as input.
This function contains only IF-conditions. If a condition is satisfied, another function is called. Let us call these functions function_a, function_b, and function_c; all of them are parallelizable.
The structure is as follows:
subroutine main_function(paras)
  if (1) then
    call function_a
  else if (2) then
    call function_b
  else if (3) then
    call function_c
  end if
end subroutine main_function
with
subroutine function_a(paras)
  !$acc parallel loop present(....)
  do
    ! heavy parallel calcs
  end do
  ! output: eta
end subroutine function_a

subroutine function_b(paras)
  !$acc parallel loop present(....)
  do
    ! heavy parallel calcs
  end do
  ! output: eta
end subroutine function_b

subroutine function_c(paras)
  !$acc parallel loop present(....)
  do
    ! heavy parallel calcs
  end do
  ! output: eta
end subroutine function_c
The subroutines function_a, function_b, and function_c have a B-spline tensor (eta) as output, calculated on the GPU. I don't want to move this tensor to the host since it is not needed there. However, after calculating eta on the GPU using main_function, an interpolation subroutine interpolate3D is called to interpolate the function. The definition of interpolate3D is something like:
subroutine interpolate3D(eta, x, y, z, fAtxyz)
  !$acc routine seq
  ! interpolate ...
end subroutine interpolate3D
To summarize, the pseudo-code is something like:
call main_function(paras)
!$acc parallel loop present(x, y, eta, fAtxyz)
do i = 1, N
  call interpolate3D(eta, x(i), y(i), z(i), fAtxyz(i))
end do
My problems and questions are:
1)- When I don't use '!$acc update self (eta)' before the loop, the results are completely wrong. Does this mean that the 'present' clause doesn't correctly find eta, calculated by main_function, on the GPU, so that one needs to update the host and then copy it back to the GPU?
2)- How can I ensure that interpolate3D is running on the GPU? For example, if I don't have the above loop, does adding only '!$acc routine seq' ensure that it runs on the GPU and looks up the different quantities there?
3)- In fact, when there is no loop, adding '!$acc update self (eta)' is required to get correct results. Does this mean that in this case the subroutine is executed on the CPU?
4)- To summarize, if I have two subroutines, where the first chooses between different subroutines based on if-conditions to calculate a vector or tensor and keeps it on the GPU (I don't want to update the host), while the second uses this vector to perform some calculations on the GPU, how do I do this correctly with OpenACC?
Sorry for the long post, and thank you very much for your help.
In fact, I have tried different strategies. However, all of them require copying eta to the host before interpolating, even though it is only calculated on the device. There is something I don't understand, since I'm also new to OpenACC.
Cross-posted on NVIDIA's Forum: https://forums.developer.nvidia.com/t/b-splines-on-gpus-openacc-fortran/233053
The issue was an error in the user's code: a "parallel loop" directive was missing, hence the loop was being run on the host rather than on the device.
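For reference, here is a minimal sketch of the pattern the question asks about. It is not the poster's actual code. It assumes a 1D linear interpolation in place of the 3D B-spline, and the names fill_a, fill_b, interp1d, mode, n and m are hypothetical stand-ins. The points it tries to show: the data region keeps eta on the device across both steps, the host-side if-branching only selects which device kernel runs, and '!$acc routine seq' only makes a device version of the routine available. The routine runs on the GPU only when it is called from inside a compute region such as the "parallel loop" below. Without that directive the loop runs on the host and reads the host copy of eta, which was never updated.

```fortran
! Minimal sketch, not the original code: eta stays on the device throughout.
! fill_a, fill_b, interp1d, mode, n and m are hypothetical names.
module spline_sketch
  implicit none
contains

  ! Stand-in for function_a: fills eta in a device kernel.
  subroutine fill_a(eta, n)
    integer, intent(in) :: n
    real, intent(inout) :: eta(n)
    integer :: i
    !$acc parallel loop present(eta)
    do i = 1, n
      eta(i) = 2.0 * real(i)
    end do
  end subroutine fill_a

  ! Stand-in for function_b: a different device kernel for eta.
  subroutine fill_b(eta, n)
    integer, intent(in) :: n
    real, intent(inout) :: eta(n)
    integer :: i
    !$acc parallel loop present(eta)
    do i = 1, n
      eta(i) = -real(i)
    end do
  end subroutine fill_b

  ! Stand-in for main_function: host-side branching only.
  subroutine main_function(mode, eta, n)
    integer, intent(in) :: mode, n
    real, intent(inout) :: eta(n)
    if (mode == 1) then
      call fill_a(eta, n)
    else
      call fill_b(eta, n)
    end if
  end subroutine main_function

  ! Stand-in for interpolate3D: linear interpolation, callable from device code.
  subroutine interp1d(eta, n, x, f)
    !$acc routine seq
    integer, intent(in) :: n
    real, intent(in) :: eta(n), x
    real, intent(out) :: f
    integer :: k
    real :: t
    k = max(1, min(n - 1, int(x)))   ! left node, clamped to [1, n-1]
    t = x - real(k)                  ! fractional offset from the left node
    f = (1.0 - t) * eta(k) + t * eta(k + 1)
  end subroutine interp1d

end module spline_sketch

program demo
  use spline_sketch
  implicit none
  integer, parameter :: n = 8, m = 3
  real :: eta(n), x(m), f(m)
  integer :: i

  x = [1.5, 4.0, 7.25]

  ! eta is only created on the device; it is never copied to the host.
  !$acc data create(eta) copyin(x) copyout(f)
  call main_function(1, eta, n)

  ! This directive is what puts the loop, and the routine it calls, on the GPU.
  !$acc parallel loop present(eta, x, f)
  do i = 1, m
    call interp1d(eta, n, x(i), f(i))
  end do
  !$acc end data

  print '(3f8.3)', f
end program demo
```

With mode = 1, eta(i) = 2*i, so the linear interpolation should return 2*x, that is 3.0, 8.0 and 14.5 for the sample points. Compiled without OpenACC support the directives are treated as comments and the program runs on the CPU with the same result. This makes it easy to compare the two builds when results look wrong.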