I am trying to offload several nested DO loops in Fortran using OpenMP and the IBM XL compiler suite. 90% of the routines are straightforward, but a handful of the loops involve private 1D arrays whose size is unknown at compile time but will always be O(10), which is very manageable in terms of thread stack memory. Here is an example loop:
implicit none
real, dimension(1:nseq) :: yy   ! nseq is a global variable, usually 1-10

!$omp target teams distribute parallel do collapse(3) schedule(static,1) &
!$omp& private(i, j, k) &
!$omp& private(yy) &
!$omp& shared(ne)
do k = 1, 30
   do j = 1, 30
      do i = 1, 30
         yy = dummy_array(i,j,k,6:ne)   ! nseq is equal to ne-6... ne is a global variable
         ! dummy_array is an allocatable array that exists persistently on
         ! the GPU
         ! ...
         ! do stuff with yy
         ! ...
      end do
   end do
end do
With this standard method, I get a variety of memory issues at run time, varying between "out of memory" errors and "an illegal memory access was encountered".
If I go in and hard-code the value that I know nseq will take ahead of time, i.e.
implicit none
real, dimension(1:10) :: yy
then I have no issues at all, so I am not ACTUALLY running out of memory on the GPU. This is obviously bad practice, though, as these values change from case to case and are run-time parameters.
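(For context, the kind of fixed-size compromise that does work for me can be written a bit more defensively, as in the sketch below. MAXSEQ is a hypothetical compile-time upper bound, not something from my actual code, and the slice is the same one as in the loop above.)

integer, parameter :: MAXSEQ = 16               ! hypothetical upper bound on nseq
real, dimension(MAXSEQ) :: yy                   ! fixed-size buffer; only yy(1:nseq) is used

if (nseq > MAXSEQ) stop 'nseq exceeds MAXSEQ'   ! run-time guard before offloading

!$omp target teams distribute parallel do collapse(3) schedule(static,1) &
!$omp& private(i, j, k) &
!$omp& private(yy) &
!$omp& shared(ne)
do k = 1, 30
   do j = 1, 30
      do i = 1, 30
         yy(1:nseq) = dummy_array(i,j,k,6:ne)   ! assumes nseq matches the length of this slice
         ! ... do stuff with yy(1:nseq) ...
      end do
   end do
end do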
I have experimented with environment variables such as OMP_HEAPSIZE and OMP_STACKSIZE, with no luck.
Thanks for taking a look!
This turns out to be a compiler quirk/bug with the IBM XL compiler suite I was using.
A workaround that is not very desirable, but is effective, is to manually privatize the temporary array by giving it one slice per loop iteration, i.e.
real, dimension(30,30,30,1:nseq) :: yy

! yy now holds one slice per (i,j,k) iteration, so it is shared between
! threads (and simply mapped to the device) rather than privatized
!$omp target teams distribute parallel do collapse(3) schedule(static,1) &
!$omp& private(i, j, k) &
!$omp& map(alloc: yy) &
!$omp& shared(ne)
do k = 1, 30
   do j = 1, 30
      do i = 1, 30
         yy(i,j,k,:) = dummy_array(i,j,k,6:ne)
         ! ...
         ! do stuff with yy(i,j,k,:)
         ! ...
      end do
   end do
end do
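In case it helps anyone else, here is a self-contained sketch of the workaround. The subroutine name, the explicit allocation and mapping of dummy_array, and the placeholder arithmetic are all illustrative (in my real code dummy_array is already persistent on the device); only the directive structure and the manually privatized yy reflect what I actually run.

! Illustrative, self-contained sketch of the manual-privatization workaround.
subroutine offload_manual_privatization(ne)
   implicit none
   integer, intent(in) :: ne
   integer :: i, j, k, nseq
   real, allocatable :: dummy_array(:,:,:,:)
   real, allocatable :: yy(:,:,:,:)

   nseq = ne - 5                        ! number of elements in the 6:ne slice used below
   allocate(dummy_array(30,30,30,ne))
   allocate(yy(30,30,30,nseq))
   dummy_array = 1.0                    ! placeholder data

   !$omp target teams distribute parallel do collapse(3) schedule(static,1) &
   !$omp& private(i, j, k) &
   !$omp& map(to: dummy_array) map(alloc: yy) &
   !$omp& shared(ne)
   do k = 1, 30
      do j = 1, 30
         do i = 1, 30
            ! Each iteration owns its own slice of yy, so no run-time-sized
            ! private scratch array is needed on the device.
            yy(i,j,k,:) = dummy_array(i,j,k,6:ne)
            yy(i,j,k,:) = 2.0 * yy(i,j,k,:)   ! stand-in for "do stuff with yy"
         end do
      end do
   end do
end subroutine offload_manual_privatization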