fortran openacc pgi-accelerator

How can a Fortran OpenACC contained subroutine access data from its parent subroutine?


I am currently accelerating a Fortran code where a contained subroutine (subsub) accesses and modifies variables declared in the parent subroutine (sub):

module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i

    !$acc kernels loop
    do i = 1, 10
      call subsub
    enddo
  contains
    subroutine subsub
      !$acc routine
      var(i) = i
    endsubroutine
  endsubroutine
endmodule

program test
  use mod
  call sub
endprogram

When I compile this with the PGI compiler, version 20.9-0, it complains that subsub cannot refer to the host variable var:

sub:
      8, Generating implicit copy(.S0000) [if not already present]
      9, Loop is parallelizable
         Generating Tesla code
          9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
NVFORTRAN-S-0155-acc routine cannot be used for contained subprograms that refer to host subprogram data: var (test.f90)
  0 inform,   0 warnings,   1 severes, 0 fatal for subsub

That makes sense. I tried creating var on the device with acc data create(var) and with acc declare create(var), but neither changes the outcome.

Can this pattern be accelerated at all?


Solution

  • No, this pattern won't work. For contained routines, the compiler passes a hidden argument that points to the parent's stack frame. In this case that pointer refers to host memory, so the device routine cannot safely access the parent's variables through it.

    The workaround is to pass the variables to the subroutine as arguments. For example:

    % cat test2.f90
    module mod
      implicit none
    contains
      subroutine sub
        integer :: var(10)
        integer :: i
    
        !$acc kernels loop
        do i = 1, 10
          call subsub(var,i)
        enddo
        print *, var
      contains
        subroutine subsub(var,i)
          !$acc routine
          integer :: var(10)
          integer, value :: i
          var(i) = i
        endsubroutine
      endsubroutine
    endmodule
    
    program test
      use mod
      call sub
    endprogram
    % nvfortran test2.f90 -acc -Minfo=accel ; a.out
    sub:
          8, Generating implicit copy(.S0000,var(:)) [if not already present]
          9, Loop is parallelizable
             Generating Tesla code
              9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
    subsub:
         14, Generating acc routine seq
             Generating Tesla code
                1            2            3            4            5            6
                7            8            9           10
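
The same workaround can also be written with the data movement and the routine's parallelism level spelled out, rather than left to the compiler. The following is only a sketch under stated assumptions: the names mod_explicit, test_explicit, v and k are hypothetical, and the choice of parallel loop with copyout(var) assumes that var is only written inside the loop. The dummy arguments are renamed so that nothing in subsub refers to the parent's variables by host association.

```fortran
module mod_explicit
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i

    ! Assumption: var is only written on the device, so copyout suffices.
    ! The original example relied on the implicit copy instead.
    !$acc parallel loop copyout(var)
    do i = 1, 10
      call subsub(var, i)
    enddo
    print *, var
  contains
    subroutine subsub(v, k)
      ! seq matches what the compiler reported for the original routine.
      !$acc routine seq
      integer, intent(inout) :: v(10)   ! array passed explicitly
      integer, value :: k               ! loop index passed by value
      v(k) = k
    endsubroutine
  endsubroutine
endmodule

program test_explicit
  use mod_explicit
  call sub
endprogram
```

Built the same way as above (nvfortran with -acc -Minfo=accel), this should print the values 1 through 10. Because the directives are comments to a compiler that does not enable OpenACC, the program also builds and runs serially as ordinary Fortran.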