
Use of shared memory with OpenACC


I'm trying to use shared memory to cache things with OpenACC.

Basically what I'm working on is a matrix multiplication, and what I have is this:

typedef float ff;

// Multiplies two square column-major matrices a and b (element (r, col) is
// stored at index r + n * col) and puts the result in c.
void mmul(const ff* restrict a,
          const ff* restrict b,
          ff* restrict c,
          const int n) {
#pragma acc data copyin(a[0:n*n], b[0:n*n]) copy(c[0:n*n])
{
    // "kernels" is the OpenACC counterpart of the older PGI Accelerator "region".
    #pragma acc kernels
    {
        #pragma acc loop independent vector(16)
        for (int i = 0; i < n; ++i) {
            #pragma acc loop independent vector(16)
            for (int j = 0; j < n; ++j) {
                ff sum = 0;
                for (int k = 0; k < n; ++k) {
                    sum += a[i + n * k] * b[k + n * j];
                }
                c[i + n * j] = sum;
            }
        }
    }
}
}

What I would like to do is use shared memory to cache tiles of the matrices 'a' and 'b' for the computation of 'c', much as the tiled CUDA matrix-multiplication kernel does.

Basically, in CUDA I would know the exact size of my blocks and could stage those tiles in shared memory explicitly.
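For reference, the tile staging that a CUDA kernel performs with `__shared__` arrays can be sketched serially in plain C. This is a minimal sketch under stated assumptions: `mmul_tiled` and `TILE` are hypothetical names, the matrices use the same column-major indexing as `mmul` above (`a[i + n * k]`), and the local buffers `ta`/`tb` only stand in for per-block `__shared__` arrays that the threads of a block would fill cooperatively, with a `__syncthreads()` between loading a tile and using it.

```c
#include <assert.h>

#define TILE 16 /* hypothetical tile edge; plays the role of the CUDA block size */

/* Blocked column-major product c = a * b for n x n matrices.
 * For each TILE x TILE block of c, matching tiles of a and b are copied into
 * small local buffers, and the inner product uses only those buffers. */
void mmul_tiled(const float *a, const float *b, float *c, int n)
{
    assert(n >= 0);
    float ta[TILE][TILE], tb[TILE][TILE];

    for (int ib = 0; ib < n; ib += TILE) {
        for (int jb = 0; jb < n; jb += TILE) {
            float acc[TILE][TILE] = {{0}};
            for (int kb = 0; kb < n; kb += TILE) {
                /* Stage the a and b tiles, zero-padding past the matrix edge. */
                for (int ii = 0; ii < TILE; ++ii)
                    for (int kk = 0; kk < TILE; ++kk) {
                        int i = ib + ii, k = kb + kk;
                        ta[ii][kk] = (i < n && k < n) ? a[i + n * k] : 0.0f;
                    }
                for (int kk = 0; kk < TILE; ++kk)
                    for (int jj = 0; jj < TILE; ++jj) {
                        int k = kb + kk, j = jb + jj;
                        tb[kk][jj] = (k < n && j < n) ? b[k + n * j] : 0.0f;
                    }
                /* Accumulate the partial products from the staged tiles. */
                for (int ii = 0; ii < TILE; ++ii)
                    for (int jj = 0; jj < TILE; ++jj)
                        for (int kk = 0; kk < TILE; ++kk)
                            acc[ii][jj] += ta[ii][kk] * tb[kk][jj];
            }
            /* Write back only the in-range part of this block of c. */
            for (int ii = 0; ii < TILE; ++ii)
                for (int jj = 0; jj < TILE; ++jj) {
                    int i = ib + ii, j = jb + jj;
                    if (i < n && j < n)
                        c[i + n * j] = acc[ii][jj];
                }
        }
    }
}
```

In the naive loop nest each element of `a` is read from the full matrix once per output column (n times); here it is read once per column-block of `c` (n / TILE times). On a GPU, that reuse is what shared memory pays for.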

I understand I can use the

#pragma acc cache

directive, and that I can specify block sizes with the vector and gang clauses, but I'm having trouble understanding how that will be mapped onto the CUDA architecture.
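In the OpenACC specification the directive is spelled `cache`. It must appear at the top of a loop body and name array elements or subarrays whose start is a constant, a loop invariant, or the loop index plus or minus a constant or loop invariant, and whose length is a constant. A minimal sketch under those rules, assuming a strip length of 16 (`KTILE` and `mmul_cache` are hypothetical names): the k loop is strip-mined so that the current piece of column j of `b` can be named in the directive. Whether a given compiler actually places that piece in CUDA shared memory, and whether `gang vector(16)` on both loops really yields 16x16 thread blocks, is implementation-dependent and worth checking in the compiler feedback (`-Minfo`). Compiled without OpenACC support, the pragmas are ignored and the function runs serially.

```c
#include <assert.h>

#define KTILE 16 /* hypothetical strip length for the k loop */

/* Column-major c = a * b with the k loop strip-mined so that a fixed-length
 * piece of column j of b can be named in a cache directive. */
void mmul_cache(const float *restrict a, const float *restrict b,
                float *restrict c, int n)
{
    assert(n >= 0);
#pragma acc data copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
    {
#pragma acc kernels
        {
#pragma acc loop independent gang vector(16)
            for (int i = 0; i < n; ++i) {
#pragma acc loop independent gang vector(16)
                for (int j = 0; j < n; ++j) {
                    float sum = 0.0f;
                    int kb;
                    /* Full strips: the subarray start is the loop index kb plus
                     * the loop-invariant n * j, and its length is a constant. */
                    for (kb = 0; kb + KTILE <= n; kb += KTILE) {
#pragma acc cache(b[kb + n * j : KTILE])
                        for (int k = kb; k < kb + KTILE; ++k)
                            sum += a[i + n * k] * b[k + n * j];
                    }
                    /* Remainder when n is not a multiple of KTILE. */
                    for (int k = kb; k < n; ++k)
                        sum += a[i + n * k] * b[k + n * j];
                    c[i + n * j] = sum;
                }
            }
        }
    }
}
```

Only `b` is cached here: with this column-major layout, the entries of `a` needed for one strip (`a[i + n * k]` for consecutive k) are n elements apart, so they cannot be written as a contiguous `start:length` range.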

Is there a way to achieve something similar with OpenACC? Is there a good tutorial or resource on the use of the cache directive, or on how to bring some of the power of CUDA shared memory to OpenACC?


Solution

  • If you are using the PGI Accelerator compiler, you can keep the generated PTX file and inspect what is actually happening underneath at execution time:

    pgcc -acc -fast -Minfo -ta=nvidia,cc13,keepptx matrixMult.c -o matrixMult
    

    The generated PTX will be stored in the current directory.

    EDIT: If you would rather see the higher-level generated code (CUDA C or CUDA Fortran), use -ta=nvidia,cc13,keepptx,keepgpu instead.