[SOLVED] OpenMP offloading target map alloc

OpenMP offloading target map alloc: how does it work?

I have never really understood how the alloc map-type of the map clause on the target (or target data) construct works.

My use case: I would like to have a temporary array on the device that is used only on the device. It is initialized on the device, read on the device, everything happens on the device. The host does not touch the contents of the array at all. For the sake of simplicity, I have the following code, which copies one array to another via a temporary array (using just a single team and thread, but that does not matter):

#include <cstdio>

int main()
{
    const int count = 10;
    int * src = new int[count];
    int * tmp = new int[count];
    int * dst = new int[count];

    for(int i = 0; i < count; i++) src[i] = i;
    for(int i = 0; i < count; i++) printf(" %3d", src[i]);
    printf("\n");

    #pragma omp target map(to:src[0:count]) map(from:dst[0:count]) map(alloc:tmp[0:count])
    {
        for(int i = 0; i < count; i++) tmp[i] = src[i];
        for(int i = 0; i < count; i++) dst[i] = tmp[i];
    }

    for(int i = 0; i < count; i++) printf(" %3d", dst[i]);
    printf("\n");

    delete[] src;
    delete[] tmp;
    delete[] dst;

    return 0;
}

This code works with pgc++ -mp=gpu on an Nvidia GPU and with icpx -fiopenmp -fopenmp-targets=spir64 on an Intel GPU.

But the thing is, I don't want to allocate the tmp array on the host. If I just use int * tmp = nullptr, the code fails on Nvidia (on Intel it still works). If I leave tmp uninitialized (using just int * tmp; and removing the delete), the execution fails on Intel too. If I do not declare the tmp variable at all, compilation fails (which makes sense). I made sure the code really runs on the device (it is offloaded and does not fall back to the CPU) by setting OMP_TARGET_OFFLOAD=MANDATORY.

This was weird to me, since I don't use the tmp array on the host at all. As I understand it, the tmp array is allocated on the device and then in the kernel the device array is used. Is that right? Why do I have to allocate and/or initialize the pointer on the host if I don't use it on the host?

So my question is: what are the exact requirements to use map(alloc) in OpenMP offloading? How does it work? How should I use it? I would appreciate an example and references from tutorials/documentation.

I wasn't able to find any useful information regarding this. The standard was not helpful at all, and the tutorials I attended and watched did not go into such depth.

I understand that the code should work even without OpenMP enabled (as if the pragmas were just ignored), so let's assume there is an #ifdef to actually allocate the tmp array if OpenMP is disabled.

I am also aware of manual memory management via omp_target_alloc(), omp_target_memcpy() and omp_target_free(), but I wanted to use the target map(alloc).

I am reading version 5.2 of the OpenMP standard and using pgc++ 22.2-0 and icpx 2022.0.0.20211123.

Solution

So, the thing is, map(alloc: ...) is not just about allocating memory. It works the same way as map(to: ...) and map(from: ...), with the only difference being that no data is copied in either direction. The device array is still the counterpart of a host array, so the host array has to be allocated. That naming still seems a little weird to me.
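To illustrate, here is a minimal sketch of how map(alloc: ...) could be used under that rule. The helper name copy_via_mapped_tmp is made up for this example, and the split into two target regions is only there to show that the device copy of tmp lives for the whole target data region. tmp is allocated on the host, but its contents are never read or written there.

```cpp
#include <cassert>

// Hypothetical helper: copies src to dst through a scratch array whose
// contents only ever exist on the device.
void copy_via_mapped_tmp(const int * src, int * dst, int count)
{
    assert(count >= 0);

    // The host storage must exist so that the map clause has something to
    // pair the device allocation with. It is never read or written here.
    int * tmp = new int[count];

    // alloc: create the device counterpart of tmp[0:count], copy nothing in,
    // copy nothing out when the region ends.
    #pragma omp target data map(alloc: tmp[0:count])
    {
        // tmp[0:count] is already present on the device, so the pointer
        // used inside the target region refers to the device array.
        #pragma omp target map(to: src[0:count])
        for(int i = 0; i < count; i++) tmp[i] = src[i];

        #pragma omp target map(from: dst[0:count])
        for(int i = 0; i < count; i++) dst[i] = tmp[i];
    }

    delete[] tmp;
}
```

Calling copy_via_mapped_tmp(src, dst, count) with the arrays from the question should leave dst with the same contents as src. If the pragmas are ignored, the same code simply runs on the host.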

To get what I was aiming for, a device array with no host counterpart, I had to use the omp_target_alloc() and omp_target_free() functions.
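For reference, here is a rough sketch of that approach, assuming the default device is the one being offloaded to. The helper name copy_via_device_tmp is hypothetical, and the #ifdef _OPENMP fallback is just one assumed way to keep the code working when OpenMP is disabled. The is_device_ptr clause tells the target construct that tmp already holds a device address and must not be mapped.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: copies src to dst through a scratch buffer that is
// allocated directly on the device. Returns false if the allocation fails.
bool copy_via_device_tmp(const int * src, int * dst, int count)
{
    assert(count >= 0);

#ifdef _OPENMP
    // Assumption: the default device is the offload target we want.
    const int dev = omp_get_default_device();
    int * tmp = static_cast<int *>(
        omp_target_alloc(static_cast<std::size_t>(count) * sizeof(int), dev));
#else
    // Without OpenMP the pragma below is ignored, so use host memory.
    int * tmp = new int[count];
#endif
    if(tmp == nullptr) return false;

    // tmp is passed through as a device pointer; only src and dst are mapped.
    #pragma omp target map(to: src[0:count]) map(from: dst[0:count]) is_device_ptr(tmp)
    {
        for(int i = 0; i < count; i++) tmp[i] = src[i];
        for(int i = 0; i < count; i++) dst[i] = tmp[i];
    }

#ifdef _OPENMP
    omp_target_free(tmp, dev);
#else
    delete[] tmp;
#endif
    return true;
}
```

With this version there is no host allocation for tmp at all when OpenMP is enabled. The price is that the pointer must not be dereferenced on the host.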