How do I Implement a Custom 4-Dimensional Array Viewer/Wrapper in SYCL 2020 / DPC++?

In conventional C++, it's possible to create a multi-dimensional "viewer" or "wrapper" over a 1D buffer in linear memory by (1) defining a custom ArrayWrapper class, (2) overloading the () or [] operator as its "accessor" and doing the address calculation inside this member function, and (3) returning a reference to the value. Thus, a 4D array can be accessed via the syntactic sugar array(a, b, c, d) = val. This improves code readability and also decouples the calling code from the actual memory layout of the array.

#include <iostream>
#include <cstdlib>

template <typename T>
class ArrayWrapper
{
public:
        ArrayWrapper(T *buf) : array(buf) {};

        inline T& operator() (size_t a, size_t b, size_t c, size_t d)
        {
                return array[a + b + c + d];
        }

        const inline T& operator() (size_t a, size_t b, size_t c, size_t d) const
        {
                return array[a + b + c + d];
        }
        T *array;
};

int main(void)
{
        int *buf = (int *) malloc(sizeof(int) * 100);
        ArrayWrapper<int> array(buf);
        array(1, 2, 3, 4) = 42;

        std::cout << array(1, 2, 3, 4) << std::endl;
        free(buf);
}

However, this wrapper is not usable in a DPC++ / SYCL 2020 kernel.

int main(void)
{
        sycl::queue Q;
        auto buf = sycl::malloc_shared<int>(20, Q);
        ArrayWrapper<int> array(buf);

        Q.single_task([=]() {
                array(1, 2, 3, 4) = 42;
        });
        Q.wait();

        std::cout << array(1, 2, 3, 4) << std::endl;
        sycl::free(buf, Q);
}

Compiling this program with the Intel DPC++ compiler produces the following error:

question-sycl.cpp:37:21: error: expression is not assignable
                array(1, 2, 3, 4) = 42;
                ~~~~~~~~~~~~~~~~~ ^
1 error generated.
make: *** [Makefile:8: question-sycl.elf] Error 1

This happens because the kernel is a C++ lambda: variables captured by copy are const inside the lambda body by default (the lambda's call operator is const), so the const overload of operator() is selected and it returns a const reference. In conventional C++, this can be solved by either capturing by reference, Q.single_task([&array]() {}), or declaring the lambda mutable, Q.single_task([=]() mutable {}). However, both usages appear to be unsupported in SYCL and are rejected by the DPC++ compiler.

Is there a way to implement the same syntactic sugar array(a, b, c, d) = val in DPC++ / SYCL 2020? I noticed that memory access in SYCL is provided by two abstractions called buffers and accessors. Unfortunately, they only support 1D, 2D, or 3D arrays, not higher dimensions. What is the best way to define a convenient wrapper for accessing high-dimension arrays?

Solution

As you say, captured objects in SYCL are not mutable, and for good reason: it is unclear whether all work items should access one shared copy of the kernel arguments or whether each work item should have its own copy. Ultimately this depends strongly on the backend and hardware and on what they want to do. So we decided that all SYCL kernel arguments should be immutable.

You have two options (and you have already found one):

The first option is to copy the kernel argument into a local, mutable object inside the kernel:

 Q.single_task([=]() {
   ArrayWrapper<int> a_kernel = array;
   a_kernel(1, 2, 3, 4) = 42;
 });

The second option is the one you already found. If copying is not acceptable, think about the constness model of your wrapper. What the immutability of SYCL kernel arguments really cares about is whether data within the kernel arguments changes. In your case, the wrapper only provides a view: the wrapper object itself does not change. As such, it might be acceptable to have the const overload return a non-const reference, which would solve your issue and might be more appropriate for your use case. Note that true const views could still be represented by instantiating your wrapper with const T as the element type. You could even implement conversions from ArrayWrapper<T> to ArrayWrapper<const T> if you like. As you say, this is how sycl::accessor objects are implemented.

I want to point out that you might not have to implement your own high-dimensional array wrapper. You should be able to use mdspan, which already provides this functionality, and initialize it with a SYCL USM pointer. I have no idea about DPC++, but I know that this works in hipSYCL / Open SYCL.