sycl

What is the contiguous dimension in a SYCL kernel? In a buffer? In an image?


What is the contiguous dimension in an N-dimensional SYCL kernel, i.e. the dimension in which threads of a work-group are expected to belong to the same warp/wavefront? I would have expected it to be dimension 0, but some resource out there suggest otherwise (https://enccs.github.io/sycl-workshop/expressing-parallelism-nd-range/). I can't find any answer in the SYCL specification.

The contiguous dimension in a buffer seems to be the last one. The following code stores contiguous pixels along the last dimension (at least using Intel DPC++).

sycl::buffer<float, 2> buffer1(image.data(), range);

That may seems unrelated. But I would have expected something like this to be a common pattern.

queue.submit([&](sycl::handler &handler) {
    auto access = buffer.get_access<sycl::access_mode::read>(handler);
    handler.parallel_for(buffer.get_range(), [=](sycl::id<2> id) {
        auto v = access[id[0], id[1]];
        ...
    });
});

However, the contiguous dimension of an image seems to be first one (dimension 0). The following code stores contiguous pixels along dimension 0.

sycl::image<2> image1(vector1.data(), sycl::image_channel_order::r, sycl::image_channel_type::fp32, image_range);

Which creates some confusion.

Thanks.


Solution

  • The contiguous dimension (also known as fast index, vectorized index etc) for buffers is indeed always the "last" one, i.e. index 2 for a 3D object.

    This is specified in section 3.11.1 "Linearization" of the SYCL 2020 specification: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:multi-dim-linearization

    That particular choice is motivated by C++ which behaves in the same way for its arrays, and SYCL aims to align with C++.

    Note that your sample code can also be written in a more idiomatic version that hides this detail:

    queue.submit([&](sycl::handler &handler) {
        auto access = buffer.get_access<sycl::access_mode::read>(handler);
        handler.parallel_for(buffer.get_range(), [=](sycl::id<2> id) {
            auto v = access[id];
            ...
        });
    });
    

    For images, I am not sure it is that well-defined since images are a backend-specific opaque object. I agree that the specification may be unclear here as to what this means for host pointers that are used as input to the image class. Given that the linearization equation from 3.11.1 is not explicitly restricted to buffers, it could be argued that the behavior you are seeing is a bug in your SYCL implementation.