I'm somewhat new to OpenCL and am trying to learn to use boost::compute properly. Consider the following code:
#include <iostream>
#include <vector>
#include <boost/compute.hpp>

const cl_int cell_U_size{ 4 };

#pragma pack (push,1)
struct Cell
{
    cl_double U[cell_U_size];
};
#pragma pack (pop)

BOOST_COMPUTE_ADAPT_STRUCT(Cell, Cell, (U));

int main(int argc, char* argv[])
{
    using namespace boost;
    auto device = compute::system::default_device();
    auto context = compute::context(device);
    auto queue = compute::command_queue(context, device);

    std::vector<Cell> host_Cells;
    host_Cells.reserve(10);
    for (auto j = 0; j < host_Cells.capacity(); ++j) {
        host_Cells.emplace_back(Cell());
        for (auto i = 0; i < cell_U_size; ++i) {
            host_Cells.back().U[i] = static_cast<cl_double>(i+j);
        }
    }

    std::cout << "Before:\n";
    for (auto const& hc : host_Cells) {
        for (auto const& u : hc.U)
            std::cout << " " << u;
        std::cout << "\n";
    }

    compute::vector<Cell> device_Cells(host_Cells.size(), context);
    auto f = compute::copy_async(host_Cells.begin(), host_Cells.end(), device_Cells.begin(), queue);
    try {
        BOOST_COMPUTE_CLOSURE(Cell, Step1, (Cell cell), (cell_U_size), {
            for (int i = 0; i < cell_U_size; ++i) {
                cell.U[i] += 1.0;
            }
            return cell;
        });
        f.wait(); // Wait for data to finish being copied
        compute::transform(device_Cells.begin(), device_Cells.end(), device_Cells.begin(), Step1, queue);

        //BOOST_COMPUTE_CLOSURE(void, Step2, (Cell &cell), (cell_U_size), {
        //    for (int i = 0; i < cell_U_size; ++i) {
        //        cell.U[i] += 1.0;
        //    }
        //});
        //compute::for_each(device_Cells.begin(), device_Cells.end(), Step2, queue);

        compute::copy(device_Cells.begin(), device_Cells.end(), host_Cells.begin(), queue);
    }
    catch (std::exception &e) {
        std::cout << e.what() << std::endl;
        throw;
    }

    std::cout << "After:\n";
    for (auto const& hc : host_Cells) {
        for (auto const& u : hc.U)
            std::cout << " " << u;
        std::cout << "\n";
    }
}
I have a vector of custom structs (actually much more complicated than shown here) that I want to process on the GPU. In the uncommented BOOST_COMPUTE_CLOSURE, compute::transform passes the structs by value, processes them, and then copies them back.
I would like to pass these by reference, as shown in the commented-out BOOST_COMPUTE_CLOSURE with compute::for_each, but the kernel fails to compile (Build Program Failure) when the program is run, and I have not found any documentation mentioning how this should be achieved.
I know I can achieve passing by reference (pointers actually, since it's C99) by using BOOST_COMPUTE_STRINGIZE_SOURCE and passing a pointer to the entire vector of structs, but I'd like to use the compute::... functions, as these seem more elegant.
If you define the BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION macro (before including the Boost.Compute headers) and building an OpenCL program fails, the program source and the build log will be written to stdout.
You can't pass by reference in OpenCL C, which is what you are trying to do in the BOOST_COMPUTE_CLOSURE. I understand that you would like to pass a __global pointer to your closure and modify the value in global memory rather than a local copy of it. I don't think that's supported in Boost.Compute, because in for_each (and other algorithms) Boost.Compute always passes a value to your function/closure.
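To make the by-value point concrete, here is a host-side analogy using only the C++ standard library. It is not Boost.Compute code and says nothing about the library's internals; HostCell, bump_with_for_each and bump_with_transform are hypothetical names made up for this sketch. It only shows why a functor that receives a copy cannot change the stored element unless the result is written back, which is what the transform version in the question does.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical host-side stand-in for the Cell struct from the question.
struct HostCell {
    std::array<double, 4> U;
};

// Analogue of a for_each closure that takes its argument by value:
// the increment is applied to a copy, which is then discarded.
inline void bump_with_for_each(std::vector<HostCell>& cells) {
    std::for_each(cells.begin(), cells.end(), [](HostCell cell) {
        for (double& u : cell.U) u += 1.0;
    });
}

// Analogue of the transform closure: the modified copy is returned
// and written back through the output iterator.
inline void bump_with_transform(std::vector<HostCell>& cells) {
    std::transform(cells.begin(), cells.end(), cells.begin(), [](HostCell cell) {
        for (double& u : cell.U) u += 1.0;
        return cell;
    });
}
```

After bump_with_for_each the vector is unchanged, while bump_with_transform adds 1.0 to every component.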
Of course you can always implement a workaround: add a unary & operator, or implement a custom device iterator. However, in the presented example it would just decrease performance, because it would lead to non-coalesced memory reads and writes. If you have an array of very complex structures (AoS), try changing it to a structure of arrays (SoA) and/or breaking up your structure.
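As a rough illustration of the AoS-to-SoA suggestion, the following host-side sketch splits the cells into one contiguous array per component. It uses plain C++ only; CellAoS, CellsSoA, to_soa and kCellUSize are hypothetical names introduced here, and plain double stands in for cl_double.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kCellUSize = 4;  // mirrors cell_U_size from the question

// Array-of-structures layout, as in the question.
struct CellAoS {
    double U[kCellUSize];
};

// Structure-of-arrays layout: one contiguous array per component,
// so element i and element i+1 of a component are adjacent in memory.
struct CellsSoA {
    std::vector<double> U[kCellUSize];
};

// Copy component k of every cell into the k-th contiguous array.
inline CellsSoA to_soa(const std::vector<CellAoS>& cells) {
    CellsSoA soa;
    for (std::size_t k = 0; k < kCellUSize; ++k) {
        soa.U[k].reserve(cells.size());
        for (const CellAoS& c : cells) soa.U[k].push_back(c.U[k]);
    }
    return soa;
}
```

Each soa.U[k] could then presumably be copied into its own compute::vector<double> and updated in place with compute::transform, so that neighbouring work-items read neighbouring values.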