While experimenting with boost::compute I've run into an issue with determining the largest vector I can allocate on a device (I'm still fairly new to boost::compute). The following snippet of code
std::vector<cl_double> host_tmp;
std::cout << "CL_DEVICE_GLOBAL_MEM_SIZE / sizeof(cl_double) = " << device.get_info<cl_ulong>(CL_DEVICE_GLOBAL_MEM_SIZE) / sizeof(cl_double) << "\n";
std::cout << "CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double) = " << device.get_info<cl_ulong>(CL_DEVICE_MAX_MEM_ALLOC_SIZE) / sizeof(cl_double) << "\n";
size_t num_elements = device.get_info<cl_ulong>(CL_DEVICE_MAX_MEM_ALLOC_SIZE) / sizeof(cl_double);
compute::vector<cl_double> dev_tmp(context);
std::cout << "Maximum size of vector reported by .max_size() = " << dev_tmp.max_size() << "\n";
for (auto i = 0; i < 64; ++i) {
    std::cout << "Resizing device vector to " << num_elements << "...";
    dev_tmp.resize(num_elements, queue);
    std::cout << " done.";
    std::cout << " Assigning host data...";
    host_tmp.resize(num_elements);
    std::iota(host_tmp.begin(), host_tmp.end(), 0);
    std::cout << " done.";
    std::cout << " Copying data from host to device...";
    compute::copy(host_tmp.begin(), host_tmp.end(), dev_tmp.begin(), queue);
    std::cout << " done.\n";
    num_elements += 1024 * 1024;
}
gives
CL_DEVICE_GLOBAL_MEM_SIZE / sizeof(cl_double) = 268435456
CL_DEVICE_MAX_MEM_ALLOC_SIZE / sizeof(cl_double) = 67108864
Maximum size of vector reported by .max_size() = 67108864
Resizing device vector to 67108864... done. Assigning host data... done. Copying data from host to device... done.
Resizing device vector to 68157440... done. Assigning host data... done. Copying data from host to device... done.
...
Resizing device vector to 101711872...Memory Object Allocation Failure
so clearly the reported max_size() is neither a hard limit nor enforced.
I assume that to be safe I should stick to the reported max_size(); however, if I allocate multiple vectors of size max_size() on the device, I also receive the Memory Object Allocation Failure message.
- What is the correct/usual way to deal with (and avoid) memory allocation failures when using boost::compute?
You just need to follow the same rules as for OpenCL; Boost.Compute does not add any new restrictions. You have to remember that on many OpenCL platforms, memory for a buffer is allocated lazily, so even if creating a buffer larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE
succeeds, it can fail later (the behaviour is implementation-defined).
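A conservative approach is therefore to cap every single allocation at the value you queried via CL_DEVICE_MAX_MEM_ALLOC_SIZE yourself, instead of relying on the allocation to fail. A minimal sketch of such a cap; the safety margin is an arbitrary illustration, not an OpenCL requirement:

```cpp
#include <cstddef>

// Largest element count to request for one allocation, given the value
// queried via device.get_info<cl_ulong>(CL_DEVICE_MAX_MEM_ALLOC_SIZE).
// margin_num/margin_den shave off a safety fraction (illustrative only).
std::size_t max_safe_elements(std::size_t max_alloc_bytes,
                              std::size_t elem_size,
                              std::size_t margin_num = 9,
                              std::size_t margin_den = 10) {
    return (max_alloc_bytes / elem_size) * margin_num / margin_den;
}
```

With elem_size = sizeof(cl_double) this caps the resize in the question's loop before the driver ever sees an oversized request.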
- How can I determine the largest size of a vector that I can allocate at any given moment (i.e. the device may already contain allocated data)?
I don't think that's possible. You can always write your own allocator class (and use it with boost::compute::vector) that globally tracks usage per device (using CL_DEVICE_GLOBAL_MEM_SIZE) and does whatever you want when there's not enough memory. However, you have to remember that OpenCL memory is bound to a context, not to a device.
- If I have too much data, can I get boost::compute to automatically process it in chunks or do I have to break it up myself?
No, you have to implement something that takes care of that yourself. It can be done in multiple ways depending on your OpenCL platform and the OpenCL version it supports.
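One way to break the work up yourself is to precompute [begin, end) chunk boundaries on the host and run the resize/copy/kernel sequence once per chunk. A self-contained sketch of the boundary computation; in practice chunk_size would be derived from CL_DEVICE_MAX_MEM_ALLOC_SIZE:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split [0, total) into consecutive [begin, end) ranges of at most
// chunk_size elements; each range is then processed on the device in
// turn (resize, copy in, run kernels, copy out).
std::vector<std::pair<std::size_t, std::size_t>>
make_chunks(std::size_t total, std::size_t chunk_size) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t begin = 0; begin < total; begin += chunk_size)
        chunks.emplace_back(begin, std::min(begin + chunk_size, total));
    return chunks;
}
```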
- How do I free up memory on the device once I'm done with it?
boost::compute::vector's destructor releases the device memory. Each OpenCL memory object (like a buffer) has a reference counter that Boost.Compute's classes properly increase and decrease. Note: iterators do not own buffers, so once the underlying buffer is released (for example, when the boost::compute::vector that allocated it is destructed), the iterators stop working.
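The reference-counting behaviour is analogous to std::shared_ptr/std::weak_ptr, which makes the dangling-iterator hazard easy to demonstrate without an OpenCL device. This is a plain C++ analogy, not Boost.Compute code:

```cpp
#include <memory>

// The shared_ptr plays the role of the vector owning a cl_mem buffer;
// the weak_ptr plays the role of an iterator, which does not own it.
bool buffer_released_after_scope() {
    std::weak_ptr<int> iterator_like;
    {
        auto owning_vector_like = std::make_shared<int>(0);
        iterator_like = owning_vector_like;
    }   // destructor runs: reference count drops to zero, memory is freed
    return iterator_like.expired();   // the "iterator" now dangles
}
```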