Thrust has the thrust::reduce_by_key algorithm, which works well for a problem of mine. I wanted to try CUB for finer control of memory and streams, as well as interaction with my own kernels, but there doesn't seem to be a reduce_by_key equivalent in CUB. There is cub::DeviceSegmentedReduce, but it doesn't seem to output a key order, and I would need to find the start of each consecutive segment of keys beforehand, which is its own problem.
The problem is to reduce an array of key-value pairs into an array where all keys are unique and each value is the reduction of all values that share that key.
Is there an easy way to do this with CUB that I am not seeing?
Yes, the algorithm you are searching for is called cub::DeviceReduce::ReduceByKey. For some reason* it is categorized as a reduction instead of a segmented reduction, which is certainly confusing.
*: Maybe cub::DeviceSegmentedReduce is a more recent addition and they didn't want to break backward compatibility by moving ReduceByKey out of DeviceReduce.
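For illustration, here is a minimal sketch of how a call to cub::DeviceReduce::ReduceByKey might look with an explicit stream and caller-managed temporary storage. The sample data, the variable names, and the SumOp functor are made up for this example, and it assumes the overload that takes an int item count. Like thrust::reduce_by_key, the algorithm reduces runs of consecutive equal keys, so the keys are assumed to be already grouped here; unsorted input would need a sort first (for example with cub::DeviceRadixSort::SortPairs).

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical reduction functor for this sketch (any binary functor works).
struct SumOp {
    __host__ __device__ int operator()(const int &a, const int &b) const {
        return a + b;
    }
};

int main() {
    // Made-up sample input; equal keys are already adjacent.
    std::vector<int> h_keys = {0, 0, 1, 1, 1, 3};
    std::vector<int> h_vals = {1, 2, 3, 4, 5, 6};
    const int n = static_cast<int>(h_keys.size());

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int *d_keys_in, *d_vals_in, *d_unique_out, *d_aggregates_out, *d_num_runs_out;
    cudaMalloc(&d_keys_in, n * sizeof(int));
    cudaMalloc(&d_vals_in, n * sizeof(int));
    cudaMalloc(&d_unique_out, n * sizeof(int));      // at most n unique keys
    cudaMalloc(&d_aggregates_out, n * sizeof(int));
    cudaMalloc(&d_num_runs_out, sizeof(int));

    cudaMemcpyAsync(d_keys_in, h_keys.data(), n * sizeof(int),
                    cudaMemcpyHostToDevice, stream);
    cudaMemcpyAsync(d_vals_in, h_vals.data(), n * sizeof(int),
                    cudaMemcpyHostToDevice, stream);

    // First call with a null buffer only queries the temp storage size.
    void *d_temp_storage = nullptr;
    size_t temp_storage_bytes = 0;
    cub::DeviceReduce::ReduceByKey(d_temp_storage, temp_storage_bytes,
                                   d_keys_in, d_unique_out,
                                   d_vals_in, d_aggregates_out,
                                   d_num_runs_out, SumOp(), n, stream);

    // The caller owns this allocation, so it could come from a custom pool.
    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // Second call performs the reduction on the given stream.
    cub::DeviceReduce::ReduceByKey(d_temp_storage, temp_storage_bytes,
                                   d_keys_in, d_unique_out,
                                   d_vals_in, d_aggregates_out,
                                   d_num_runs_out, SumOp(), n, stream);

    // The number of output pairs is written to device memory.
    int num_runs = 0;
    cudaMemcpyAsync(&num_runs, d_num_runs_out, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    std::vector<int> h_unique(num_runs), h_sums(num_runs);
    cudaMemcpy(h_unique.data(), d_unique_out, num_runs * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaMemcpy(h_sums.data(), d_aggregates_out, num_runs * sizeof(int),
               cudaMemcpyDeviceToHost);

    for (int i = 0; i < num_runs; ++i) {
        std::printf("key %d -> %d\n", h_unique[i], h_sums[i]);
    }

    cudaFree(d_temp_storage);
    cudaFree(d_keys_in);
    cudaFree(d_vals_in);
    cudaFree(d_unique_out);
    cudaFree(d_aggregates_out);
    cudaFree(d_num_runs_out);
    cudaStreamDestroy(stream);
    return 0;
}
```

With this sample input the three runs of keys (0, 1, 3) should reduce to the sums 3, 12, and 6. Error checking of the CUDA and CUB return codes is omitted to keep the sketch short; real code should check each cudaError_t.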