I have a 3d array containing the non-cubic bounding box of a sparse geometry.
The array geometry[x][y][z] contains the value 0 if (x,y,z) is part of the computational domain and otherwise 1.
In an attempt to reorder computations I would like to traverse this space using a Hilbert Curve.
The context is optimizing global memory access in a memory-bound GPU program.
How can I implement this?
Update: I just want to traverse the non-empty cells, as I will only store those (in an array) together with an adjacency list which keeps track of an element's 19 neighboring nodes.
The computation is simply copying between two arrays:
dst[i] = src[adjacency_map[i]]
This is the propagation phase of a sparse Lattice Boltzmann method, where the physical interpretation is streaming 'fluid particles' from a neighboring site.
The more sequential the values in adjacency_map are; the more coalesced memory accesses we hopefully get.
OpenCL kernel:
__kernel void propagation(__global double *dst, __global double *source,
__global const int *adjacency_map, const uint max_size)
{
size_t l = get_global_id(0);
if( l > max_size )
return;
dst[l] = src[adjacency_map[l]];
}
A Hilbert curve would be a tall order. It seems difficult to find a formulation that would allow random access to the indices of the points on the curve.
A Morton ordering, however, would be reasonable, and has some of the same nice properties as it is also a space-filling curve. There is also a random access procedure for finding the Morton number of an N-dimensional point.
What you might consider is a two step process:
Apply a step of stream compaction to your data to select the volume elements that you wish to process
Sort this compacted data, using their Morton indices as the sorting key.
You could use thrust for both the stream compaction and the key-value sort.
This should produce a list of volume elements in a ordering which promotes contiguity. That said, the overhead of reorganizing the data may dominate the cost of the original irregular access pattern.