While optimizing a ray tracer I was trying to improve data locality for the intersection datastructure using a Morton space filling curve, except that my 3D space is not cubic (eg. 512x512x256). All sides are a power of two, but not all sides are the same length.
I have been unable to find any examples for non square Morton curves where the sides are a power of two. If it matters I can guarantee that the x/y axis are the same size with only the z axis being a different length.
Note how the width is 2x height, but it could also be 3x or 4x or any other. I have been unable to find a way how to do this.
Ideally the solution would be fast as the Morton code has to be calculated a lot. So my question is: How do I generate a space filling morton curve for non-cubic spaces? This is specifically for the GPU (Cuda).
The conditions on the dimensions are:
x, y, z are a power of two
x == y
x, y >= z
Or if easier
x, y > z
It would probably break for, say, nz=11, but for half of the size of the XY square it seems to work for me
#include <cstdint>
#include <iostream>
static inline uint32_t spread(uint32_t x)
{
x = (x | (x << 10)) & 0x000F801F;
x = (x | (x << 4)) & 0x00E181C3;
x = (x | (x << 2)) & 0x03248649;
x = (x | (x << 2)) & 0x09249249;
return x;
}
static inline uint32_t morton(const uint32_t x, const uint32_t y, const uint32_t z)
{
return spread(x) << 0 | spread(y) << 1 | spread(z) << 2;
}
auto main() -> int {
int nx = 32;
int ny = 32;
int nz = 16;
for (int iz = 0; iz != nz; ++iz)
{
for (int iy = 0; iy != ny; ++iy)
{
for (int ix = 0; ix != nx; ++ix)
{
auto m = morton(ix, iy, iz);
std::cout << m << '\n';
}
}
}
return 0;
}
UPDATE
How to make Morton code work for, say, 256x256x64 (8bit*8bit*6bit): you have to spread X and Y non-equidistantly, taking into account number of bits in Z. Basically, for cube you spread evenly: each bit at position 0, 3, 6, 9, 12, 15, 18, 21, 24, leaving space for other two bits from orthogonal axes.
So there is equidistant spread for a cube. But for the case when you have only 6 bits from Z to insert, you have to have 6 distances of 3, but no Z bits for last gap, thus last gap for X and Y spread should be only 1 bit wide. Thus, non-equidistant spread in X and Y.
Something along the line: if Nx=Ny are number of bits in XY plane, and Nz!=Nx or Ny is number of bits along Z axis, spread gap should be 2 bits for Nz bits and gap of 1 bit for what is left. So two spread routines - one for X&Y with non-equidistant spread which now depends on Nz, and existing spread function for Z axis.
Ok, here is a working version, seems to be doing right thing
#include <cstdint>
#include <iostream>
#define func auto
func spreadZ(uint32_t v) -> uint32_t { // 2bit gap spread
v = (v | (v << 10)) & 0x000F801F;
v = (v | (v << 4)) & 0x00E181C3;
v = (v | (v << 2)) & 0x03248649;
v = (v | (v << 2)) & 0x09249249;
return v;
}
func spreadXY(const uint32_t v, const uint32_t bitsZ) -> uint32_t {
uint32_t mask_z = (1U << bitsZ) - 1U; // to mask bits which are going to have 2bit gap
uint32_t lo{ v & mask_z }; // lower part of the value where there are Z bits
lo = spreadZ(lo); // 2bit gap spread
uint32_t hi = v >> bitsZ; // high part of the value, 1bit gap
// 1bit gap spread
hi = (hi ^ (hi << 8)) & 0x00ff00ffU;
hi = (hi ^ (hi << 4)) & 0x0f0f0f0fU;
hi = (hi ^ (hi << 2)) & 0x33333333U;
hi = (hi ^ (hi << 1)) & 0x55555555U;
return lo + (hi << 3*bitsZ); // combine them all together
}
func morton(const uint32_t x, const uint32_t y, const uint32_t z, const uint32_t bitsZ) -> uint32_t {
return spreadXY(x, bitsZ) << 0 | spreadXY(y, bitsZ) << 1 | spreadZ(z) << 2;
}
func ispowerof2(const uint32_t n) -> bool {
return n && (!(n & (n - 1u)));
}
func bit_pos(const uint32_t n) -> uint32_t {
if (!ispowerof2(n))
throw -1;
uint32_t i{ 1u }, pos{ 1u };
while (!(i & n)) { // Iterate through bits of n till we find a set bit, i&n will be non-zero only when 'i' and 'n' have a same bit
i = i << 1; // unset current bit and set the next bit in 'i'
++pos; // increment position
}
return pos;
}
func main() -> int {
int nx = 256;
int ny = 256;
int nz = 256; //256...128...64...32...16...8...4...2...1 all works
int bitsZ = bit_pos(nz) - 1; // should be doing try/catch
for (int iz = 0; iz != nz; ++iz)
{
for (int iy = 0; iy != ny; ++iy)
{
for (int ix = 0; ix != nx; ++ix)
{
auto m = morton(ix, iy, iz, bitsZ);
std::cout << m << '\n';
}
}
}
return 0;
}