I have a huge set of N-dimensional points (tens of millions; N is close to 100).
I need to map these points to a single dimension while preserving spatial locality. I want to use Hilbert space-filling curve to do it.
For each point I want to pick the closest point on the curve. The Hilbert value of the point (curve length from the start of curve to the picked point) is the single dimension value I seek.
Computation does not have to be instant, but I expect it to be no more than several hours on decent modern home PC hardware.
Any suggestions on implementation? Are there any libraries that would help me? (Language does not matter much.)
I finally broke down and shelled out some money. The AIP (American Institute of Physics) has a nice, short article with source code in C. "Programming the Hilbert curve" by John Skilling (from the AIP Conf. Proc. 707, 381 (2004)) has an appendix with code for mappings in both directions. It works for any number of dimensions > 1, is not recursive, does not use state-transition lookup tables that gobble up huge amounts of memory, and mostly uses bit operations. Thus it is reasonably fast and has a good memory footprint.
If you choose to purchase the article, I discovered an error in the source code.
The following line of code (found in the function TransposetoAxes) has the error:
for( i = n-1; i >= 0; i-- ) X[i] ^= X[i-1];
The correction is to change the greater than or equal (>=) to a greater than (>). Without this correction, the X array is accessed using a negative index when the variable āiā becomes zero, causing the program to fail.
I recommend reading the article (which is seven pages long, including the code), as it explains how the algorithm works, which is far from obvious.
I translated his code into C# for my own use. The code follows. Skilling performs the transformation in place, overwriting the vector that you pass in. I chose to make a clone of the input vector and return a new copy. Also, I implemented the methods as extension methods.
Skilling's code represents the Hilbert index as a transpose, stored as an array. I find it more convenient to interleave the bits and form a single BigInteger (more useful in Dictionaries, easier to iterate over in loops, etc), but I optimized that operation and its inverse with magic numbers, bit operations and the like, and the code is lengthy, so I have omitted it.
namespace HilbertExtensions
{
/// <summary>
/// Convert between Hilbert index and N-dimensional points.
///
/// The Hilbert index is expressed as an array of transposed bits.
///
/// Example: 5 bits for each of n=3 coordinates.
/// 15-bit Hilbert integer = A B C D E F G H I J K L M N O is stored
/// as its Transpose ^
/// X[0] = A D G J M X[2]| 7
/// X[1] = B E H K N <-------> | /X[1]
/// X[2] = C F I L O axes |/
/// high low 0------> X[0]
///
/// NOTE: This algorithm is derived from work done by John Skilling and published in "Programming the Hilbert curve".
/// (c) 2004 American Institute of Physics.
///
/// </summary>
public static class HilbertCurveTransform
{
/// <summary>
/// Convert the Hilbert index into an N-dimensional point expressed as a vector of uints.
///
/// Note: In Skilling's paper, this function is named TransposetoAxes.
/// </summary>
/// <param name="transposedIndex">The Hilbert index stored in transposed form.</param>
/// <param name="bits">Number of bits per coordinate.</param>
/// <returns>Coordinate vector.</returns>
public static uint[] HilbertAxes(this uint[] transposedIndex, int bits)
{
var X = (uint[])transposedIndex.Clone();
int n = X.Length; // n: Number of dimensions
uint N = 2U << (bits - 1), P, Q, t;
int i;
// Gray decode by H ^ (H/2)
t = X[n - 1] >> 1;
// Corrected error in Skilling's paper on the following line. The appendix had i >= 0 leading to negative array index.
for (i = n - 1; i > 0; i--)
X[i] ^= X[i - 1];
X[0] ^= t;
// Undo excess work
for (Q = 2; Q != N; Q <<= 1)
{
P = Q - 1;
for (i = n - 1; i >= 0; i--)
if ((X[i] & Q) != 0U)
X[0] ^= P; // invert
else
{
t = (X[0] ^ X[i]) & P;
X[0] ^= t;
X[i] ^= t;
}
} // exchange
return X;
}
/// <summary>
/// Given the axes (coordinates) of a point in N-Dimensional space, find the distance to that point along the Hilbert curve.
/// That distance will be transposed; broken into pieces and distributed into an array.
///
/// The number of dimensions is the length of the hilbertAxes array.
///
/// Note: In Skilling's paper, this function is called AxestoTranspose.
/// </summary>
/// <param name="hilbertAxes">Point in N-space.</param>
/// <param name="bits">Depth of the Hilbert curve. If bits is one, this is the top-level Hilbert curve.</param>
/// <returns>The Hilbert distance (or index) as a transposed Hilbert index.</returns>
public static uint[] HilbertIndexTransposed(this uint[] hilbertAxes, int bits)
{
var X = (uint[])hilbertAxes.Clone();
var n = hilbertAxes.Length; // n: Number of dimensions
uint M = 1U << (bits - 1), P, Q, t;
int i;
// Inverse undo
for (Q = M; Q > 1; Q >>= 1)
{
P = Q - 1;
for (i = 0; i < n; i++)
if ((X[i] & Q) != 0)
X[0] ^= P; // invert
else
{
t = (X[0] ^ X[i]) & P;
X[0] ^= t;
X[i] ^= t;
}
} // exchange
// Gray encode
for (i = 1; i < n; i++)
X[i] ^= X[i - 1];
t = 0;
for (Q = M; Q > 1; Q >>= 1)
if ((X[n - 1] & Q)!=0)
t ^= Q - 1;
for (i = 0; i < n; i++)
X[i] ^= t;
return X;
}
}
}
I have posted working code in C# to github.
See https://github.com/paulchernoch/HilbertTransformation
UPDATE: I just published (Fall 2019) a Rust crate on crates.io called "hilbert". It also uses Skilling's algorithm. See https://crates.io/crates/hilbert