algorithmsearchrangebit-manipulationz-order-curve

How to use Morton Order in range search?


If I have a dataset, where the keys are points in 3D, represented by 3 signed 64 bit integers. And I want to use a (sorted)key-value store to store them, where keys are just byte array (but I can specify a comparator). I think I can turn all those points into a byte array by using bit-interleaving, as is done with Z/Morton order like in How to compute a 3D Morton number

In addition to fetching individual points, which can be done more simply without Morton ordering, I want to do range search, where I search within a box that is aligned with the axes. I will define A and B as being respectively the box corners where all coordinates are the lowest, and the opposite corner where all coordinates are the highest.

Now my questions are:

  1. For any point C, which is logically between, A and B, will the Morton number of C also be between the Morton number of A and B? (Isn't that the point of Morton order?)

  2. If 1 is no, can A and B be "rounded" to values that guaranty that C will be included?

  3. Assuming that 1 or 2 is possible, does the search return also points outside of that box, which I have to "post-filter" away? How big is that "error set" (does it depend on the size, or the position, of the search)?

  4. Does the fact that the integers are signed cause problems? And if so, is there a work-around?

To recap, using Morton Numbers is just one possible solution to the real problem: How to efficiently range-search in 3D integer space, when the 3D points have to be mapped to a unidimensional value? I want to get all the points between A and B, by performing a single range select in the DB, using a min-key and a max-key, and ideally, getting as few points outside the box as possible.


Solution

  • Seems like I have to answer my question myself. And the answer will relate to the z order curve, which is what I actually asked for. This is as far as I get it:

    1. Yes. It always work like that with z order curves.
    2. Is irrelevant, because 1) is true.
    3. It depends on how big the area is, and where it is, but the worst case scenario is when you actually include the centre of the space, rather then when you are far from it. For any dimensionality, if you use indexes of depth N bits in each dimension, there are special cases, where the z order curve give you an exact match, and you only get points in the search box. This happens when the following conditions are met:
    4. The search area is the same size in all dimension, and a power of two, 2^M, where 0 <= M <= N.
    5. The search area is a hypercube aligned with all axis.
    6. All coordinates of the search area hypercube corners are a multiple of 2^M. When all those conditions are met, the search area corresponds exactly to a sub-tree node, and therefore matches exactly. In the general case, though, the best you can do is find the smallest node that will hold all the desired points, and subdivide it recursively in smaller nodes that provide a partial match, to some maximum desired depth, to get a better match at the cost of using multiple queries. Finding the smallest tree node that contains all corners of the area is equivalent to finding the common prefix of the Morton code for all corners. And when using a single query, the amount of points outside the area returned is the volume of that node queried minus the volume of the search area.
    7. I'm pretty sure that's a problem, but I haven't found info on that yet.

    It seems what I'm saying isn't clear, so I'll do a little ASCII ART...

    A basic Z-order (Morton order) curve in 2D goes like this (path is A,B,C,D):

    x  0   1
    
    0  A → B
         ↙
    1  C → D
    

    So, A = (0,0) B = (0,1) C = (1,0) D = (1,1)

    Now a basic Hilbert curve in 2D goes like this (path is A,B,C,D):

    x  0   1
    
    0  A   D
       ↓   ↑
    1  B → C
    

    So, A = (0,0) B = (1,0) C = (1,1) D = (0,1)

    With a Z-order (Morton order), the curve will start from the lowest point A (0,0), and end with the highest point D (1,1), while covering all the points in between. This makes a range search easy, and it work in 3D too. All the points in a 3D box (0,0,0) to (1,1,1) are between the Morton order code of (0,0,0) and (1,1,1).

    With an Hilbert curve, you start at (0,0) and ends at (0,1) (Probably similar in 3D). Therefore, performing the range-query from the Hilbert code of (0,0) to (1,1) will NOT find all points.

    Therefore, if I am to use an Hilbert curve to perform my range query of a 3D box (0,0,0) to (1,1,1) (as a single DB query), I need a function that tells me what point should I use as first point, and what point I should use as last point, because (0,0,0) and (1,1,1) will not work.

    So, is there such a function, where you give your 8 (for 3D) box coordinates, and it returns the first and last points to use in your range query? Without that, I cannot use Hilbert curves to solve my problem.

    Or, in pseudo-SQL:

    Morton Query:

    SELECT key,data FROM table WHERE (key >= Morton(lowest(box))) AND (key <= Morton(highest(box)))
    

    Hilbert Query:

    SELECT key,data FROM table WHERE (key >= Hilbert(XXX(box))) AND (key <= Hilbert(YYY(box)))
    

    And so I need the XXX() and YYY().

    [EDIT] A part of this answer was targeted at some other answer telling me to use Hilbert curves, but which has since deleted their answer.