algorithmdata-structureskdtreerange-query

How to implement range search in KD-Tree


I have built a d dimensional KD-Tree. I want to do range search on this tree. Wikipedia mentions range search in KD-Trees, but doesn't talk about implementation/algorithm in any way. Can someone please help me with this? If not for any arbitrary d, any help for at least for d = 2 and d = 3 would be great. Thanks!


Solution

  • There are multiple variants of kd-tree. The one I used had the following specs:

    1. Each internal node has max two nodes.
    2. Each leaf node can have max maxCapacity points.
    3. No internal node stores any points.

    Side note: there are also versions where each node (irrespective of whether its internal or leaf) stores exactly one point. The algorithm below can be tweaked for those too. Its mainly the buildTree where the key difference lies.

    I wrote an algorithm for this some 2 years back, thanks to the resource pointed to by @9mat .

    Suppose the task is to find the number of points which lie in a given hyper-rectangle ("d" dimensions). This task can also be to list all points OR all points which lie in given range and satisfy some other criteria etc, but that can be a straightforward change to my code.

    Define a base node class as:

    template <typename T> class kdNode{
        public: kdNode(){}
        virtual long rangeQuery(const T* q_min, const T* q_max) const{ return 0; }
    };
    

    Then, an internal node (non-leaf node) can look like this:

    class internalNode:public kdNode<T>{
        const kdNode<T> *left = nullptr, *right = nullptr; // left and right sub trees
        int axis; // the axis on which split of points is being done
        T value; // the value based on which points are being split
    
        public: internalNode(){}
    
        void buildTree(...){
            // builds the tree recursively
        }
    
        // returns the number of points in this sub tree that lie inside the hyper rectangle formed by q_min and q_max
        int rangeQuery(const T* q_min, const T* q_max) const{
            // num of points that satisfy range query conditions
            int rangeCount = 0;
    
            // check for left node
            if(q_min[axis] <= value) {
                rangeCount += left->rangeQuery(q_min, q_max);
            }
            // check for right node
            if(q_max[axis] >= value) {
                rangeCount += right->rangeQuery(q_min, q_max);
            }
    
            return rangeCount;
        }
    };
    

    Finally, the leaf node would look like:

    class leaf:public kdNode<T>{
        // maxCapacity is a hyper - param, the max num of points you allow a node to hold
        array<T, d> points[maxCapacity];
        int keyCount = 0; // this is the actual num of points in this leaf (keyCount <= maxCapacity)
    
        public: leaf(){}
    
        public: void addPoint(const T* p){
            // add a point p to the leaf node
        }
    
        // check if points[index] lies inside the hyper rectangle formed by q_min and q_max
        inline bool containsPoint(const int index, const T* q_min, const T* q_max) const{
            for (int i=0; i<d; i++) {
                if (points[index][i] > q_max[i] || points[index][i] < q_min[i]) {
                    return false;
                }
            }
            return true;
        }
    
        // returns number of points in this leaf node that lie inside the hyper rectangle formed by q_min and q_max
        int rangeQuery(const T* q_min, const T* q_max) const{
            // num of points that satisfy range query conditions
            int rangeCount = 0;
            for(int i=0; i < this->keyCount; i++) {
                if(containsPoint(i, q_min, q_max)) {
                    rangeCount++;
                }
            }
            return rangeCount;
        }
    };
    

    In the code for range query inside the leaf node, it is also possible to do a "binary search" inside of "linear search". Since the points will be sorted along on the axis axis, you can do a binary search do find l and r values using q_min and q_max, and then do a linear search from l to r instead of 0 to keyCount-1 (of course in the worst case it wont help, but practically, and especially if you have a capacity of pretty high values, this may help).