algorithmsearchbinary-search-treecomputational-geometryrange-tree

Priority Search Tree confusion


The only reasonable slide set I found is this, which in page 15 says, for building:

  • Sort all points by their x coordinate value and store them in the leaf nodes of a balanced binary tree (i.e., a range tree)
  • Starting at the root, each node contains the point in its subtree with the maximum value for its y coordinate that has not been stored
    at a shallower depth in the tree; if no such node exists, then node
    is empty

I implemented a Range Tree just before, so based on the example I used there, I now have (for a Priority Search Tree):

                             7
                      ------ 0 ---------------------
                      |                            |
                      6                            6
                ---- -3 ----                  ---- 4 ------ 
                |          |                  |           |
                4          2                  1           3
          ---- -4 -       -2              --- 3 ---       5
          |       |       / \             |       |      / \
          0    (-3,4) (-2,2)(0,7)        NIL   (4,-4) (5,3)(6,-1)
         -5                               2    
         / \                             / \
    (-5,6) (-4,0)                    (2,1) (3,6)

Here, every inner node is of the form:

mid_x
max y

and the range query I am posing is (-inf, 1) x (-0.5, 5), i.e. (x_low, x_high) x (y_low, y_high).

  1. So, let's start from the root, I check if x (=0) is in (-inf, 1) and if 7 > (-0.5, 5). It is, thus I proceed in both children, but for simplicity, let me follow the left one (in all cases from now).
  2. I check if -3 is the x range and if 6 is more or equal than the upper bound of the y range of the query (i.e. 5). It is, so I continue.
  3. Same for the next level, thus we go to the next level and now please focus on this inner node. It has as max y a 0, which indicates that the max y of the subtree is 0, which is incorrect (left child is (-5, 6))*.

What am I missing in building/searching process?


In other words:

Check the leftmost branch of the tree; it has as max_y values (2nd bullet in the quote), 7, 6, 4, 0.

But isn't that value the one that indicated the maximum y value stored in the subtree under the inner node? If so, this does not hold for 0 and point (-5, 6) (6 is not used, since its used in the 2nd level).


*The particular query I am posing might not be damaged by that, but another one can.


Solution

  • You last logic is actually still correct. The (-5,6) value should've already been picked up when you hit the node you labelled (6,-3). Now, I'm no math major. But the general idea is this. In the Priority Search tree as you implemented, you're actually searching on two separate criteria. For x, it's a simple binary search for the range. For y, you're actually searching for it as a priority tree (good for search of y:inf or -inf:y, depending on your whether you use max or min.)

    Note that at the bottom of page 15, it states that the tree is good for a semi-infinite range (one is infinite). Keep reading down, you'll see how the tree is optimized for semi-infinite range for y.

    In short, since your tree is constructed with x as the binary search and y as a priority (using max remaining value), the optimal search is [x1:x2],[y1:inf].

    Each node in the tree essentially stores 2 things. 1. The mid-point of x (or the max-x in the left tree, and the decision to traverse left or right is based on the >x check). 2. The POINT with the highest y value in the subtree that had not been added to previous one.

    The search algorithm essentially goes like this. Starting from the root using a criteria of [x1:x2], [y1:inf] 1. Check the y-value of the POINT assigned to the node. If y > y1, go to 2, otherwise STOP traversing down (since the POINT assigned to the node has the highest y value, if the check failed, there's no other node beneath it that can fulfill [y1:inf]. 2. Check if the x-value of the point is in range of [x1:x2]. If so, include that point in the output. Continue to step 3 regardless if you included that point. 3. Check the node's "mid-point" value (let's call that mx). If mx is in range of [x1:x2], traverse down both path. If mx is < [x1:x2], traverse left. Otherwise traverse right. Go back to step 1 for each path you traverse.

    EDIT for very, VERY long text: Let's run through your example. I've made an additional annotation marking each point using letter (the point's "name"). Each node now have the format of name of the point, with it's y-value in the parenthsis, and the "mid-range" x below it. For the leaf nodes, those with an '*' means they are already assigned to somewhere up the tree, and should be treated as NIL if you ever reach them.

                                  7(E)
                           ------ 0 ---------------------
                           |                            |
                          A(6)                         G(6)
                     ----- -3 -----                  ---- 4 -------- 
                     |            |                  |             |
                     C(4)        D(2)               F(1)          I(3)
               ---- -4 -         -2              --- 3 ---         5
               |       |         / \             |       |        / \
               B(0)  C*(-3,4)D*(-2,2)E*(0,7)     NIL   H(4,-4) I*(5,3)J(6,-1)
              -5                                 2   
              / \                               / \
        A*(-5,6)B*(-4,0)                   F*(2,1) G*(3,6)
    

    Let's run an example query of [-2:4] [1:inf](or any y >= 1)

    The result is "E,F,G" are in range.