The only reasonable slide set I found is this, which in page 15 says, for building:
- Sort all points by their x coordinate value and store them in the leaf nodes of a balanced binary tree (i.e., a range tree)
- Starting at the root, each node contains the point in its subtree with the maximum value for its y coordinate that has not been stored
at a shallower depth in the tree; if no such node exists, then node
is empty
I implemented a Range Tree just before, so based on the example I used there, I now have (for a Priority Search Tree):
7
------ 0 ---------------------
| |
6 6
---- -3 ---- ---- 4 ------
| | | |
4 2 1 3
---- -4 - -2 --- 3 --- 5
| | / \ | | / \
0 (-3,4) (-2,2)(0,7) NIL (4,-4) (5,3)(6,-1)
-5 2
/ \ / \
(-5,6) (-4,0) (2,1) (3,6)
Here, every inner node is of the form:
mid_x
max y
and the range query I am posing is (-inf, 1) x (-0.5, 5), i.e. (x_low, x_high) x (y_low, y_high).
What am I missing in building/searching process?
In other words:
Check the leftmost branch of the tree; it has as max_y
values (2nd bullet in the quote), 7, 6, 4, 0.
But isn't that value the one that indicated the maximum y value stored in the subtree under the inner node? If so, this does not hold for 0
and point (-5, 6)
(6 is not used, since its used in the 2nd level).
*The particular query I am posing might not be damaged by that, but another one can.
You last logic is actually still correct. The (-5,6) value should've already been picked up when you hit the node you labelled (6,-3). Now, I'm no math major. But the general idea is this. In the Priority Search tree as you implemented, you're actually searching on two separate criteria. For x, it's a simple binary search for the range. For y, you're actually searching for it as a priority tree (good for search of y:inf or -inf:y, depending on your whether you use max or min.)
Note that at the bottom of page 15, it states that the tree is good for a semi-infinite range (one is infinite). Keep reading down, you'll see how the tree is optimized for semi-infinite range for y.
In short, since your tree is constructed with x as the binary search and y as a priority (using max remaining value), the optimal search is [x1:x2],[y1:inf].
Each node in the tree essentially stores 2 things. 1. The mid-point of x (or the max-x in the left tree, and the decision to traverse left or right is based on the >x check). 2. The POINT with the highest y value in the subtree that had not been added to previous one.
The search algorithm essentially goes like this. Starting from the root using a criteria of [x1:x2], [y1:inf] 1. Check the y-value of the POINT assigned to the node. If y > y1, go to 2, otherwise STOP traversing down (since the POINT assigned to the node has the highest y value, if the check failed, there's no other node beneath it that can fulfill [y1:inf]. 2. Check if the x-value of the point is in range of [x1:x2]. If so, include that point in the output. Continue to step 3 regardless if you included that point. 3. Check the node's "mid-point" value (let's call that mx). If mx is in range of [x1:x2], traverse down both path. If mx is < [x1:x2], traverse left. Otherwise traverse right. Go back to step 1 for each path you traverse.
EDIT for very, VERY long text: Let's run through your example. I've made an additional annotation marking each point using letter (the point's "name"). Each node now have the format of name of the point, with it's y-value in the parenthsis, and the "mid-range" x below it. For the leaf nodes, those with an '*' means they are already assigned to somewhere up the tree, and should be treated as NIL if you ever reach them.
7(E)
------ 0 ---------------------
| |
A(6) G(6)
----- -3 ----- ---- 4 --------
| | | |
C(4) D(2) F(1) I(3)
---- -4 - -2 --- 3 --- 5
| | / \ | | / \
B(0) C*(-3,4)D*(-2,2)E*(0,7) NIL H(4,-4) I*(5,3)J(6,-1)
-5 2
/ \ / \
A*(-5,6)B*(-4,0) F*(2,1) G*(3,6)
Let's run an example query of [-2:4] [1:inf](or any y >= 1)
The result is "E,F,G" are in range.