Suppose you have a set S
of unique points on the 2-dimensional plane. Now, you are expecting a bunch of questions in the form of "is point p
present in S
?" You decide to build a Range Tree to store your S
and answer this question. The basic idea behind a Range Tree is that you first build a balanced binary search tree Tree0
on the 0-th
coordinate and then at each node of this Tree0
build yet another balanced search tree Tree1
but this time using 1-st
coordinate as your key. Wikipedia article for Range Tree.
Now, I was expecting that Tree1
which is built for every node n0
of Tree0
would hold exactly those points whose 0-th
coordinate is equal to the key from n0
. However, if you read more about Range Trees, you will see that this is not the case. Specifically:
r0
of Tree0
contains a Tree1
which holds all points.r0
contains a Tree1
which holds all of the points whose 0-th
coordinate is less than the 0-th
coordinate at r0
.r0
contains a Tree1
which holds all of the points whose 0-th
coordinate is greater than that from r0
.If you continue this logic, you will see that at each level of the Range Tree all points are stored exactly once. So, each level requires n
memory and since the depth of a balanced Tree0
is logn
, this gives O(nlogn)
as memory requirement.
However, if you would just store the points whose 0-th
coordinate exactly matches the key at the node, you would be storing each point once per the entire tree (instead of per level of the tree), which gives O(n)
memory requirement.
What is the reason behind storing the points once per level in the Range Tree? Does it allow for some kind of cool range queries or something? So far it looks to me like any query that you could perform on the O(nlogn)
version is also available for the O(n)
version. What am I missing?
(Expanding @user3386109’s comment into a full answer.)
There are several different data structures for storing 2D collections of points, each of which is optimized for different types of queries. As the name suggests, range trees are optimized for range searches, queries of the form “here’s a rectangle, what are all the points in that rectangle?” The structure of the range tree - storing each point in several different subtrees - is designed so that you can find a span of nodes in 1D containing one axis of the rectangle, then discover all the nodes in the next dimension that are in the other dimension of the rectangle. If you aren’t planning on making any queries of that form, then there’s no need to store things this way. You’re essentially paying for something you aren’t going to use.
There are other data structures you could use for storing a set of points and seeing whether a particular point is present. If that’s the only question you need to answer, a simple hash table might be all you’d need to use. You could also use a regular BST where points are compared first by their first components, then by their second components. (You could also use a k-d tree here if you’d like.)
Hope this helps!