algorithm data-structures binary-search-tree range-tree

Range Trees: why not save space by default?

Suppose you have a set S of unique points on the 2-dimensional plane. Now, you are expecting a bunch of questions in the form of "is point p present in S?" You decide to build a Range Tree to store your S and answer this question. The basic idea behind a Range Tree is that you first build a balanced binary search tree Tree0 on the 0-th coordinate and then at each node of this Tree0 build yet another balanced search tree Tree1 but this time using 1-st coordinate as your key. Wikipedia article for Range Tree.

Now, I was expecting that Tree1 which is built for every node n0 of Tree0 would hold exactly those points whose 0-th coordinate is equal to the key from n0. However, if you read more about Range Trees, you will see that this is not the case. Specifically:

The root r0 of Tree0 contains a Tree1 which holds all points.
The left child of r0 contains a Tree1 which holds all of the points whose 0-th coordinate is less than the 0-th coordinate at r0.
The right child of r0 contains a Tree1 which holds all of the points whose 0-th coordinate is greater than that from r0.

If you continue this logic, you will see that at each level of the Range Tree all points are stored exactly once. So, each level requires n memory and since the depth of a balanced Tree0 is logn, this gives O(nlogn) as memory requirement.

However, if you would just store the points whose 0-th coordinate exactly matches the key at the node, you would be storing each point once per the entire tree (instead of per level of the tree), which gives O(n) memory requirement.

What is the reason behind storing the points once per level in the Range Tree? Does it allow for some kind of cool range queries or something? So far it looks to me like any query that you could perform on the O(nlogn) version is also available for the O(n) version. What am I missing?

Solution

(Expanding @user3386109’s comment into a full answer.)

There are several different data structures for storing 2D collections of points, each of which is optimized for different types of queries. As the name suggests, range trees are optimized for range searches, queries of the form “here’s a rectangle, what are all the points in that rectangle?” The structure of the range tree - storing each point in several different subtrees - is designed so that you can find a span of nodes in 1D containing one axis of the rectangle, then discover all the nodes in the next dimension that are in the other dimension of the rectangle. If you aren’t planning on making any queries of that form, then there’s no need to store things this way. You’re essentially paying for something you aren’t going to use.

There are other data structures you could use for storing a set of points and seeing whether a particular point is present. If that’s the only question you need to answer, a simple hash table might be all you’d need to use. You could also use a regular BST where points are compared first by their first components, then by their second components. (You could also use a k-d tree here if you’d like.)

Hope this helps!