I'm reading a paper titled "Practical Algorithms for Incremental Software Development Environments", by Tim A. Wagner. In chapter three, he talks about a tree data structure that represents the versions of a document. This tree is called the Global Version Tree (GVT). In that discussion I find this quote and figure, which I don't understand.
Specifically, I don't see how pre-order or post-order lists would help ancestor queries.
Both pre- and post-order sorted lists of the GVT nodes are maintained to support efficient 'ancestor-of' queries.
The text under the figure again mentions these sorted lists (I added italic).
Figure 3.4: Global version tree. The version hierarchy is defined by the ancestor relationships between the nodes. The figure on the left illustrates the conceptual relationships among the versions. The figure on the right shows the actual implementation, which differs in several ways: the addition of a sentinel version, ordering of the children, compression of linear runs, and the three sorted lists (pre-order, post-order, and creation-order) to enable efficient name lookup and ancestor-of tests.
What do they mean by "efficient ancestor-of queries", and how are they made faster by the lists? It seems to me that given two nodes in a tree, you can already decide if they are anscestors of one another by walking from each node to the root, checking for the other node. That's O(log N) time. Are the pre-order and post-order lists used to beat O(log N)?
What do they mean by "efficient ancestor-of queries"
They mean that such a query would provide the boolean outcome in constant time.
how are they made faster by the lists?
The efficiency claim is only true when the lists are enhanced with node-to-index information. So for the example you would have this data structure (I provide it in JSON format):
{
"preorder": { "A": 0, "B": 1, "D": 2, "C": 3, "E": 4 },
"postorder": { "D": 0, "E": 1, "C": 2, "B": 3, "A": 4 }
}
If now you have an ancestor-of query, given two nodes 𝑋 and 𝑌, then you can return the value of this boolean expression:
preorder[X] < preorder[Y] and postorder[X] > postorder[Y]
The lookup in preorder
and postorder
should have O(1) time. We can imagine that the nodes have unique sequential numbers (0, 1, 2, 3, ...), and then an array implementation would do. If nodes are identified in some other way, a dictionary/hashmap with an average O(1) lookup time would also work.
To see that the above expression is correct, let's consider the possible scenarios:
𝑋 is an ancestor of 𝑌. By definition of preorder and postorder, the expression will be true.
𝑋 is not an ancestor of 𝑌. We can split into these cases:
𝑌 is an ancestor of 𝑋, in which case both sub expressions are false -- it is the inverse of the above case.
𝑋 and 𝑌 are the same node: in that case both sub expressions are false.
𝑋 and 𝑌 have a lowest common ancestor 𝑍 that is distinct from 𝑋 and 𝑌, in which case one of the two is in 𝑍's left subtree, and the other in 𝑍's right subtree. Both a pre order and post order traversal will visit the nodes in the left subtree before the nodes in the right subtree. And so exactly one of the sub expressions will be false.