sqlpostgresqltreematerialized-path-patterndjango-treebeard

Searching for the right-most node of a materialized path tree


Is it possible to sort by a materialized path tree's path text field in order to find the right-most node of the tree? For example, consider this python function that uses django-treebeard's MP_Node:

def get_rightmost_node():
    """Returns the rightmost node in the current tree.

    :rtype: MyNode
    """
    # MyNode is a subclass of django-treebeard's MP_Node.
    return MyNode.objects.order_by('-path').first()

From all my testing, it seems to return what I expect, but I don't know how to come up with the math to prove it. And I haven't found any info on performing this operation on a materialized path tree.

Treebeard's implementation doesn't have separators in paths, so the paths look like this: 0001, 00010001, 000100010012, etc.


Solution

  • Short answer: No.

    Here is a SQLFiddle demonstrating the problem I described in my comment.

    For this simple setup:

    id, path
    1,  '1'
    2,  '1\2'
    3,  '1\3'
    4,  '1\4'
    5,  '1\5'
    6,  '1\6'
    7,  '1\7'
    8,  '1\8'
    9,  '1\9'
    10, '1\10'
    

    attempting to get the rightmost leaf (id = 10) with a simple sort will fail:

    SELECT TOP 1
      id,
      path
    FROM hierarchy
    ORDER BY path DESC
    

    returns:

    id, path
    9,  1\9
    

    Because path is a text-based column, 1\10 will come after 1\9 in the descending sort (See the results of the second query in the fiddle).

    Even if you began tracking depth and path length, which are generally cheap and easy to keep up with, it would be entirely possible to get paths like this:

    path       depth  length
    12\3\11\2  4      9
    5\17\10\1  4      9
    

    which would still not sort properly.

    Even if you are using letters instead of numbers, this only pushes the problem horizon out to the 26th child instead of the 10th:

    SQLFiddle using letters

    I am not as familiar with materialized path operations as I am with nested set and adjacency lists, and have no experience with django, so I'll defer to others if there are methods of which I am unaware, but you will almost certainly have to perform some sort of parsing on the path column to consistently get the correct leaf.

    EDIT - Having addressed the question of whether sorting is a valid solution, here are some additional notes on other potential solutions after a bit of discussion and thinking on the problem a bit:

    -"Rightmost" is a vague term when nodes can have more than two children (i.e., the tree is not a binary tree). If a node has 10 children, which are to the left of the parent, and which are to the right? You must define this condition before you can define a solution to the problem.

    -Once "rightmost" is properly defined for your problem space, understand that the rightmost node will not necessarily be on the lowest level of the tree:

            1
           / \
        1\1   1\2 <= This is the rightmost node
        /
      1\1\1 <= This is the lowest node
    

    -Once "rightmost" is defined, a simple loop can be used to programatically find the rightmost node:

    //in pseudocode
    function GetRightmostNode(Node startNode)
    {
      Node currentNode = startNode;
    
      while(currentNode.RightChildren != null)
      {
        currentNode = maximum of currentNode.RightChildren;
      }
    
      return currentNode;
    }
    

    This loop will look for children of the current node to the right of the current node. If they exist, it selects the rightmost of the right children and repeats. Once it reaches a node with no children to its right, it returns the current node, as it has found the rightmost node of the tree (or subtree) with startNode as its root.