materialized-path-pattern

The best way to generate path pattern for materialized path tree structures


Browsing through examples all over the web, I can see that people generate the path using something like "parent_id.node_id". Examples:-

uid | name | tree_id
--------------------
 1  | Ali  |   1.
 2  | Abu  |   2.
 3  | Ita  |   1.3.
 4  | Ira  |   1.3.
 5  | Yui  |   1.3.4

But as explained in this question - Sorting tree with a materialized path?, using zero padding to the tree_id make it easy to sort it by the creation order.

uid | name | tree_id
--------------------
 1  | Ali  |   0001.
 2  | Abu  |   0002.
 3  | Ita  |   0001.0003.
 4  | Ira  |   0001.0003.
 5  | Yui  |   0001.0003.0004

Using fix length string like this also make it easy for me to calculate the level - length(tree_id)/5. What I'm worried is it would limit me to maximum 9999 users rather than 9999 per branch. Am I right here ?

9999  | Tar | 0001.9999
10000 | Tor | 0001.??

Solution

  • You are correct -- zero-padding each node ID would allow you to sort the entire tree quite simply. However, you have to make the padding width match the upper limit of digits of the ID field, as you have pointed out in your last example. E.g., if you're using an int unsigned field for your ID, the highest value would be 4,294,967,295. This is ten digits, meaning that the record set from your last example might look like:

    uid   | name | tree_id
    9999  | Tar  | 0000000001.0000009999
    10000 | Tor  | 0000000001.0000010000
    

    As long as you know you're not going to need to change your ID field to bigint unsigned in the future, this will continue work, though it might be a bit data-hungry depending on how huge your tables get. You could shave off two bytes per node ID by storing the values in hexadecimal, which would still be sorted correctly in a string sort:

    uid   | name | tree_id
    9999  | Tar  | 00000001.0000270F
    10000 | Tor  | 00000001.00002710
    

    I can imagine this would make things a real headache when trying to update the paths (pruning nodes, etc) though.

    You can also create extra fields for sorting, e.g.:

    uid   | name | tree_id           | name_sort
    9999  | Tar  | 00000001.0000270F | Ali.Tar
    10000 | Tor  | 00000001.00002710 | Ali.Tor
    

    There are limitations, however, as laid out by this guy's answer to a similar materialized path sorting question. The name field would have to be padded to a set length (fortunately, in your example, each name seems to be three characters long), and it would take up a lot of space.

    In conclusion, given the above issues, I've found that the most versatile way to do sorting like this is to simply do it in your application logic -- say, using a recursive function that builds a nested array, sorting the children of each node as it goes.