pythonsqltreehierarchical-datatransitive-closure-table

Rendering a tree from a closure table SELECT statement?


[previous question]

I'm trying to add reddit-like comments to an app, and I decided to go with the closure table pattern for database organization. My app database looks somewhat like this:

posts

+----+-------+
| id | title |
+----+-------+
|  1 | Hello |
+----+-------+

comments

+----+-----------+---------+------+
| id | parent_id | post_id | text |
+----+-----------+---------+------+
|  1 |      NULL |       1 | ...  |
|  2 |         1 |       1 | ...  |
|  3 |         2 |       1 | ...  |
|  4 |         3 |       1 | ...  |
|  5 |         3 |       1 | ...  |
|  6 |         5 |       1 | ...  |
|  7 |      NULL |       1 | ...  |
|  8 |         7 |       1 | ...  |
|  9 |         4 |       1 | ...  |
+----+-----------+---------+------+

comment_paths

+-----------+----------+-------+
| parent_id | child_id | depth |
+-----------+----------+-------+
|         1 |        1 |     0 |
|         2 |        2 |     0 |
|         1 |        2 |     1 |
|         3 |        3 |     0 |
|         2 |        3 |     1 |
|         1 |        3 |     2 |
|         4 |        4 |     0 |
|         3 |        4 |     1 |
|         2 |        4 |     2 |
|         1 |        4 |     3 |
          [...snip...]

Right now, I'm running a this query:

SELECT c.*, p.*
FROM comments AS c
JOIN comment_paths AS p
    ON c.id = p.child_id
WHERE p.parent_id IN
    (SELECT c2.id FROM comments AS c2 WHERE c2.parent_id IS NULL AND c2.post_id = 1)

to get a list of the comments based on their post_id. The returned data is:

+------+-------------+-----------+--------+-------------+------------+---------+
| c.id | c.parent_id | c.post_id | c.text | p.parent_id | p.child_id | p.depth |
+------+-------------+-----------+--------+-------------+------------+---------+
|    1 |        NULL |         1 | ...    |           1 |          1 |       0 |
|    2 |           1 |         1 | ...    |           1 |          2 |       1 |
|    3 |           2 |         1 | ...    |           1 |          3 |       2 |
|    4 |           3 |         1 | ...    |           1 |          4 |       3 |
|    5 |           3 |         1 | ...    |           1 |          5 |       3 |
|    6 |           5 |         1 | ...    |           1 |          6 |       4 |
|    9 |           4 |         1 | ...    |           1 |          9 |       4 |
|    7 |        NULL |         1 | ...    |           7 |          7 |       0 |
|    8 |           7 |         1 | ...    |           7 |          8 |       1 |
+------+-------------+-----------+--------+-------------+------------+---------+

This represents the tree:

[1]
 |[2]
 | |[3]
 |   |[4]
 |   | |[9]
 |  [5]
 |   |[6]
[7]
 |[8]

However, I'm struggling to convert the returned data into a Python tree structure. Essentially, my goal is this question and this question in terms of final output (HTML) but I really don't want to resort to recursive SQL statements since I already have the information. I figure some sort of recursion is necessary, as I would like to end up with structure similar to this:

[
  {
    'id': 1,
    ...
    'children': [
                  {
                    'id': 2,
                    ...
                    'children': [ ... ]
                  }
                ]
  },
  {
    'id': 7,
    ...
    'children': [
                {
                  'id': 8,
                  ...
                }
              ]
  }
]

Basically a nested list of dictionaries so I can loop over them using Jinja's recursive loop. Does anyone have an idea?

Thanks!


Edit 2013-04-17

Messing around, I have a "working" solution, although it does a lot of iterations so I don't want to mark it as the answer to this question. The solution I used is:

comment_set = ... # previous query to grab data set

def create_tree(parent):
    parent['children'] = []
    for record in comment_set:
        if record['parent_id'] == parent['id']:
            parent['children'].append(create_tree(record))
    return parent

comment_tree = []
for record in comment_set:
    if record['parent_id'] is None: # if this is the start of a tree
        comment_tree.append(create_tree(record))

It's not ideal because it iterates over the comment_set every time create_tree() is called, which is every record in the set. However, it's the best I have right now. Anyone have any thoughts?


Solution

  • You don't need recursion, you just need to be able to process node objects by reference.

    Here's a code example to create the nested data structure in linear time. Treat this as pseudocode, because I haven't tested this and I'm not fluent in python yet.

    The reason I use two for-loops is that otherwise we'd have to assume nodes from the top of the tree are processed before nodes from deep in the tree. With two loops as shown below, no such assumption is needed.

    for record in comment_set:
        nodes[record['id']] = record
    for record in comment_set:
        if record['parent_id'] in nodes:
            nodes[record['parent_id']]['children'].append(record)
        else
            top = record;
    return top
    

    By the end of that loop:

    This is similar to an example I posted in a past SO post, Turn database result into array: