pythonpandasdictionary

Arbitrary depth tree represented by a nested dictionary of lists to pandas DataFrame


I am trying to get a pandas dataframe from a nested dictionary that contains lists of children nodes. The depth and amount of nodes is completely uneven. Here I put an example of the dictionary that I am trying to convert to a Pandas Dataframe. My dictionary is much longer, it has more than 600 nodes but it has the same format as this example.

example_dict = {"id": 0, "children":[{"id":4},
                                 {"id": 1, "children":[{"id":2}, {"id":6}, {"id":7}, {"id":8}]},
                                 {"id": 3, "children":[{"id":5}]},
                                {"id": 9, "children":[{"id":10},
                                                      {"id": 11, "children":[{"id":12},
                                                                             {"id":13},
                                                                             {"id":14},
                                                                             {"id":15}]}]},
                                {"id": 16, "children":[{"id":17},
                                                       {"id":18},
                                                       {"id":19},
                                                       {"id":20},
                                                       {"id": 21, "children":[{"id":22},
                                                                              {"id": 23, "children":[{"id":24}]}]}]}]}

I want to convert it into a dataframe that looks like this:

|    |   Level 1 |   Level 2 | Level 3   | Level 4   | Level 5   |
|---:|----------:|----------:|:----------|:----------|:----------|
|  0 |         0 |         4 | <NA>      | <NA>      | <NA>      |
|  1 |         0 |         1 | 2         | <NA>      | <NA>      |
|  2 |         0 |         1 | 6         | <NA>      | <NA>      |
|  3 |         0 |         1 | 7         | <NA>      | <NA>      |
|  4 |         0 |         1 | 8         | <NA>      | <NA>      |
|  5 |         0 |         3 | 5         | <NA>      | <NA>      |
|  6 |         0 |         9 | 10        | <NA>      | <NA>      |
|  7 |         0 |         9 | 11        | 12        | <NA>      |
|  8 |         0 |         9 | 11        | 13        | <NA>      |
|  9 |         0 |         9 | 11        | 14        | <NA>      |
| 10 |         0 |         9 | 11        | 15        | <NA>      |
| 11 |         0 |        16 | 17        | <NA>      | <NA>      |
| 12 |         0 |        16 | 18        | <NA>      | <NA>      |
| 13 |         0 |        16 | 19        | <NA>      | <NA>      |
| 14 |         0 |        16 | 20        | <NA>      | <NA>      |
| 15 |         0 |        16 | 21        | 22        | <NA>      |
| 16 |         0 |        16 | 21        | 23        | 24        |

Target Table

The number of columns depends on the dictionary so the solution needs to add as many columns as needed.

I tried the solutions here but I could not get a solution.

Any depth nested dict to pandas dataframe


Solution

  • Here's a recursive function that will convert your dictionary to a list of sublists. Note that the sublists are of varying lengths, so you may have some post-treatment to do before making a dataframe out of the list.

    def dict_to_lists(my_dict):
        my_list = []
        children = my_dict.get('children')
        if children is None:
            return [[my_dict['id']]]
        for child_dict in children:
            my_list.extend([[my_dict['id']] + child_list for child_list in dict_to_lists(child_dict)])
        return my_list
    

    On your example dict, it returns:

    [
        [0, 4], 
        [0, 1, 2], 
        [0, 1, 6], 
        [0, 1, 7], 
        [0, 1, 8], 
        [0, 3, 5], 
        [0, 9, 10], 
        [0, 9, 11, 12], 
        [0, 9, 11, 13], 
        [0, 9, 11, 14], 
        [0, 9, 11, 15], 
        [0, 16, 17], 
        [0, 16, 18], 
        [0, 16, 19], 
        [0, 16, 20], 
        [0, 16, 21, 22], 
        [0, 16, 21, 23, 24]
    ]