I am trying to get a pandas dataframe from a nested dictionary that contains lists of children nodes. The depth and amount of nodes is completely uneven. Here I put an example of the dictionary that I am trying to convert to a Pandas Dataframe. My dictionary is much longer, it has more than 600 nodes but it has the same format as this example.
example_dict = {"id": 0, "children":[{"id":4},
{"id": 1, "children":[{"id":2}, {"id":6}, {"id":7}, {"id":8}]},
{"id": 3, "children":[{"id":5}]},
{"id": 9, "children":[{"id":10},
{"id": 11, "children":[{"id":12},
{"id":13},
{"id":14},
{"id":15}]}]},
{"id": 16, "children":[{"id":17},
{"id":18},
{"id":19},
{"id":20},
{"id": 21, "children":[{"id":22},
{"id": 23, "children":[{"id":24}]}]}]}]}
I want to convert it into a dataframe that looks like this:
| | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
|---:|----------:|----------:|:----------|:----------|:----------|
| 0 | 0 | 4 | <NA> | <NA> | <NA> |
| 1 | 0 | 1 | 2 | <NA> | <NA> |
| 2 | 0 | 1 | 6 | <NA> | <NA> |
| 3 | 0 | 1 | 7 | <NA> | <NA> |
| 4 | 0 | 1 | 8 | <NA> | <NA> |
| 5 | 0 | 3 | 5 | <NA> | <NA> |
| 6 | 0 | 9 | 10 | <NA> | <NA> |
| 7 | 0 | 9 | 11 | 12 | <NA> |
| 8 | 0 | 9 | 11 | 13 | <NA> |
| 9 | 0 | 9 | 11 | 14 | <NA> |
| 10 | 0 | 9 | 11 | 15 | <NA> |
| 11 | 0 | 16 | 17 | <NA> | <NA> |
| 12 | 0 | 16 | 18 | <NA> | <NA> |
| 13 | 0 | 16 | 19 | <NA> | <NA> |
| 14 | 0 | 16 | 20 | <NA> | <NA> |
| 15 | 0 | 16 | 21 | 22 | <NA> |
| 16 | 0 | 16 | 21 | 23 | 24 |
The number of columns depends on the dictionary so the solution needs to add as many columns as needed.
I tried the solutions here but I could not get a solution.
Here's a recursive function that will convert your dictionary to a list of sublists. Note that the sublists are of varying lengths, so you may have some post-treatment to do before making a dataframe out of the list.
def dict_to_lists(my_dict):
my_list = []
children = my_dict.get('children')
if children is None:
return [[my_dict['id']]]
for child_dict in children:
my_list.extend([[my_dict['id']] + child_list for child_list in dict_to_lists(child_dict)])
return my_list
On your example dict, it returns:
[
[0, 4],
[0, 1, 2],
[0, 1, 6],
[0, 1, 7],
[0, 1, 8],
[0, 3, 5],
[0, 9, 10],
[0, 9, 11, 12],
[0, 9, 11, 13],
[0, 9, 11, 14],
[0, 9, 11, 15],
[0, 16, 17],
[0, 16, 18],
[0, 16, 19],
[0, 16, 20],
[0, 16, 21, 22],
[0, 16, 21, 23, 24]
]