
Read data from a pandas DataFrame and create a tree using anytree in python


Is there a way to read data from a pandas DataFrame and construct a tree using anytree?

Parent Child
A      A1
A      A2
A2     A21

I can do it with static values as follows. However, I want to automate this by reading the data from a pandas DataFrame with anytree.

>>> from anytree import Node, RenderTree
>>> A = Node("A")
>>> A1 = Node("A1", parent=A)
>>> A2 = Node("A2", parent=A)
>>> A21 = Node("A21", parent=A2)
>>> for pre, _, node in RenderTree(A):
...     print("%s%s" % (pre, node.name))

Output is

A
├── A1
└── A2
    └── A21

This question, and especially the answer, is adapted (copied, really) from:

Read data from a file and create a tree using anytree in python

Many thanks to @Fabien N


Solution

  • Create each node the first time its name appears and store its reference in a dictionary, nodes, for later use. Then set the child's parent, which also re-parents a node that was first created as a root. The roots of the forest are the Parent values that never appear in the Child column: a root is not a child of any node, so it cannot appear there.

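    import pandas as pd
    from anytree import Node, RenderTree

    def add_nodes(nodes, parent, child):
        # create missing nodes, then attach the child to its parent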
        if parent not in nodes:
            nodes[parent] = Node(parent)  
        if child not in nodes:
            nodes[child] = Node(child)
        nodes[child].parent = nodes[parent]
    
    data = pd.DataFrame(columns=["Parent","Child"], data=[["A","A1"],["A","A2"],["A2","A21"],["B","B1"]])
    nodes = {}  # store references to created nodes 
    # data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
    for parent, child in zip(data["Parent"],data["Child"]):
        add_nodes(nodes, parent, child)
    
    # roots are the Parent values that never appear in the Child column
    roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
    for root in roots:         # with a single tree, render nodes[roots[0]] directly instead of looping
        for pre, _, node in RenderTree(nodes[root]):
            print("%s%s" % (pre, node.name))
    

    Result:

    A
    ├── A1
    └── A2
        └── A21
    B
    └── B1
    

    Update: printing a specific root:

    root = 'A'  # change according to your use case
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))