Is there a way to read data from a pandas DataFrame and construct a tree using anytree?
Parent Child
A A1
A A2
A2 A21
I can do it with static values as follows. However, I want to automate this by reading the data from a pandas DataFrame with anytree.
>>> from anytree import Node, RenderTree
>>> A = Node("A")
>>> A1 = Node("A1", parent=A)
>>> A2 = Node("A2", parent=A)
>>> A21 = Node("A21", parent=A2)
Output is
A
├── A1
└── A2
    └── A21
This question, and especially the answer, has been adapted (copied, really) from:
Read data from a file and create a tree using anytree in python
Many thanks to @Fabien N
Create each node if it does not exist yet, and store its reference in a dictionary, nodes, for later use. As each row is read, set the parent on the child node. The roots of the forest are the Parent values that never appear among the Child values: a root is not a child of any node, so it won't appear in the Child column.
import pandas as pd
from anytree import Node, RenderTree

def add_nodes(nodes, parent, child):
    # Create each node once, then attach the child to its parent.
    if parent not in nodes:
        nodes[parent] = Node(parent)
    if child not in nodes:
        nodes[child] = Node(child)
    nodes[child].parent = nodes[parent]

data = pd.DataFrame(columns=["Parent", "Child"], data=[["A", "A1"], ["A", "A2"], ["A2", "A21"], ["B", "B1"]])
nodes = {}  # store references to created nodes
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
for parent, child in zip(data["Parent"], data["Child"]):
    add_nodes(nodes, parent, child)

# Roots are the Parent values that never appear in the Child column.
roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots:  # with a single tree, you can render nodes[roots[0]] directly instead of looping
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))
Result:
A
├── A1
└── A2
    └── A21
B
└── B1
Update: printing a specific root:
root = 'A'  # change according to your use case
for pre, _, node in RenderTree(nodes[root]):
    print("%s%s" % (pre, node.name))