I have a dataset as follows
Unique Name | Parent | Child |
---|---|---|
US_SQ | A | A1 |
UC_LC | A | A2 |
UK_SJ | A2 | A21 |
UI_QQ | B | B1 |
Now I want the output to look like this:
US_SQ
├── A1
└── UC_LC
    └── UK_SJ
UI_QQ
└── B1
In other words, I want to use the Unique Name column value in the tree.
This is the code that I am using:
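import pandas as pd
from anytree import Node, RenderTree
def add_nodes(nodes, parent, child):
    # Create each node once, then attach the child to its parent
    if parent not in nodes:
        nodes[parent] = Node(parent)
    if child not in nodes:
        nodes[child] = Node(child)
    nodes[child].parent = nodes[parent]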
data = pd.DataFrame(columns=["Parent","Child"], data=[["US_SQ","A","A1"],["UC_LC","A","A2"],["UK_SJ","A2","A21"],["UI_QQ","B","B1"]])
nodes = {} # store references to created nodes
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1) # 1-liner
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)
roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots: # you can skip this for roots[0], if there is no forest and just 1 tree
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))
Also, is there an efficient way to access the tree data, or a format to save it in, so that we can easily find the parent or child of a node?
The data and problem above are taken from this question:
Read data from a pandas DataFrame and create a tree using anytree in python
There are two parts to your question.
1. Renaming the Node
To rename a node so that Unique Name serves as the alias for the Parent name, the above answer on aliasDict is good, but we can also modify the DataFrame directly and leave your code unchanged.
I have modified your DataFrame because it does not seem to run properly (it declares two column names for three columns of data), and because your code example does not clearly show that Unique Name is an alias for Parent in some cases.
data = pd.DataFrame(
    columns=["Unique Name", "Parent", "Child"],
    data=[
        ["US_SQ", "A", "A1"],
        ["US_SQ", "A", "A2"],
        ["UC_LC", "A2", "A21"],
        ["UI_QQ", "B", "B1"]
    ]
)
# Rename Parent and Child columns using aliasDict
aliasDict = dict(data[["Parent", "Unique Name"]].values)
data["Parent"] = data["Parent"].replace(aliasDict)
data["Child"] = data["Child"].replace(aliasDict)
# Your original code - unchanged
nodes = {}
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)
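To make the renaming step easier to follow, here is a minimal plain-Python sketch of the same idea without pandas or anytree. The names `rows`, `alias`, `edges`, `children` and `render` are illustrative assumptions, not part of either library, and `render` is only a stand-in for what RenderTree prints.

```python
# Plain-Python sketch (no pandas/anytree); all names here are hypothetical.
rows = [
    # (Unique Name, Parent, Child)
    ("US_SQ", "A", "A1"),
    ("US_SQ", "A", "A2"),
    ("UC_LC", "A2", "A21"),
    ("UI_QQ", "B", "B1"),
]

# Map each Parent value to its Unique Name, like aliasDict in the answer.
alias = {parent: unique for unique, parent, _ in rows}

# Apply the alias to both columns; names without an alias stay as they are.
edges = [(alias.get(p, p), alias.get(c, c)) for _, p, c in rows]

# Group children under their (renamed) parent.
children = {}
for parent, child in edges:
    children.setdefault(parent, []).append(child)


def render(name, prefix=""):
    """Return the lines drawn below `name`, one per descendant."""
    lines = []
    kids = children.get(name, [])
    for i, kid in enumerate(kids):
        last = i == len(kids) - 1
        lines.append(prefix + ("└── " if last else "├── ") + kid)
        lines.extend(render(kid, prefix + ("    " if last else "│   ")))
    return lines


# A root is a parent that never appears as a child.
all_children = {child for _, child in edges}
roots = [parent for parent in children if parent not in all_children]

output = []
for root in roots:
    output.append(root)
    output.extend(render(root))
print("\n".join(output))
```

With the modified data this prints two trees, US_SQ (with A1 and UC_LC below it, and A21 below UC_LC) and UI_QQ (with B1 below it). A21 keeps its own name because it never appears in the Parent column, so it has no alias.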
2. Exporting to DataFrame
For the second part, anytree does not provide integration with pandas DataFrames. An alternative Python package, bigtree, does this out of the box.
The whole example can be implemented as follows:
import pandas as pd
from bigtree import dataframe_to_tree_by_relation, print_tree, tree_to_dataframe
data = pd.DataFrame(
    columns=["Unique Name", "Parent", "Child"],
    data=[
        ["root", "root", "A"],  # added this line
        ["root", "root", "B"],  # added this line
        ["US_SQ", "A", "A1"],
        ["US_SQ", "A", "A2"],
        ["UC_LC", "A2", "A21"],
        ["UI_QQ", "B", "B1"]
    ]
)
# Rename Parent and Child columns using aliasDict (same as above)
aliasDict = dict(data[["Parent", "Unique Name"]].values)
data["Parent"] = data["Parent"].replace(aliasDict)
data["Child"] = data["Child"].replace(aliasDict)
# Create a tree from dataframe, print the tree
root = dataframe_to_tree_by_relation(data, parent_col="Parent", child_col="Child")
print_tree(root)
# root
# ├── US_SQ
# │   ├── A1
# │   └── UC_LC
# │       └── A21
# └── UI_QQ
#     └── B1
# Export tree to dataframe
tree_to_dataframe(root, parent_col="Parent", name_col="Child")
#                     path  Child Parent
# 0                  /root   root   None
# 1            /root/US_SQ  US_SQ   root
# 2         /root/US_SQ/A1     A1  US_SQ
# 3      /root/US_SQ/UC_LC  UC_LC  US_SQ
# 4  /root/US_SQ/UC_LC/A21    A21  UC_LC
# 5            /root/UI_QQ  UI_QQ   root
# 6         /root/UI_QQ/B1     B1  UI_QQ
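On finding a parent or child quickly once the tree is saved as Parent/Child rows: one possible approach, sketched here in plain Python rather than with bigtree or anytree, is to index the rows into two dictionaries. The names `edges`, `parent_of`, `children_of` and `path_to` are hypothetical, and the rows are copied by hand from the exported table above.

```python
# Plain-Python sketch; the helper names are illustrative, not a library API.
edges = [
    # (Parent, Child) pairs, as in the exported table (the root row is omitted)
    ("root", "US_SQ"),
    ("US_SQ", "A1"),
    ("US_SQ", "UC_LC"),
    ("UC_LC", "A21"),
    ("root", "UI_QQ"),
    ("UI_QQ", "B1"),
]

# child -> parent: a single dictionary lookup per query
parent_of = {child: parent for parent, child in edges}

# parent -> list of children, in row order
children_of = {}
for parent, child in edges:
    children_of.setdefault(parent, []).append(child)


def path_to(name):
    """Walk up parent_of and return the path from the root, like the path column."""
    parts = [name]
    while parts[-1] in parent_of:
        parts.append(parent_of[parts[-1]])
    return "/" + "/".join(reversed(parts))


print(parent_of["UC_LC"])    # US_SQ
print(children_of["US_SQ"])  # ['A1', 'UC_LC']
print(path_to("A21"))        # /root/US_SQ/UC_LC/A21
```

This assumes node names are unique across the tree. If the same name can appear under different parents, the path column is the safer key.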
Source: I'm the creator of bigtree ;)