I'd like to build a pandas dataframe or tuple from an anytree object, where each node has a list attribute of members:
from anytree import Node, RenderTree, find_by_attr
from anytree.exporter import DictExporter
from collections import OrderedDict
import pandas as pd
import numpy as np
tree = Node('T0C0',
            n=1000,
            tier=0,
            members=['A', 'B', 'C', 'D'])
Node('T0C0.T1C0',
     parent=find_by_attr(tree, 'T0C0'),
     n=400,
     tier=1,
     members=['B', 'C'])
Node('T0C0.T1C1',
     parent=find_by_attr(tree, 'T0C0'),
     n=600,
     tier=1,
     members=['A', 'D'])
Node('T0C0.T1C1.T2C0',
     parent=find_by_attr(tree, 'T0C0.T1C1'),
     n=300,
     tier=2,
     members=['D'])
Node('T0C0.T1C1.T2C1',
     parent=find_by_attr(tree, 'T0C0.T1C1'),
     n=300,
     tier=2,
     members=['A'])
My goal is to produce a dataframe of end nodes per member or, even better, one column per tier of membership, like the following:
pd.DataFrame(data=np.array([['T0C0.T1C1.T2C1', 'T0C0.T1C0', 'T0C0.T1C0', 'T0C0.T1C1.T2C0'],
                            ['T0C0', 'T0C0', 'T0C0', 'T0C0'],
                            ['T0C0.T1C1', 'T0C0.T1C0', 'T0C0.T1C0', 'T0C0.T1C1'],
                            ['T0C0.T1C1.T2C1', None, None, 'T0C0.T1C1.T2C0']]
                           ),
             index=['A', 'B', 'C', 'D'], columns=['EndCluster', 'tier0', 'tier1', 'tier2'])
I've tried exporting to an OrderedDict and to JSON and building dataframes directly from there, but "children" becomes a column in the resulting dataframe, holding nested OrderedDict entries. I cannot find a way to unnest them. Thank you for any help!
The answer turned out to be easier than I thought.
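First, grab all the end nodes using anytree's findall() (it has to be imported, since the question only imports Node, RenderTree and find_by_attr):
from anytree import findall
endnodes = findall(tree, filter_=lambda node: node.is_leaf)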
This returns a tuple of nodes, which is easier to work with in this case than anytree's OrderedDict conversion.
Finally, populate the dataframe by repeating each node-level attribute once per member, i.e. multiplying it by len(item.members):
members = []
tier = []
endcluster = []
for item in endnodes:
    # one row per member of each end node
    members += item.members
    tier += [item.tier] * len(item.members)
    endcluster += [item.name] * len(item.members)
endf = pd.DataFrame(index=members)
endf['tier'] = tier
endf['endcluster'] = endcluster