I am struggling to find a recursive mapping to get the end result. Here is the input df
## mapping recursive
import pandas as pd
data = {
"group1": ["A", "A", "B", "B"],
"group2": ["grp1", "grp2", "grp1", "grp2"],
"hc": [50, 40, 45, 90],
"response": [12, 30, 43, 80]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
df
I would like to map recursively to convert the df
into a Python dictionary. Each number is in a details
list, and it is recursively mapped through the aggregated data frame. For example, level A total_hc
is the sum of hc of group1
is A
.
## desired output
output = {
"rows":[
{
"details": [{
"level": "A",
"total_hc": 90,
"response_total": 42
}],
"rows":[
{
"details": [{
"level": "grp1",
"total_hc": 50,
"response_total": 12
}]
},
{
"details": [{
"level": "grp2",
"total_hc": 40,
"response_total": 30
}],
}
]
},
{
"details": [{
"level": "B",
"total_hc": 135,
"response_total": 123
}],
"rows":[
{
"details": [{
"level": "grp1",
"total_hc": 45,
"response_total": 43
}]
},
{
"details": [{
"level": "grp2",
"total_hc": 90,
"response_total": 80
}],
}
]
}
]
}
I tried to group the df
## group by function
group_df = df.groupby(["group1", "group2"]).sum()
group_df.to_dict("index")
Then I am struggling to find a recursive mapping to get the end result. Appreciate anyone who can help out.
If the dataframe is acutally that simple then you could try
new_cols = {c: f"{c}_total" for c in ["hc", "response"]}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
out2 = {
"rows": [
{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
"rows": [{"details": [record]}
for record in sdf[["group2"] + val_cols]
.rename(columns={"group2": "level"})
.to_dict(orient="records")]}
for key, sdf in df.groupby("group1")
]
}
to get
{'rows': [{'details': [{'level': 'A', 'hc_total': 90, 'response_total': 42}],
'rows': [{'details': [{'level': 'grp1',
'hc_total': 50,
'response_total': 12}]},
{'details': [{'level': 'grp2',
'hc_total': 40,
'response_total': 30}]}]},
{'details': [{'level': 'B', 'hc_total': 135, 'response_total': 123}],
'rows': [{'details': [{'level': 'grp1',
'hc_total': 45,
'response_total': 43}]},
{'details': [{'level': 'grp2',
'hc_total': 90,
'response_total': 80}]}]}]}
If the dataframe has more columns to group over (not only group1
and group2
) then it might be a good idea to use a recursive function like:
def grouping(df, by, val_cols, start=True):
if start:
new_cols = {c: f"{c}_total" for c in val_cols}
df = df.rename(columns=new_cols)
val_cols = list(new_cols.values())
if len(by) == 1:
df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
return [{"details": [record]} for record in df.to_dict(orient="records")]
return [{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
"rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
for key, sdf in df.groupby(by[0])]
out = {"rows": grouping(df, ["group1", "group2"], ["hc", "response"])}
In case you also want to group by group2
or the right-most item of by
in the function (it's not clear to me from your sample) you have to make the following adjustment:
...
"rows": [{"details": [record]}
for record in sdf[["group2"] + val_cols]
.groupby("group2", as_index=False).sum()
.rename(columns={"group2": "level"})
.to_dict(orient="records")]}
...
or
...
if len(by) == 1:
df = (df[[by[0]] + val_cols].groupby(by[0], as_index=False).sum()
.rename(columns={by[0]: "level"}))
return [{"details": [record]} for record in df.to_dict(orient="records")]
...