pythonpandasdictionaryrecursive-datastructures

Mapping recursively from a dataframe to python dictionary


I am struggling to find a recursive mapping to get the end result. Here is the input df

## mapping recursive
import pandas as pd

data = {
  "group1": ["A", "A", "B", "B"],
  "group2": ["grp1", "grp2", "grp1", "grp2"],
  "hc": [50, 40, 45, 90],
  "response": [12, 30, 43, 80]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)
df

I would like to map recursively to convert the df into a Python dictionary. Each number is in a details list, and it is recursively mapped through the aggregated data frame. For example, level A total_hc is the sum of hc of group1 is A.

## desired output
output = {
    "rows":[
        {
        "details": [{        
            "level": "A",
            "total_hc": 90,
            "response_total": 42
        }],
            "rows":[
                {
                "details": [{        
                    "level": "grp1",
                    "total_hc": 50,
                    "response_total": 12
                }]
                },
                {
                "details": [{        
                    "level": "grp2",
                    "total_hc": 40,
                    "response_total": 30
                }],
                }
            ]
        },
        {
        "details": [{        
            "level": "B",
            "total_hc": 135,
            "response_total": 123
        }],
            "rows":[
                {
                "details": [{        
                    "level": "grp1",
                    "total_hc": 45,
                    "response_total": 43
                }]
                },
                {
                "details": [{        
                    "level": "grp2",
                    "total_hc": 90,
                    "response_total": 80
                }],
                }
            ]
        }
    ]
}

I tried to group the df

## group by function
group_df = df.groupby(["group1", "group2"]).sum()
group_df.to_dict("index")

Then I am struggling to find a recursive mapping to get the end result. Appreciate anyone who can help out.


Solution

  • If the dataframe is acutally that simple then you could try

    new_cols = {c: f"{c}_total" for c in ["hc", "response"]}
    df = df.rename(columns=new_cols)
    val_cols = list(new_cols.values())
    out2 = {
        "rows": [
            {"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
             "rows": [{"details": [record]}
                      for record in sdf[["group2"] + val_cols]
                                    .rename(columns={"group2": "level"})
                                    .to_dict(orient="records")]}
            for key, sdf in df.groupby("group1")
        ]
    }
    

    to get

    {'rows': [{'details': [{'level': 'A', 'hc_total': 90, 'response_total': 42}],
               'rows': [{'details': [{'level': 'grp1',
                                      'hc_total': 50,
                                      'response_total': 12}]},
                        {'details': [{'level': 'grp2',
                                      'hc_total': 40,
                                      'response_total': 30}]}]},
              {'details': [{'level': 'B', 'hc_total': 135, 'response_total': 123}],
               'rows': [{'details': [{'level': 'grp1',
                                      'hc_total': 45,
                                      'response_total': 43}]},
                        {'details': [{'level': 'grp2',
                                      'hc_total': 90,
                                      'response_total': 80}]}]}]}
    

    If the dataframe has more columns to group over (not only group1 and group2) then it might be a good idea to use a recursive function like:

    def grouping(df, by, val_cols, start=True):
        if start:
            new_cols = {c: f"{c}_total" for c in val_cols}
            df = df.rename(columns=new_cols)
            val_cols = list(new_cols.values())
        if len(by) == 1:
            df = df[[by[0]] + val_cols].rename(columns={by[0]: "level"})
            return [{"details": [record]} for record in df.to_dict(orient="records")]
        return [{"details": [{"level": key} | sdf[val_cols].sum().to_dict()],
                 "rows": grouping(sdf, by=by[1:], val_cols=val_cols, start=False)}
                for key, sdf in df.groupby(by[0])]
    
    out = {"rows": grouping(df, ["group1", "group2"], ["hc", "response"])}
    

    In case you also want to group by group2 or the right-most item of by in the function (it's not clear to me from your sample) you have to make the following adjustment:

             ...
             "rows": [{"details": [record]}
                      for record in sdf[["group2"] + val_cols]
                                    .groupby("group2", as_index=False).sum()
                                    .rename(columns={"group2": "level"})
                                    .to_dict(orient="records")]}
        ...
    

    or

        ...
        if len(by) == 1:
            df = (df[[by[0]] + val_cols].groupby(by[0], as_index=False).sum()
                  .rename(columns={by[0]: "level"}))
            return [{"details": [record]} for record in df.to_dict(orient="records")]
        ...