python pandas group-by aggregate feature-engineering

How to apply a mode function to some columns using agg with groupby when aggregating with a different function for each column


I have a dataframe of numerical and categorical columns which I am trying to group by certain columns and aggregate.

I am trying to apply the mode function to the categorical columns of a pandas dataframe and other statistical functions such as sum, min, etc. to the other columns of the same dataframe.

I am not able to get the mode for the categorical columns.

What I tried so far is:

    df_agg = (
        df.sort_values(by=['userId', 'num1', 'num2'], ascending=False)
          .groupby(['userId', 'cat1', 'cat2'])
          .agg({'num3': 'sum', 'num2': 'sum', 'num1': 'sum', 'num4': 'sum',
                'num5': 'sum', 'num6': 'sum', 'num7': 'min',
                'cat3': 'mode', 'cat4': 'mode'})
          .reset_index()
    )

That gives the error: 'SeriesGroupBy' object has no attribute 'mode'.

How can I get mode of these categorical columns in this case?


Solution

  • Use a lambda function. Because Series.mode can return multiple values when there is a tie, select the first one:

    ..., 'cat4':lambda x: x.mode().iat[0]}
    

    EDIT: If you need all modes:

    ..., 'cat4': lambda x: list(x.mode())}).reset_index().explode('cat4')
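
For reference, here is a minimal end-to-end sketch of both variants on a small made-up dataframe. The column names follow the question, but the values are invented for illustration, and `first_mode` is a hypothetical helper name rather than a pandas function. It also guards against a group whose values are all missing, since `x.mode()` is empty in that case and `.iat[0]` would raise an IndexError.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the question, values are made up.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "cat1":   ["a", "a", "a", "b", "b"],
    "num1":   [10, 20, 30, 40, 50],
    "num7":   [5, 3, 4, 2, 9],
    "cat3":   ["x", "x", "y", "z", "z"],
    "cat4":   ["p", "q", "q", "r", "s"],
})


def first_mode(s):
    """Return the first mode of a Series, or None if it has no non-null values."""
    modes = s.mode()  # sorted; contains several values when there is a tie
    return modes.iat[0] if not modes.empty else None


# Variant 1: one mode per group (ties resolve to the first sorted value).
first = (
    df.groupby(["userId", "cat1"])
      .agg({"num1": "sum", "num7": "min", "cat3": first_mode, "cat4": first_mode})
      .reset_index()
)

# Variant 2: keep every mode by collecting them in a list,
# then explode so that each mode gets its own row.
all_modes = (
    df.groupby(["userId", "cat1"])
      .agg({"num1": "sum", "cat4": lambda x: list(x.mode())})
      .reset_index()
      .explode("cat4")
)

print(first)
print(all_modes)
```

In this toy data the group for userId 2 has a tie in cat4 between "r" and "s", so the first variant keeps only "r" while the second produces two rows for that group, with the other aggregated columns repeated on each row.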