I have a dataframe of numerical and categorical columns which I am trying to group by certain columns and aggregate.
I am trying to apply the mode function to the categorical columns and other statistical functions (sum, min, etc.) to the remaining columns of the same dataframe.
I am not able to get the mode for certain columns.
What I tried so far is:
df_agg = df.sort_values(by=['userId', 'num1','num2'],ascending=False).groupby(['userId', 'cat1', 'cat2']).agg({'num3': 'sum', 'num2': 'sum','num1': 'sum', 'num4': 'sum', 'num5':'sum', 'num6':'sum', 'num7': 'min', 'cat3':'mode', 'cat4':'mode'}).reset_index()
That gives the error: 'SeriesGroupBy' object has no attribute 'mode'.
How can I get the mode of these categorical columns in this case?
Use a lambda function, because SeriesGroupBy has no mode method.
Series.mode can return multiple values, so select the first one:
..., 'cat4':lambda x: x.mode().iat[0]}
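For illustration, here is a minimal self-contained sketch of the same idea. The toy data is made up, and only a subset of the question's columns (userId, num1, cat4) is used, so treat it as an assumption-laden example rather than the exact pipeline from the question:

```python
import pandas as pd

# Hypothetical toy data; the column names follow the question, the values are invented.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "num1": [10, 20, 30, 40, 50],
    "cat4": ["a", "b", "b", "c", "c"],
})

df_agg = (
    df.groupby("userId")
      .agg({
          "num1": "sum",
          # Series.mode() returns a Series; .iat[0] picks its first entry.
          "cat4": lambda x: x.mode().iat[0],
      })
      .reset_index()
)

print(df_agg)
```

Note that .iat[0] raises an IndexError if a group contains only missing values in that column, because mode() is then empty.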
EDIT: If you need all modes:
..., 'cat4':lambda x: list(x.mode())}).reset_index().explode('cat4')
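A small sketch of the all-modes variant, again on made-up data with a reduced set of columns (userId and cat4 only), might look like this. Each group's modes are first collected into a list, and explode then gives every mode its own row:

```python
import pandas as pd

# Hypothetical toy data: user 1 has a tie between "a" and "b".
df = pd.DataFrame({
    "userId": [1, 1, 1, 1, 2, 2],
    "cat4": ["a", "a", "b", "b", "c", "c"],
})

df_all = (
    df.groupby("userId")
      # Keep every mode of the group as a list in a single cell.
      .agg({"cat4": lambda x: list(x.mode())})
      .reset_index()
      # Turn each list element into its own row; userId is repeated.
      .explode("cat4")
)

print(df_all)
```

After explode, the row labels of a tied group are duplicated, so a further reset_index(drop=True) may be useful if a unique index is needed.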