python pandas aggregates

Use aggregate computations to obtain mean and std between two dataframes


I have two dataframes, df1 and df2. I want to use aggregates to compute the mean and std of the s_values across both dataframes and put the results in a new dataframe called new_df.

df1 =

         statistics  s_values
year
1999  cigarette use       100
1999  cellphone use       310
1999   internet use       101
1999    alcohol use       100
1999       soda use       215

df2 =

         statistics  s_values
year
1999  cigarette use       156
1999  cellphone use       198
1999   internet use       232
1999    alcohol use       243
1999       soda use       534

The result I am trying to get would look something like this. Desired output, new_df =

         statistics  difference  mean  std
year
1999  cigarette use     56        ..    ..
1999  cellphone use    112        ..    ..
1999   internet use     78        ..    ..
1999    alcohol use    143        ..    ..
1999       soda use    319        ..    ..

I have managed to build a dataframe with a column holding the difference in values using this code:

new_df = df1.assign(s_values=(df1['s_values'] - df2['s_values']).abs())
new_df.rename(columns={'s_values': 'difference'}, inplace=True)

This gives me the following output, but I do not know how to add the columns for the aggregate mean and std:

         statistics  difference  
year
1999  cigarette use     56  
1999  cellphone use    112
1999   internet use     78 
1999    alcohol use    143 
1999       soda use    319

Any help is much appreciated.


Solution

  • If I am understanding you right, you want to join the two dataframes and compute the mean and standard deviation of each pair of values.

    Can you try this?

    # 'year' is the index level name; the shared s_values column gets _x/_y suffixes
    df = df1.merge(df2, on=['year', 'statistics'])
    df['mean'] = df[['s_values_x', 's_values_y']].mean(axis=1)
    df['std'] = df[['s_values_x', 's_values_y']].std(axis=1)
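
For reference, here is a self-contained sketch of the merge approach built from the sample data in the question. It assumes that `year` is the index name and that the value column is called `s_values` in both frames, so pandas suffixes the two copies with `_x` and `_y` after the merge. The `difference` column is added only to match the desired output, and all figures are computed from the sample frames, so they may differ from the hand-typed ones in the question.

```python
import pandas as pd

# Sample data from the question; 'year' is the index in both frames
stats = ["cigarette use", "cellphone use", "internet use", "alcohol use", "soda use"]
index = pd.Index([1999] * 5, name="year")
df1 = pd.DataFrame({"statistics": stats, "s_values": [100, 310, 101, 100, 215]}, index=index)
df2 = pd.DataFrame({"statistics": stats, "s_values": [156, 198, 232, 243, 534]}, index=index)

# Move 'year' into a column so both join keys are ordinary columns
merged = df1.reset_index().merge(df2.reset_index(), on=["year", "statistics"])

# The two value columns to compare, one from each frame
pair = merged[["s_values_x", "s_values_y"]]

new_df = (
    merged[["year", "statistics"]]
    .assign(
        difference=(pair["s_values_x"] - pair["s_values_y"]).abs(),
        mean=pair.mean(axis=1),
        std=pair.std(axis=1),  # sample std (ddof=1) of the two values in each row
    )
    .set_index("year")
)
print(new_df)
```

With only two values per row, the sample standard deviation is simply the absolute difference divided by the square root of 2, so the `std` column carries the same information as `difference`.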
    

    You could also try this if you want a groupby solution, as mentioned in your comments:

    pd.concat([df1[['s_values']], df2[['s_values']]]).groupby(level=0).std()

    pd.concat([df1[['s_values']], df2[['s_values']]]).groupby(level=0).mean()
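
A sketch of the groupby variant on the same sample data, again assuming the column is named `s_values` and the index is named `year`. Grouping on the year level alone gives a single row here, because every row shares the year 1999; grouping on `statistics` as well gives one mean and std per row of the desired output. The names `stacked`, `per_year` and `per_stat` are only illustrative.

```python
import pandas as pd

# Sample data from the question; 'year' is the index in both frames
stats = ["cigarette use", "cellphone use", "internet use", "alcohol use", "soda use"]
index = pd.Index([1999] * 5, name="year")
df1 = pd.DataFrame({"statistics": stats, "s_values": [100, 310, 101, 100, 215]}, index=index)
df2 = pd.DataFrame({"statistics": stats, "s_values": [156, 198, 232, 243, 534]}, index=index)

# Stack the two frames; every (year, statistics) pair now appears twice
stacked = pd.concat([df1, df2]).reset_index()

# One row per year: mean and std over all stacked values of that year
per_year = stacked.groupby("year")["s_values"].agg(["mean", "std"])

# One row per (year, statistics): mean and std of the two matching values
per_stat = stacked.groupby(["year", "statistics"], sort=False)["s_values"].agg(["mean", "std"])

print(per_year)
print(per_stat)
```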