I have two dataframes: df1
and df2
. I want to use aggregates to obtain the mean
and std
between the s_values in both dataframes and put those results in a new dataframe called new_df
in df1 =
statistics s_values
year
1999 cigarette use 100
1999 cellphone use 310
1999 internet use 101
1999 alcohol use 100
1999 soda use 215
in df 2 =
statistics s_values
year
1999 cigarette use 156
1999 cellphone use 198
1999 internet use 232
1999 alcohol use 243
1999 soda use 534
The result that I am trying to get would look something like this.
desired output new_df =
statistics difference mean std
year
1999 cigarette use 56 .. ..
1999 cellphone use 112 .. ..
1999 internet use 78 .. ..
1999 alcohol use 143 .. ..
1999 soda use 319 .. ..
I have managed to build a dataframe with a column with the difference in values using the code
new_df = df1.assign(Value=(df1['s_values'] - df2['s_values].abs())
new_df.rename(columns={'s_values':'difference'}, inplace=True)
this gives me this output but I do not know how to add the columns for the aggregate mean and std
statistics difference
year
1999 cigarette use 56
1999 cellphone use 112
1999 internet use 78
1999 alcohol use 143
1999 soda use 319
Any help is much appreciated
If i am understanding you right, you want to join the two dataframes and compute the mean and std dev
Can you try this?
df = df1.merge(df2, on= ['Year', 'statistics'])
df['mean']=df[['difference_x', 'difference_y']].mean(axis=1)
df['std'] = df[['difference_x', 'difference_y']].std(axis=1)
You could also try this if you want a groupby solution as mentioned in your comments
pd.concat([df1[['difference']], df2[['difference']]]).groupby(level=0).std()
pd.concat([df1[['difference']], df2[['difference']]]).groupby(level=0).mean()