Given the following dataframe:
_BETTER _SAME _WORSE ___dataset Metric
0.373802 0.816794 0.568783 Train precision
0.391304 0.865229 0.519324 Train recall
0.382353 0.840314 0.542929 Train f1-score
0.500000 1.000000 0.583333 Val precision
0.333333 1.000000 0.736842 Val recall
0.400000 1.000000 0.651163 Val f1-score
0.000000 0.000000 0.666667 Test precision
0.000000 0.000000 0.500000 Test recall
0.000000 0.000000 0.571429 Test f1-score
I would like to add the following rows:
_BETTER _SAME _WORSE ___dataset Metric
0.373802 0.816794 0.568783 Train precision
0.391304 0.865229 0.519324 Train recall
0.382353 0.840314 0.542929 Train f1-score
0.500000 1.000000 0.583333 Val precision
0.333333 1.000000 0.736842 Val recall
0.400000 1.000000 0.651163 Val f1-score
0.000000 0.000000 0.666667 Test precision
0.000000 0.000000 0.500000 Test recall
0.000000 0.000000 0.571429 Test f1-score
mean_p_b mean_p_s mean_p_w All precision_avg
mean_r_b mean_r_s mean_r_w All recall_avg
mean_f1_b mean_f1_s mean_f1_w All f1_score_avg
where mean_p_b, mean_p_s and mean_p_w are obtained by averaging the precision rows over the three columns, respectively; likewise for mean_r_b, mean_r_s, mean_r_w and for mean_f1_b, mean_f1_s, mean_f1_w.
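For concreteness, mean_p_b is just the arithmetic mean of the three _BETTER precision values; a quick check with the numbers from the table above:

```python
# _BETTER precision values for Train, Val, Test (taken from the table above)
better_precision = [0.373802, 0.500000, 0.000000]

mean_p_b = sum(better_precision) / len(better_precision)
print(round(mean_p_b, 6))  # 0.291267
```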
Applying each separately:
df_avg_precision["BETTER"] = (df_train_precision['_BETTER'].values + df_val_precision['_BETTER'].values + df_test_precision['_BETTER'].values) / 3
df_avg_precision["Metric"] = "precision_avg"
df_avg_recall["BETTER"] = (df_train_recall['_BETTER'].values + df_val_recall['_BETTER'].values + df_test_recall['_BETTER'].values) / 3
df_avg_recall["Metric"] = "recall_avg"
df_avg_f1["BETTER"] = (df_train_f1['_BETTER'].values + df_val_f1['_BETTER'].values + df_test_f1['_BETTER'].values) / 3
df_avg_f1["Metric"] = "f1_avg"
You can use df.groupby(...) with column selection and .mean() for this:
series_metric_mean = df.groupby('Metric')[['_BETTER', '_SAME', '_WORSE']].mean()
_BETTER _SAME _WORSE
Metric
f1-score 0.260784 0.613438 0.588507
precision 0.291267 0.605598 0.606261
recall 0.241546 0.621743 0.585389
(to keep f1-score as the last row it needs groupby(..., sort=False))
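A minimal sketch of the sort=False behavior on toy data (hypothetical values, not the table above):

```python
import pandas as pd

df = pd.DataFrame({
    'Metric': ['precision', 'recall', 'f1-score'] * 2,
    '_BETTER': [0.3, 0.4, 0.5, 0.1, 0.2, 0.3],
})

# default sort=True: group keys are sorted alphabetically, so f1-score comes first
print(df.groupby('Metric')[['_BETTER']].mean().index.tolist())
# ['f1-score', 'precision', 'recall']

# sort=False: group keys keep their first-appearance order
print(df.groupby('Metric', sort=False)[['_BETTER']].mean().index.tolist())
# ['precision', 'recall', 'f1-score']
```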
It gives a DataFrame with only _BETTER, _SAME, _WORSE (and Metric as the index), so it still needs ___dataset set to All, _avg appended to Metric (at this moment still the index), reset_index() to convert the index Metric back to a normal column, and pd.concat to attach it to the original data:
series_metric_mean['___dataset'] = 'All'
series_metric_mean.index = series_metric_mean.index + '_avg'
df_metric_mean = series_metric_mean.reset_index()
df = pd.concat([df, df_metric_mean]).reset_index(drop=True)
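The index manipulation can be seen on a tiny toy frame (hypothetical values; only the structure matters):

```python
import pandas as pd

# a small stand-in for the groupby result: Metric is the index
m = pd.DataFrame({'_BETTER': [0.3, 0.2]},
                 index=pd.Index(['precision', 'recall'], name='Metric'))

m['___dataset'] = 'All'            # new constant column
m.index = m.index + '_avg'         # element-wise suffix on every index label
out = m.reset_index()              # index 'Metric' becomes the first normal column

print(out.columns.tolist())        # ['Metric', '_BETTER', '___dataset']
print(out['Metric'].tolist())      # ['precision_avg', 'recall_avg']
```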
Full working code with example data as text. I use io only to create a file-like object so everyone can simply copy and run it.
text1 = """_BETTER _SAME _WORSE ___dataset Metric
0.373802 0.816794 0.568783 Train precision
0.391304 0.865229 0.519324 Train recall
0.382353 0.840314 0.542929 Train f1-score
0.500000 1.000000 0.583333 Val precision
0.333333 1.000000 0.736842 Val recall
0.400000 1.000000 0.651163 Val f1-score
0.000000 0.000000 0.666667 Test precision
0.000000 0.000000 0.500000 Test recall
0.000000 0.000000 0.571429 Test f1-score
"""
# example from @Adeva1 answer
text2 = """_BETTER _SAME _WORSE ___dataset Metric
0.568783 0.568783 0.568783 Train precision
0.519324 0.519324 0.519324 Train recall
0.542929 0.542929 0.542929 Train f1-score
0.583333 0.583333 0.583333 Val precision
0.736842 0.736842 0.736842 Val recall
0.651163 0.651163 0.651163 Val f1-score
0.651163 0.651163 0.651163 Test precision
0.500000 0.500000 0.500000 Test recall
0.571429 0.571429 0.571429 Test f1-score
"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text1), sep=r'\s+')
#print(df)
series_metric_mean = df.groupby('Metric', sort=False)[['_BETTER', '_SAME', '_WORSE']].mean()
series_metric_mean['___dataset'] = 'All'
series_metric_mean.index = series_metric_mean.index + '_avg'
print(series_metric_mean)
df_metric_mean = series_metric_mean.reset_index()
print(df_metric_mean)
df = pd.concat([df, df_metric_mean]).reset_index(drop=True)
print(df)
Result (for text1):
_BETTER _SAME _WORSE ___dataset
Metric
precision_avg 0.291267 0.605598 0.606261 All
recall_avg 0.241546 0.621743 0.585389 All
f1-score_avg 0.260784 0.613438 0.588507 All
Metric _BETTER _SAME _WORSE ___dataset
0 precision_avg 0.291267 0.605598 0.606261 All
1 recall_avg 0.241546 0.621743 0.585389 All
2 f1-score_avg 0.260784 0.613438 0.588507 All
_BETTER _SAME _WORSE ___dataset Metric
0 0.373802 0.816794 0.568783 Train precision
1 0.391304 0.865229 0.519324 Train recall
2 0.382353 0.840314 0.542929 Train f1-score
3 0.500000 1.000000 0.583333 Val precision
4 0.333333 1.000000 0.736842 Val recall
5 0.400000 1.000000 0.651163 Val f1-score
6 0.000000 0.000000 0.666667 Test precision
7 0.000000 0.000000 0.500000 Test recall
8 0.000000 0.000000 0.571429 Test f1-score
9 0.291267 0.605598 0.606261 All precision_avg
10 0.241546 0.621743 0.585389 All recall_avg
11 0.260784 0.613438 0.588507 All f1-score_avg
Result (for text2 with example from @Adeva1 answer):
_BETTER _SAME _WORSE
Metric
precision 0.601093 0.601093 0.601093
recall 0.585389 0.585389 0.585389
f1-score 0.588507 0.588507 0.588507
_BETTER _SAME _WORSE ___dataset
Metric
precision_avg 0.601093 0.601093 0.601093 All
recall_avg 0.585389 0.585389 0.585389 All
f1-score_avg 0.588507 0.588507 0.588507 All
Metric _BETTER _SAME _WORSE ___dataset
0 precision_avg 0.601093 0.601093 0.601093 All
1 recall_avg 0.585389 0.585389 0.585389 All
2 f1-score_avg 0.588507 0.588507 0.588507 All
_BETTER _SAME _WORSE ___dataset Metric
0 0.568783 0.568783 0.568783 Train precision
1 0.519324 0.519324 0.519324 Train recall
2 0.542929 0.542929 0.542929 Train f1-score
3 0.583333 0.583333 0.583333 Val precision
4 0.736842 0.736842 0.736842 Val recall
5 0.651163 0.651163 0.651163 Val f1-score
6 0.651163 0.651163 0.651163 Test precision
7 0.500000 0.500000 0.500000 Test recall
8 0.571429 0.571429 0.571429 Test f1-score
9 0.601093 0.601093 0.601093 All precision_avg
10 0.585389 0.585389 0.585389 All recall_avg
11 0.588507 0.588507 0.588507 All f1-score_avg
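As a side note, the same pipeline can be condensed into one method chain (a sketch on the precision subset of the example data; .assign and .rename(index=...) are standard pandas, this is just a more compact way to write the same steps):

```python
import io
import pandas as pd

text = """_BETTER _SAME _WORSE ___dataset Metric
0.373802 0.816794 0.568783 Train precision
0.500000 1.000000 0.583333 Val precision
0.000000 0.000000 0.666667 Test precision
"""

df = pd.read_csv(io.StringIO(text), sep=r'\s+')

avg = (df.groupby('Metric', sort=False)[['_BETTER', '_SAME', '_WORSE']]
         .mean()                               # per-metric column means
         .assign(___dataset='All')             # constant ___dataset column
         .rename(index=lambda m: m + '_avg')   # precision -> precision_avg
         .reset_index())                       # Metric index -> normal column

df = pd.concat([df, avg], ignore_index=True)
print(df)
```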
Doc: