pythonpandasdataframe

Add a new row as the average of columns


Give the following dataframe:

_BETTER   _SAME     _WORSE    ___dataset     Metric
0.373802  0.816794  0.568783      Train      precision
0.391304  0.865229  0.519324      Train      recall
0.382353  0.840314  0.542929      Train      f1-score
0.500000  1.000000  0.583333      Val        precision
0.333333  1.000000  0.736842      Val        recall
0.400000  1.000000  0.651163      Val        f1-score
0.000000  0.000000  0.666667      Test       precision
0.000000  0.000000  0.500000      Test       recall
0.000000  0.000000  0.571429      Test       f1-score

would like to add the followings:

_BETTER   _SAME     _WORSE    ___dataset     Metric
0.373802  0.816794  0.568783      Train      precision
0.391304  0.865229  0.519324      Train      recall
0.382353  0.840314  0.542929      Train      f1-score
0.500000  1.000000  0.583333      Val        precision
0.333333  1.000000  0.736842      Val        recall
0.400000  1.000000  0.651163      Val        f1-score
0.000000  0.000000  0.666667      Test       precision
0.000000  0.000000  0.500000      Test       recall
0.000000  0.000000  0.571429      Test       f1-score
mean_p_b  mean_p_s  mean_p_w       All       precision_avg
mean_r_b  mean_r_s  mean_r_w       All       recall_avg
mean_f1_b mean_f1_s mean_f1_w      All       f1_score_avg

where mean_p_b mean_p_s mean_p_w is obtained by the average of the precision row, w.r.t the three colums, respectively. Likewise the mean_r_b mean_r_s mean_r_w and mean_f1_b mean_f1_s mean_f1_w.

Applying each separately:

df_avg_precision["BETTER"] = (df_train_precision['_BETTER'].values + df_val_precision['_BETTER'].values + df_test_precision['_BETTER'].values)/3
df_avg_precision["Metric"] = "precision_avg"
df_avg_recall["BETTER"] = (df_train_recall['_BETTER'].values + 
df_val_recall['_BETTER'].values + df_test_recall['_BETTER'].values)/3
df_avg_recall["Metric"] = "recall_avg"
df_avg_f1["BETTER"] = (df_train_f1['_BETTER'].values + 
df_val_f1['_BETTER'].values + df_test_f1['_BETTER'].values)/3
df_avg_f1["Metric"] = "f1_avg"

Solution

  • You can use df.groupby(...).mean(...) for this

    series_metric_mean = df.groupby('Metric').mean(['_BETTER', '_SAME', '_WORSE'])
    
                _BETTER     _SAME    _WORSE
    Metric                                 
    f1-score   0.260784  0.613438  0.588507
    precision  0.291267  0.605598  0.606261
    recall     0.241546  0.621743  0.585389
    

    (to keep f1-score as last row it needs groupby(..., sort=False))


    It gives Series only with _BETTER, _SAME, _WORSE (and Metric as index) so it needs to

    series_metric_mean['___dataset'] = 'All'
    
    series_metric_mean.index = series_metric_mean.index + '_avg'
    
    df_metric_mean = series_metric_mean.reset_index()
    
    df = pd.concat([df, df_metric_mean]).reset_index(drop=True)
    

    Full working code with example data as text.
    I use io only to create file-like object so everyone can simply copy and run it

    text1 = """_BETTER     _SAME    _WORSE   ___dataset     Metric
      0.373802  0.816794  0.568783      Train      precision
      0.391304  0.865229  0.519324      Train      recall
      0.382353  0.840314  0.542929      Train      f1-score
      0.500000  1.000000  0.583333      Val        precision
      0.333333  1.000000  0.736842      Val        recall
      0.400000  1.000000  0.651163      Val        f1-score
      0.000000  0.000000  0.666667      Test       precision
      0.000000  0.000000  0.500000      Test       recall
      0.000000  0.000000  0.571429      Test       f1-score
    """
    
    # example from @Adeva1 answer 
    text2 = """_BETTER     _SAME    _WORSE ___dataset     Metric
    0.568783  0.568783  0.568783    Train  precision
    0.519324  0.519324  0.519324    Train     recall
    0.542929  0.542929  0.542929    Train   f1-score
    0.583333  0.583333  0.583333      Val  precision
    0.736842  0.736842  0.736842      Val     recall
    0.651163  0.651163  0.651163      Val   f1-score
    0.651163  0.651163  0.651163     Test  precision
    0.500000  0.500000  0.500000     Test     recall
    0.571429  0.571429  0.571429     Test   f1-score
    """
    
    import pandas as pd
    import io
    
    df = pd.read_csv(io.StringIO(text1), sep='\\s+')
    #print(df)
    
    series_metric_mean = df.groupby('Metric', sort=False).mean(['_BETTER', '_SAME', '_WORSE'])
    
    series_metric_mean['___dataset'] = 'All'
    series_metric_mean.index = series_metric_mean.index + '_avg'
    print(series_metric_mean)
    
    df_metric_mean = series_metric_mean.reset_index()
    print(df_metric_mean)
    
    df = pd.concat([df, df_metric_mean]).reset_index(drop=True)
    print(df)
    

    Result (for text1)

                    _BETTER     _SAME    _WORSE ___dataset
    Metric                                                
    f1-score_avg   0.260784  0.613438  0.588507        All
    precision_avg  0.291267  0.605598  0.606261        All
    recall_avg     0.241546  0.621743  0.585389        All
    
              Metric   _BETTER     _SAME    _WORSE ___dataset
    0   f1-score_avg  0.260784  0.613438  0.588507        All
    1  precision_avg  0.291267  0.605598  0.606261        All
    2     recall_avg  0.241546  0.621743  0.585389        All
    
         _BETTER     _SAME    _WORSE ___dataset         Metric
    0   0.373802  0.816794  0.568783      Train      precision
    1   0.391304  0.865229  0.519324      Train         recall
    2   0.382353  0.840314  0.542929      Train       f1-score
    3   0.500000  1.000000  0.583333        Val      precision
    4   0.333333  1.000000  0.736842        Val         recall
    5   0.400000  1.000000  0.651163        Val       f1-score
    6   0.000000  0.000000  0.666667       Test      precision
    7   0.000000  0.000000  0.500000       Test         recall
    8   0.000000  0.000000  0.571429       Test       f1-score
    9   0.291267  0.605598  0.606261        All  precision_avg
    10  0.241546  0.621743  0.585389        All     recall_avg
    11  0.260784  0.613438  0.588507        All   f1-score_avg
    

    Result (for text2 with example from @Adeva1 answer)

                _BETTER     _SAME    _WORSE
    Metric                                 
    precision  0.601093  0.601093  0.601093
    recall     0.585389  0.585389  0.585389
    f1-score   0.588507  0.588507  0.588507
    
                    _BETTER     _SAME    _WORSE ___dataset
    Metric                                                
    precision_avg  0.601093  0.601093  0.601093        All
    recall_avg     0.585389  0.585389  0.585389        All
    f1-score_avg   0.588507  0.588507  0.588507        All
    
              Metric   _BETTER     _SAME    _WORSE ___dataset
    0  precision_avg  0.601093  0.601093  0.601093        All
    1     recall_avg  0.585389  0.585389  0.585389        All
    2   f1-score_avg  0.588507  0.588507  0.588507        All
    
         _BETTER     _SAME    _WORSE ___dataset         Metric
    0   0.568783  0.568783  0.568783      Train      precision
    1   0.519324  0.519324  0.519324      Train         recall
    2   0.542929  0.542929  0.542929      Train       f1-score
    3   0.583333  0.583333  0.583333        Val      precision
    4   0.736842  0.736842  0.736842        Val         recall
    5   0.651163  0.651163  0.651163        Val       f1-score
    6   0.651163  0.651163  0.651163       Test      precision
    7   0.500000  0.500000  0.500000       Test         recall
    8   0.571429  0.571429  0.571429       Test       f1-score
    9   0.601093  0.601093  0.601093        All  precision_avg
    10  0.585389  0.585389  0.585389        All     recall_avg
    11  0.588507  0.588507  0.588507        All   f1-score_avg
    

    Doc: