python-3.x pandas scikit-learn mse

Calculate the rmse (mse) between y_true and multiple y_preds


Suppose we have a data frame df:

          date  y_true  y_pred1  y_pred2
0    2017/1/31     NaN    15.57      NaN
1    2017/2/28   -2.35    15.57     6.64
2    2017/3/31   15.57     6.64     7.61
3    2017/4/30    6.64     7.61    10.28
4    2017/5/31     NaN     7.61     6.34
5    2017/6/30   10.28     6.34     4.88
6    2017/7/31    6.34     4.88     7.91
7    2017/8/31    6.34     7.91     6.26
8    2017/9/30    7.91     6.26    11.51
9   2017/10/31    6.26    11.51    10.73
10  2017/11/30   11.51    10.73    10.65
11  2017/12/31     NaN    32.05      NaN

I want to write a function one_to_multi_rmse that calculates the RMSE of y_pred1 and of y_pred2 against y_true separately, and returns all the RMSE values at once.

from sklearn.metrics import mean_squared_error

def one_to_multi_rmse(y_true, y_pred):
    rms = mean_squared_error(y_true, y_pred, squared=False)
    ...
    return rms

one_to_multi_rmse(df['y_true'], df[['y_pred1','y_pred2']])

Out:

[0.76, 0.82] # This is fake data, just to show the format of the returned results

How can I achieve this? Note that for each prediction column I only need the RMSE over the rows where both y_true and that y_pred have valid (non-NaN) values.


Solution

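  • You can loop over your prediction columns, compute the RMSE for each of them, append the results to a list, and return that list. The subtraction yields NaN wherever either value is missing, and the pandas .mean() skips NaN by default, so only rows with valid values in both columns are counted: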

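    import numpy as np

    def one_to_multi_rmse(y_t, y_p):
        """Return the RMSE of each column of y_p against y_t as a list."""
        rmse = []
        for col in y_p.columns:
            # Series.mean() ignores NaN, so incomplete rows drop out here.
            mse = np.square(np.subtract(y_t, y_p[col])).mean()
            # Convert to a plain float so the list prints cleanly.
            rmse.append(float(np.sqrt(mse)))
        return rmse

    y_true = df['y_true']
    y_pred = df[['y_pred1', 'y_pred2']]

    print(one_to_multi_rmse(y_true, y_pred))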
    

    Output:

    [7.093233865217378, 4.974860132037215]
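
The same calculation can probably be done without an explicit loop. The following is only a sketch under a few assumptions: the data frame is rebuilt by hand from the table in the question, the function name one_to_multi_rmse_vectorized is made up for illustration, and y_true and the prediction columns share the same index. It relies on DataFrame.sub aligning on the index and on DataFrame.mean skipping NaN per column:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data so the sketch runs on its own.
df = pd.DataFrame({
    "date": ["2017/1/31", "2017/2/28", "2017/3/31", "2017/4/30",
             "2017/5/31", "2017/6/30", "2017/7/31", "2017/8/31",
             "2017/9/30", "2017/10/31", "2017/11/30", "2017/12/31"],
    "y_true": [np.nan, -2.35, 15.57, 6.64, np.nan, 10.28,
               6.34, 6.34, 7.91, 6.26, 11.51, np.nan],
    "y_pred1": [15.57, 15.57, 6.64, 7.61, 7.61, 6.34,
                4.88, 7.91, 6.26, 11.51, 10.73, 32.05],
    "y_pred2": [np.nan, 6.64, 7.61, 10.28, 6.34, 4.88,
                7.91, 6.26, 11.51, 10.73, 10.65, np.nan],
})


def one_to_multi_rmse_vectorized(y_true, y_preds):
    """Hypothetical helper: RMSE of every column of y_preds against y_true."""
    # Row-wise difference; a cell is NaN if either side is missing.
    errors = y_preds.sub(y_true, axis=0)
    # mean() skips NaN column by column, so each column uses its own valid rows.
    return np.sqrt(errors.pow(2).mean()).tolist()


rmse = one_to_multi_rmse_vectorized(df["y_true"], df[["y_pred1", "y_pred2"]])
print([round(v, 2) for v in rmse])  # [7.09, 4.97]
```

Because missing values are handled per column, y_pred1 and y_pred2 may end up being scored on different sets of rows; if every column should be compared on exactly the same rows, dropping incomplete rows first with dropna would be one way to do that.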