Suppose we have a data frame df
:
date y_true y_pred1 y_pred2
0 2017/1/31 NaN 15.57 NaN
1 2017/2/28 -2.35 15.57 6.64
2 2017/3/31 15.57 6.64 7.61
3 2017/4/30 6.64 7.61 10.28
4 2017/5/31 NaN 7.61 6.34
5 2017/6/30 10.28 6.34 4.88
6 2017/7/31 6.34 4.88 7.91
7 2017/8/31 6.34 7.91 6.26
8 2017/9/30 7.91 6.26 11.51
9 2017/10/31 6.26 11.51 10.73
10 2017/11/30 11.51 10.73 10.65
11 2017/12/31 NaN 32.05 NaN
I want to write a function one_to_multi_rmse
to calculate the rmse
of y_pred1
, y_pred2
and y_true
separately, which can return multiple rmse
values at a time.
from sklearn.metrics import mean_squared_error
def one_to_multi_rmse(y_true, y_pred):
rms = mean_squared_error(y_true, y_pred, squared=False)
...
return dir_acc_ratio
one_to_multi_rmse(df['y_true'], df[['y_pred1','y_pred2']])
Out:
[0.76, 0.82] # This is fake data, just to show the format of the returned results
How to achieve this? Note that I only need to calculate the rmse
of the rows where y_true
, y_pred
s have valid values.
You can loop over your prediction columns and compute the RMSE for each of them. Append these to a list and return them like so:
import numpy as np
def one_to_multi_rmse(y_t, y_p):
rmse = []
for prediction in list(y_p):
MSE = np.square(np.subtract(y_t, y_p[prediction])).mean()
RMSE = np.sqrt(MSE)
rmse.append(RMSE)
return rmse
y_true = df['y_true']
y_pred = df[['y_pred1', 'y_pred2']]
print(one_to_multi_rmse(y_true, y_pred))
Output:
[7.093233865217378, 4.974860132037215]