I'm trying to calculate MAE (Mean absolute error). In my original DataFrame, I have 1826 rows and 3 columns. I'm using columns 2 and 3 to calculate MAE. But, in column 2, I have some NaN values. When I used:
from sklearn.metrics import mean_absolute_error
and selected these columns, it gave me an error: "Input contains NaN".
As an example, I'm trying to do something like this:
from sklearn.metrics import mean_absolute_error
import numpy as np
y_true = [3, -0.5, 2, 7, 10]
y_pred = [2.5, np.nan, 2, 8, np.nan]
mean_absolute_error(y_true, y_pred)  # raises ValueError because y_pred contains NaN
Is it possible to skip or ignore the rows with NaN?
UPDATE
I discussed this with my advisor, and we decided that the best option is to drop all the rows with NaN values.
If you want to ignore the NaNs, build a mask and perform boolean indexing:
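from sklearn.metrics import mean_absolute_error
import numpy as np
y_true = np.array([3, -0.5, 2, 7, 10])
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
y_pred = np.array([2.5, np.nan, 2, 8, np.nan])
# True where y_pred holds a real number, False where it is NaN
m = ~np.isnan(y_pred)
# Apply the same mask to both arrays so the remaining pairs stay aligned
mean_absolute_error(y_true[m], y_pred[m])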
Output: 0.5
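Since the data in the question lives in a DataFrame, the same idea can be sketched with pandas by dropping the incomplete rows before scoring. The column names `y_true` and `y_pred` are placeholders for illustration, not the names in the original DataFrame, and the values are the toy ones from the question:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical column names; substitute the real ones from your DataFrame.
df = pd.DataFrame({
    "y_true": [3, -0.5, 2, 7, 10],
    "y_pred": [2.5, np.nan, 2, 8, np.nan],
})

# Drop every row where either of the two columns is NaN,
# so both columns keep the same (aligned) rows.
clean = df.dropna(subset=["y_true", "y_pred"])

mae = mean_absolute_error(clean["y_true"], clean["y_pred"])
print(mae)  # 0.5
```

Passing `subset` keeps NaNs in unrelated columns from removing rows that are otherwise usable for the metric.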