python, machine-learning, scikit-learn, regression, scoring

Why are negative (MSE or MAE) scoring parameters like neg_mean_absolute_error in scikit-learn used for regression model evaluation?


I am a novice in Machine Learning, and while going through the course I came across the "scoring" parameter. I understood that for regression model evaluation, we consider the negatives of mean squared error, mean absolute error, etc.

When I wanted to know the reason, I went through the scikit-learn documentation, which says: "All scorer objects follow the convention that higher return values are better than lower return values. Thus metrics which measure the distance between the model and the data, like metrics.mean_squared_error, are available as neg_mean_squared_error which return the negated value of the metric."

This explanation does not answer my question completely, and I am confused. Logically, if the prediction error is large, whether negative or positive, it makes the model equally bad. So why are the negated values used as the scoring parameter?


Solution

  • I think there is a slight misunderstanding in the way you have understood neg_mean_absolute_error (NMAE). neg_mean_absolute_error is computed as follows:

    $$\mathrm{NMAE} = -\frac{1}{N}\sum_{i=1}^{N}\left|Y_i - Y_i^{p}\right|$$

    where $N$ is the total number of data points, $Y_i$ is the true value, and $Y_i^{p}$ is the predicted value.

    The model is still penalized equally whether it predicts higher or lower than the true value; the final result is simply multiplied by -1 to follow the convention that sklearn has set. So if one model gives you an MAE of, say, 0.55 and another model gives you an MAE of, say, 0.78, their NMAE values are flipped to -0.55 and -0.78, and by the higher-is-better convention we pick the former model, which has the higher NMAE of -0.55 (the short sketch after this answer demonstrates this).

    You can make a similar argument for MSE.
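A minimal sketch of the convention, assuming toy data and two illustrative models (the dataset, the model choices, and the variable names are mine, not from the original answer): it first computes MAE and NMAE by hand, then compares the models with cross_val_score, where the "neg_mean_absolute_error" scoring string returns the negated metric.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# 1) The formula by hand: MAE penalizes over- and under-prediction equally,
#    and NMAE is just MAE multiplied by -1.
y_true = np.array([3.0, -0.5, 2.0, 7.0])   # illustrative values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
mae = np.mean(np.abs(y_true - y_pred))     # (1/N) * sum |Y_i - Y_i^p|
print(mae, -mae)                           # 0.5 -0.5

# 2) The convention in practice: cross_val_score returns the negated MAE
#    per fold, so "higher is better" holds uniformly across all scorers.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
}
for name, model in models.items():
    nmae = cross_val_score(model, X, y,
                           scoring="neg_mean_absolute_error", cv=5).mean()
    print(f"{name}: NMAE = {nmae:.3f} (MAE = {-nmae:.3f})")

# The better model is simply the one with the larger (less negative) NMAE.
```

Because every scorer follows the same sign convention, tools such as GridSearchCV can always maximize the score, no matter which metric you pass in.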