So I built a model for a small dataset, and since the dataset was small, I used Leave-One-Out Cross-Validation (LOOCV) to check its accuracy. In short, I would remove one sample manually, train the model on the rest, predict the left-out sample, save the prediction, and repeat the process for every sample. Then I would use the list of predictions and the actual values to compute an RMSE and an R2. Today I found out that there is a scikit-learn implementation, sklearn.model_selection.LeaveOneOut. However, when I tried it, it gave me a different RMSE, and it refused to accept R-squared as the scoring metric (it seems to calculate the score per sample, which does not work for R2).
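To make the manual procedure concrete, here is a stripped-down sketch of my loop (data2SN and labelCL stand for my feature matrix and target vector; the real preprocessing is omitted):
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

preds = []
for i in range(len(labelCL)):
    mask = np.arange(len(labelCL)) != i  # leave sample i out
    model = RandomForestRegressor(n_estimators=200, max_depth=6, n_jobs=40, random_state=0)
    model.fit(data2SN[mask], labelCL[mask])
    preds.append(model.predict(data2SN[i:i + 1])[0])  # predict the single held-out sample

preds = np.array(preds)
print('RMSE: %.3f' % np.sqrt(mean_squared_error(labelCL, preds)))  # pooled over all samples
print('R2:   %.3f' % r2_score(labelCL, preds))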
And here is a brief example of my attempt with the scikit-learn implementation:
from numpy import mean
from numpy import std
from sklearn.model_selection import LeaveOneOut
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor  # was RandomForestClassifier, a typo: the model below is a regressor

cv = LeaveOneOut()
model = RandomForestRegressor(n_estimators=200, max_depth=6, n_jobs=40, random_state=0)
# each LOOCV fold holds out exactly one sample and scores it
scores = cross_val_score(model, data2SN, labelCL, scoring='neg_root_mean_squared_error', cv=cv, n_jobs=-1)
# report performance
print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
My guess is that I'm calculating the RMSE over the whole dataset at once, while the scikit-learn LOOCV computes it per sample and then takes the mean, and that this causes the discrepancy between the two outputs. However, when I tried to calculate the RMSE per sample myself, it failed with TypeError: Singleton array 3021.0 cannot be considered a valid collection. So I'm not sure how the RMSE is calculated inside the LOOCV, and I'm not sure whether to trust my own code or just blindly use the scikit-learn implementation.
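To illustrate my guess with made-up numbers (these are not from my data, just three hypothetical per-sample errors):
import numpy as np
errors = np.array([1.0, 2.0, 6.0])    # pretend prediction errors on three held-out samples
print(abs(errors).mean())             # 3.000 -> mean of per-fold scores, what I think cross_val_score reports
print(np.sqrt((errors ** 2).mean()))  # 3.697 -> pooled RMSE, what my manual loop computes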
I'm lost as to what to do, and ChatGPT was just confusing as hell, so my human brethren, please help.
cross_val_score averages the scores across folds, so with cv=LeaveOneOut(), yes, it's computing the score per row (by a model trained on all other rows). Since each fold holds a single sample, the RMSE of that fold is just the absolute error, so the averaged score is equivalent to MAE; and R2 will just fail, because it is undefined for a single observation. You could use cross_val_predict to get the individual predictions, then score that collection all at once, to reproduce your manual work.
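A minimal sketch of that approach, reusing the variable names from your question (data2SN, labelCL):
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import mean_squared_error, r2_score

model = RandomForestRegressor(n_estimators=200, max_depth=6, n_jobs=40, random_state=0)
# one out-of-sample prediction per row, each from a model trained on all other rows
preds = cross_val_predict(model, data2SN, labelCL, cv=LeaveOneOut(), n_jobs=-1)
# score the full collection of predictions at once, as in your manual loop
print('RMSE: %.3f' % np.sqrt(mean_squared_error(labelCL, preds)))
print('R2:   %.3f' % r2_score(labelCL, preds))
This should match your hand-rolled numbers, since both compute the pooled metrics over the same set of leave-one-out predictions.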