I have a question regarding the scoring in GridSearchCV. I have a RandomForestClassifier for which I am tuning hyperparameters using GridSearchCV.
cross_val = sklearn.model_selection.RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)
grid_search = sklearn.model_selection.GridSearchCV(RandomForestClassifier(),
                                                   param_grid=param_grid, cv=cross_val, scoring='f1_macro')
grid_search.fit(X, y)
When I run this, I can get a dataframe with the F1 score for all folds and repeats:
results = grid_search.cv_results_
results = pd.DataFrame(results)
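For reference, the individual fold scores end up in the split<i>_test_score columns of this table; with 5 splits and 5 repeats that is split0_test_score through split24_test_score for every parameter combination:

# Each 'split<i>_test_score' column holds the f1_macro score of one fold/repeat
split_cols = [c for c in results.columns
              if c.startswith('split') and c.endswith('_test_score')]
print(results[['params'] + split_cols])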
However, since it is interesting for my research to see how well the individual classes are classified, I would like to know the per-class scores for each fold, just as you get from sklearn.metrics.classification_report.
I already tried running the same cross validation separately and getting the classification report for each of the folds. However, the scores are slightly different from those in the scoring table of the grid search cross validation, which I also don't understand.
for train, test in grid_search.cv.split(X, y):
    # Create the train/test subsets for this fold of the cross validation
    X_tr, X_t = X[train], X[test]
    y_tr, y_t = y[train], y[test]
    # Fit the random forest classifier on this fold's training data
    model_grid.fit(X_tr, y_tr)
    y_pred = model_grid.predict(X_t)
    # Per-class precision/recall/F1 for this fold
    # (classification_report expects (y_true, y_pred) in that order)
    report_dict = sklearn.metrics.classification_report(y_t, y_pred, output_dict=True)
    report = sklearn.metrics.classification_report(y_t, y_pred)
    print(report)
If anyone could help me out, I would be very grateful! Thanks in advance.
Your for loop seems the correct way to achieve this. You should get consistent results if you fix the 'randomness' of RandomForestClassifier by defining a random_state:
grid_search = sklearn.model_selection.GridSearchCV(RandomForestClassifier(random_state=0),
                                                   param_grid=param_grid, cv=cross_val, scoring='f1_macro')
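The same applies to the model you refit inside your loop: give model_grid the same random_state (and, if you want to match the best grid search entry, its best parameters), otherwise each fit grows different trees than the ones scored during the search. Here is a minimal sketch of how you could then average the per-class metrics over all folds and repeats; it assumes X and y are numpy arrays, as in your loop:

import pandas as pd
import sklearn.metrics
from sklearn.ensemble import RandomForestClassifier

# Rebuild the winning configuration with a fixed seed, so the manual
# loop reproduces the trees grown during the grid search
model_grid = RandomForestClassifier(random_state=0, **grid_search.best_params_)

fold_reports = []
for train, test in cross_val.split(X, y):
    model_grid.fit(X[train], y[train])
    y_pred = model_grid.predict(X[test])
    # output_dict=True returns per-class precision/recall/F1/support
    rep = sklearn.metrics.classification_report(y[test], y_pred, output_dict=True)
    rep.pop('accuracy', None)  # overall accuracy is a scalar; drop it so every remaining entry is a dict
    fold_reports.append(pd.DataFrame(rep).T)

# Mean per-class scores (plus macro/weighted avg rows) across all 25 folds
print(pd.concat(fold_reports).groupby(level=0).mean())

As an alternative, GridSearchCV also accepts a dict of scorers, so you could pass one F1 scorer per class (e.g. make_scorer(f1_score, labels=[c], average='macro') for each class c) and read the per-class scores straight from cv_results_; with multiple scorers you then have to set refit to the name of the metric used to pick the best parameters.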