scikit-learn, random-forest, gridsearchcv

Creating a classification report for each fold in GridSearchCV


I have a question regarding the scoring in GridSearchCV. I have a random forest classifier whose hyperparameters I am tuning with GridSearchCV.

import sklearn.model_selection
from sklearn.ensemble import RandomForestClassifier

cross_val = sklearn.model_selection.RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)
grid_search = sklearn.model_selection.GridSearchCV(RandomForestClassifier(),
                           param_grid=param_grid, cv=cross_val, scoring='f1_macro')
grid_search.fit(X, y)

When I run this, I can get a dataframe with the macro F1 score for every fold and repeat:

import pandas as pd
results = grid_search.cv_results_
results = pd.DataFrame(results)


However, for my research it is interesting to see how well the individual classes are classified, so I would like to know the per-class scores (precision, recall and F1), just as you get when running sklearn.metrics.classification_report.


I already tried running the same cross-validation separately and getting the classification report for each of the folds. However, the scores differ slightly from those in the scoring table of the grid search cross-validation, which I also don't understand.

import sklearn.metrics

for train, test in grid_search.cv.split(X, y):

    # Subset the data for this fold of the repeated K-fold cross-validation
    X_tr, X_t = X[train], X[test]
    y_tr, y_t = y[train], y[test]

    # Fit the random forest classifier (model_grid, defined elsewhere) on the training fold
    model_grid.fit(X_tr, y_tr)
    y_pred = model_grid.predict(X_t)

    # Per-class precision, recall and F1; the true labels go first, then the predictions
    report_dict = sklearn.metrics.classification_report(y_t, y_pred, output_dict=True)
    report = sklearn.metrics.classification_report(y_t, y_pred)
    print(report)

If anyone could help me out I would be very grateful! Thanks in advance


Solution

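  • Your for loop is the right way to get a classification report for each fold.
    The scores differ because RandomForestClassifier is itself random: without a random_state, every fit grows different trees, so the models fitted inside the grid search are not the same as the ones fitted in your loop. You should get consistent results if you fix that randomness by defining a random_state (and use the same value and hyperparameters for the model in your loop):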

    grid_search = sklearn.model_selection.GridSearchCV(RandomForestClassifier(random_state = 0),
                               param_grid=param_grid, cv = cross_val, scoring='f1_macro')
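
A minimal sketch of how the two pieces could fit together, assuming X and y are NumPy arrays. The synthetic data from make_classification, the small param_grid and the names fold_reports and comparisons are stand-ins for illustration, not part of the original question. With random_state fixed on both the splitter and the forest, the macro F1 from each fold's classification report should match the corresponding splitN_test_score column in cv_results_:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Synthetic stand-in data (assumption: replace with your own X and y)
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Fixed random_state on the splitter, so every call to split() yields the same folds
cross_val = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)
param_grid = {"n_estimators": [10, 20]}  # hypothetical grid, kept small for speed

grid_search = GridSearchCV(RandomForestClassifier(random_state=0),
                           param_grid=param_grid, cv=cross_val, scoring="f1_macro")
grid_search.fit(X, y)

results = grid_search.cv_results_
best = grid_search.best_index_

fold_reports = []   # one classification report (as a dict) per fold
comparisons = []    # (macro F1 from the report, score stored by the grid search)
for fold, (train, test) in enumerate(cross_val.split(X, y)):
    # Same seed and same hyperparameters as the best grid search candidate
    model = RandomForestClassifier(random_state=0, **grid_search.best_params_)
    model.fit(X[train], y[train])
    y_pred = model.predict(X[test])

    # True labels first, predictions second
    report = classification_report(y[test], y_pred, output_dict=True, zero_division=0)
    fold_reports.append(report)

    grid_score = results[f"split{fold}_test_score"][best]
    comparisons.append((report["macro avg"]["f1-score"], grid_score))
    print(f"fold {fold}: report macro F1 = {comparisons[-1][0]:.4f}, "
          f"grid search = {grid_score:.4f}")
```

Each entry of fold_reports is a dict keyed by class label (as a string), so the per-class precision, recall and F1 of a fold can be read with, for example, fold_reports[0]["1"]["recall"]. The loop refits only the best candidate; to compare another candidate, pass results["params"][i] to the classifier and read the scores at index i instead.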