python, random-forest, cross-validation, metrics, leave-one-out

Output TP, FP, TN, FN metrics for a leave-one-out random forest model in Python


I am running a grid search with leave-one-out cross-validation for a random forest model. I used the F1 score to select the best estimator and score. From here, how can I get the precision and recall scores so that I can plot the precision-recall curve? X is the sample dataset and y is the target.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import LeaveOneOut

RF = RandomForestClassifier()
param_grid = {
          'n_estimators': [10, 20, 30, 50],
          'criterion': ['gini', 'entropy'],
          'max_depth': [10, 20, 30, None]}

grid_search = GridSearchCV(RF,
                           param_grid=param_grid,
                           cv=LeaveOneOut(),
                           scoring='f1')

grid_search.fit(X, y)

Solution

  • You can collect the model's predicted scores on a held-out test set in an array and use them to compute the data for the precision-recall curve (or any other performance metric you need):

    from sklearn.metrics import precision_recall_curve
    from matplotlib import pyplot as plt
    
    # The code you provided would go here
    # Use the train partition to fit the model
    grid_search.fit(Xtrain, ytrain)
    
    # Score the held-out test partition. Use the predicted probability of
    # the positive class, not hard 0/1 predictions, so the curve has more
    # than one threshold point.
    yscores = grid_search.predict_proba(Xtest)[:, 1]
    precision, recall, thresholds = precision_recall_curve(ytest, yscores)
    plt.plot(recall, precision)

    It is highly recommended to split your dataset, train on the majority of it, and hold out the rest purely for testing. That way you check the model's ability to generalize to unseen data.
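As a minimal end-to-end sketch of that advice, which also covers the TP, FP, TN, FN counts the title asks about (synthetic data and default hyperparameters here, just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary dataset, just for illustration
X, y = make_classification(n_samples=200, random_state=0)

# Hold out 25% of the data for testing; stratify keeps the class balance
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(Xtrain, ytrain)
ypred = model.predict(Xtest)

# For binary labels, ravel() flattens the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(ytest, ypred).ravel()
print(tn, fp, fn, tp)
print(precision_score(ytest, ypred), recall_score(ytest, ypred))
```

The same `confusion_matrix` call works with the fitted `grid_search` from your code in place of `model`.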