machine-learning scikit-learn classification cross-validation

Classification report with Nested Cross Validation in SKlearn (Average/Individual values)


Is it possible to get a classification report from cross_val_score through some workaround? I'm using nested cross-validation, and while I can get various scores for a model this way, I would like to see the classification report of the outer loop. Any recommendations?

from sklearn.model_selection import KFold, GridSearchCV, cross_val_score

# Choose cross-validation techniques for the inner and outer loops,
# independently of the dataset.
# E.g. "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
# (svr, p_grid, X_iris, y_iris and the trial index i are defined
# earlier in the linked nested-CV example.)
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

# Non-nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)

I would like to see a classification report here alongside the score values: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html


Solution

  • We can define our own scoring function as below:

    from sklearn.metrics import classification_report, accuracy_score, make_scorer

    def classification_report_with_accuracy_score(y_true, y_pred):
        print(classification_report(y_true, y_pred))  # print the classification report for this fold
        return accuracy_score(y_true, y_pred)         # return accuracy so cross_val_score still gets a number
    

    Now, just call cross_val_score with our new scoring function, using make_scorer:

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv,
                                   scoring=make_scorer(classification_report_with_accuracy_score))
    print(nested_score)
    

    It will print the classification report as text for every outer fold, while still returning the accuracy score, so nested_score comes back as an array of per-fold values.
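
    Note that the scorer is called once per outer fold, so a separate report is printed for each of the four folds. If you would rather see a single report aggregated over all outer test folds, one alternative (a minimal sketch, not part of the original answer, reusing the clf, X_iris, y_iris and outer_cv defined above) is to collect the out-of-fold predictions with cross_val_predict and build the report once:

    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import classification_report

    # Each sample is predicted by the model trained on the outer-loop
    # folds that do not contain it, so these are out-of-fold predictions.
    y_pred = cross_val_predict(clf, X=X_iris, y=y_iris, cv=outer_cv)

    # One aggregated report over all outer test folds.
    print(classification_report(y_iris, y_pred))

    Keep in mind that metrics computed on the pooled predictions are not exactly the same as the mean of the per-fold metrics, although they are usually close.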

    When the http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html example is run with this new scoring function, the last few lines of the output look as follows:

    #              precision    recall  f1-score   support
    #
    #           0       1.00      1.00      1.00        14
    #           1       1.00      1.00      1.00        14
    #           2       1.00      1.00      1.00         9
    #
    # avg / total       1.00      1.00      1.00        37
    #
    # [ 0.94736842  1.          0.97297297  1. ]
    #
    # Average difference of 0.007742 with std. dev. of 0.007688.
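
    Finally, if you only need aggregate numbers rather than the formatted text report, a different route (a sketch, assuming scikit-learn >= 0.19) is cross_validate, which accepts several metrics at once and returns one score array per metric:

    from sklearn.model_selection import cross_validate

    # One array per metric, with one entry per outer fold.
    scores = cross_validate(clf, X=X_iris, y=y_iris, cv=outer_cv,
                            scoring=['accuracy', 'precision_macro',
                                     'recall_macro', 'f1_macro'])
    print(scores['test_precision_macro'])  # macro-averaged precision per outer fold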