I'm trying to figure out how to produce a confusion matrix with cross_validate. I'm able to print out the scores with the code I have so far.
from sklearn.metrics import (accuracy_score, f1_score, make_scorer,
                             precision_score, recall_score)
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Instantiating the model
model = DecisionTreeClassifier()

# Scorers for each metric
scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

# 10-fold cross-validation
scores = cross_validate(model, X, y, cv=10, scoring=scoring)
print("Accuracy (Testing): %0.2f (+/- %0.2f)" % (scores['test_accuracy'].mean(), scores['test_accuracy'].std() * 2))
print("Precision (Testing): %0.2f (+/- %0.2f)" % (scores['test_precision'].mean(), scores['test_precision'].std() * 2))
print("Recall (Testing): %0.2f (+/- %0.2f)" % (scores['test_recall'].mean(), scores['test_recall'].std() * 2))
print("F1-Score (Testing): %0.2f (+/- %0.2f)" % (scores['test_f1_score'].mean(), scores['test_f1_score'].std() * 2))
But I'm trying to get that data into a confusion matrix. I can produce one with cross_val_predict:
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(model, X, y, cv=10)
confusion_matrix(y, y_train_pred)
Which is great, but since it's doing its own cross-validation, the results won't match up. I'm just looking for a way to produce both with matching results.
The short answer is you can't.
The idea of a confusion matrix is to evaluate one dataset against one trained model. The result is a matrix, not a single score like accuracy, so you can't take its mean across folds or anything similar. cross_val_score, as the name suggests, works only on scores. A confusion matrix is not a score; it is a summary of what happened during evaluation.
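If fold-level matrices are what you need, one common workaround (not part of cross_val_score itself; the explicit StratifiedKFold loop and the make_classification stand-in data below are my assumptions for illustration) is to compute one confusion matrix per fold yourself and sum them:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
model = DecisionTreeClassifier(random_state=0)

cv = StratifiedKFold(n_splits=10)
total = np.zeros((2, 2), dtype=int)  # running sum of per-fold matrices
for train_idx, test_idx in cv.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    # Each fold yields its own matrix (a summary, not a score)
    total += confusion_matrix(y[test_idx], model.predict(X[test_idx]))

print(total)  # summed counts over all 10 test folds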
cross_val_predict is quite close to what you are looking for. This function splits the data into K parts; each part is predicted by a model trained on the other parts, and all the tested samples are then merged. But be careful with this function; from the docs (emphasis added):
Passing these predictions into an evaluation metric may not be a valid way to measure generalization performance. Results can differ from cross_validate and cross_val_score unless all test sets have equal size and the metric decomposes over samples.
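That said, if the goal is simply that the scores and the confusion matrix come from the same folds, a minimal sketch is to pass one splitter object to both calls (the shared StratifiedKFold splitter, fixed random_state, and stand-in data below are assumptions, not from the question):

from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
model = DecisionTreeClassifier(random_state=0)  # fixed seed: both passes train identical trees
cv = StratifiedKFold(n_splits=10)  # one deterministic splitter shared by both calls

scores = cross_validate(model, X, y, cv=cv, scoring='accuracy')
y_pred = cross_val_predict(model, X, y, cv=cv)

print(scores['test_score'].mean())   # mean per-fold accuracy
print(confusion_matrix(y, y_pred))   # matrix over the same folds' predictions

Here all 10 test folds are equal-sized, so accuracy computed from the merged predictions equals the averaged fold accuracy; with unequal folds, or metrics that don't decompose over samples, the two can differ, exactly as the warning above says.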