I'm trying to figure out how to produce a confusion matrix with cross_validate. I'm able to print out the scores with the code I have so far.
from sklearn.metrics import (accuracy_score, f1_score, make_scorer,
                             precision_score, recall_score)
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Instantiating the model
model = DecisionTreeClassifier()

# Scorers for each metric
scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

# 10-fold cross-validation
scores = cross_validate(model, X, y, cv=10, scoring=scoring)
print("Accuracy (Testing): %0.2f (+/- %0.2f)" % (scores['test_accuracy'].mean(), scores['test_accuracy'].std() * 2))
print("Precision (Testing): %0.2f (+/- %0.2f)" % (scores['test_precision'].mean(), scores['test_precision'].std() * 2))
print("Recall (Testing): %0.2f (+/- %0.2f)" % (scores['test_recall'].mean(), scores['test_recall'].std() * 2))
print("F1-Score (Testing): %0.2f (+/- %0.2f)" % (scores['test_f1_score'].mean(), scores['test_f1_score'].std() * 2))
But I'm trying to get that data into a confusion matrix. I can produce one with cross_val_predict:
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
y_train_pred = cross_val_predict(model, X, y, cv=10)
confusion_matrix(y, y_train_pred)
Which is great, but since it's doing its own cross-validation, the results won't match up. I'm just looking for a way to produce both with matching results.
The short answer is you can't.
The idea of a confusion matrix is to evaluate one dataset against one trained model. The result is a matrix, not a single score like accuracy, so you can't take its mean across folds or anything similar. cross_val_score, as the name suggests, works only on scores. A confusion matrix is not a score; it is a summary of what happened during evaluation.
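If fold-level matrices are what you need, one common workaround (not part of cross_val_score itself; the explicit StratifiedKFold loop and the make_classification stand-in data below are my assumptions for illustration) is to compute one confusion matrix per fold yourself and sum them:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
model = DecisionTreeClassifier(random_state=0)

cv = StratifiedKFold(n_splits=10)
total = np.zeros((2, 2), dtype=int)  # running sum of per-fold matrices
for train_idx, test_idx in cv.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    # Each fold yields its own matrix (a summary, not a score)
    total += confusion_matrix(y[test_idx], model.predict(X[test_idx]))

print(total)  # summed counts over all 10 test folds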
cross_val_predict is quite close to what you are looking for. This function splits the data into K parts; each part is predicted by a model trained on the other parts, and all the tested samples are then merged. But be careful with this function; from the docs (emphasis added):
Passing these predictions into an evaluation metric may not be a valid way to measure generalization performance. Results can differ from cross_validate and cross_val_score unless all test sets have equal size and the metric decomposes over samples.
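That said, if the goal is simply that the scores and the confusion matrix come from the same folds, a minimal sketch is to pass one splitter object to both calls (the shared StratifiedKFold splitter, fixed random_state, and stand-in data below are assumptions, not from the question):

from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
model = DecisionTreeClassifier(random_state=0)  # fixed seed: both passes train identical trees
cv = StratifiedKFold(n_splits=10)  # one deterministic splitter shared by both calls

scores = cross_validate(model, X, y, cv=cv, scoring='accuracy')
y_pred = cross_val_predict(model, X, y, cv=cv)

print(scores['test_score'].mean())   # mean per-fold accuracy
print(confusion_matrix(y, y_pred))   # matrix over the same folds' predictions

Here all 10 test folds are equal-sized, so accuracy computed from the merged predictions equals the averaged fold accuracy; with unequal folds, or metrics that don't decompose over samples, the two can differ, exactly as the warning above says.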