I have a binary classifier. For training I use only data I trust. For testing I use the trusted data plus some lower-quality, real-world data.
How do I get a confusion matrix without extra columns?
Here is my test code:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
y_test = [0, 1, 1, 1, 2, 2, 3]
predictions = [0, 1, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_test, predictions)
l = ["M", "F", "M?", "F?"]
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=l)
disp.plot()
plt.show()
# I expect a 4x2 matrix here. How can I do it?
See? The plot contains unnecessary columns.
Am I missing something?
Because you said "... my question is how to make a 4x2 matrix." in the comments, I wrote this:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix
y_test = [0, 1, 1, 1, 2, 2, 3]
predictions = [0, 1, 1, 1, 0, 1, 0]
unique_labels = sorted(set(y_test + predictions))
# Manually create a 4x2 confusion matrix
conf_matrix = np.zeros((len(unique_labels), 2), dtype=int)
# Fill in the confusion matrix based on true labels and predictions
for true_label, pred_label in zip(y_test, predictions):
    conf_matrix[true_label, pred_label] += 1
display_labels = ["M", "F", "M?", "F?"]
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=["M", "F"], yticklabels=display_labels)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
However, I suspect you actually want this 2x4 version instead (your question was a bit vague):
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix
y_test = [0, 1, 1, 1, 2, 2, 3]
predictions = [0, 1, 1, 1, 0, 1, 0]
unique_labels = sorted(set(y_test + predictions))
# Manually build the 4x2 matrix (rows = true labels); it is transposed to 2x4 when plotted
conf_matrix = np.zeros((len(unique_labels), 2), dtype=int)
# Fill in the confusion matrix based on true labels and predictions
for true_label, pred_label in zip(y_test, predictions):
    conf_matrix[true_label, pred_label] += 1
display_labels = ["M", "F", "M?", "F?"]
sns.heatmap(conf_matrix.T, annot=True, fmt="d", cmap="Blues", xticklabels=display_labels, yticklabels=["M", "F"])
plt.xlabel("True Label")
plt.ylabel("Predicted Label")
plt.show()
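As a shorter alternative to filling the matrix by hand, you could probably let confusion_matrix build the full 4x4 matrix via its labels argument and then slice off the columns that can never be predicted. This is only a sketch: it assumes the classifier only ever outputs 0 and 1, and the names all_labels, predicted_labels and cm_4x2 are mine, not from your code.

```python
from sklearn.metrics import confusion_matrix

y_test = [0, 1, 1, 1, 2, 2, 3]
predictions = [0, 1, 1, 1, 0, 1, 0]

# Assumption: these are all classes that can occur as true labels
all_labels = [0, 1, 2, 3]
# Assumption: the binary classifier can only output these classes
predicted_labels = [0, 1]

# Full square matrix: rows are true labels, columns are predicted labels
full_cm = confusion_matrix(y_test, predictions, labels=all_labels)

# Keep every row, but only the columns for classes that can be predicted
cols = [all_labels.index(label) for label in predicted_labels]
cm_4x2 = full_cm[:, cols]

print(cm_4x2.shape)
```

cm_4x2 can be passed to sns.heatmap the same way as conf_matrix above (use cm_4x2.T for the 2x4 layout). As far as I know, ConfusionMatrixDisplay expects a square matrix, so I would not rely on it for the sliced result.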