python matplotlib scikit-learn classification confusion-matrix

How can I achieve a scikit-learn confusion matrix without extra columns?


I have a classifier that does binary classification. For training, I use only the data I trust. For testing, I use the trusted data plus some less reliable, real-world data.

How do I get a confusion matrix without extra columns?

Here is my test code:

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_test = [0, 1, 1, 1, 2, 2, 3]
predictions = [0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_test, predictions)

l = ["M", "F", "M?", "F?"]
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=l)
disp.plot()
plt.show()
# I expect a 4x2 matrix here (4 true labels x 2 predicted labels). How can I do it?

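As the plot shows, there are unnecessary columns: the predicted-label axis also has "M?" and "F?", although the classifier only ever predicts "M" or "F".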

Am I missing something?

Feel free to say that I am completely wrong.
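For context, the square shape is scikit-learn's default behavior rather than a bug in the code above: when `labels` is not passed, `confusion_matrix` uses the sorted union of the labels found in `y_true` or `y_pred` for both the rows and the columns. A minimal check on the same data:

```python
from sklearn.metrics import confusion_matrix

y_test = [0, 1, 1, 1, 2, 2, 3]
predictions = [0, 1, 1, 1, 0, 1, 0]

# labels=None (the default): both axes use the sorted union of the labels
# found in y_test and predictions, i.e. [0, 1, 2, 3], so the result is 4x4.
cm = confusion_matrix(y_test, predictions)
print(cm.shape)  # (4, 4)

# The "M?" and "F?" prediction columns (indices 2 and 3) are all zeros,
# because the classifier never predicts 2 or 3.
print(cm[:, 2:].sum())  # 0
```

So the extra columns are empty rather than wrong; `ConfusionMatrixDisplay` simply draws the square matrix it is given.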


Solution

  • Because you said in the comments "... my question is how to make a 4x2 matrix.", I wrote this:

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    
    y_test = [0, 1, 1, 1, 2, 2, 3]
    predictions = [0, 1, 1, 1, 0, 1, 0]
    # Every label that occurs (y_test + predictions works because both are plain lists)
    unique_labels = sorted(set(y_test + predictions))
    
    # Manually create a 4x2 confusion matrix:
    # rows = true labels (0-3), columns = predicted labels (0-1)
    conf_matrix = np.zeros((len(unique_labels), 2), dtype=int)
    
    # Count each (true, predicted) pair; the labels are used directly as
    # array indices, so they must be consecutive integers starting at 0
    for true_label, pred_label in zip(y_test, predictions):
        conf_matrix[true_label, pred_label] += 1
    
    display_labels = ["M", "F", "M?", "F?"]
    sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=["M", "F"], yticklabels=display_labels)
    
    plt.xlabel("Predicted Label")
    plt.ylabel("True Label")
    plt.show()
    

    Here is the result: [image: heatmap of the 4x2 matrix]

    However, since your question was a bit vague, I suspect you may want this 2x4 version instead:

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    
    y_test = [0, 1, 1, 1, 2, 2, 3]
    predictions = [0, 1, 1, 1, 0, 1, 0]
    unique_labels = sorted(set(y_test + predictions))
    
    # Build the same 4x2 matrix (rows = true, columns = predicted);
    # it is transposed to 2x4 when plotted below
    conf_matrix = np.zeros((len(unique_labels), 2), dtype=int)
    
    # Fill in the confusion matrix based on true labels and predictions
    for true_label, pred_label in zip(y_test, predictions):
        conf_matrix[true_label, pred_label] += 1
    
    display_labels = ["M", "F", "M?", "F?"]
    sns.heatmap(conf_matrix.T, annot=True, fmt="d", cmap="Blues", xticklabels=display_labels, yticklabels=["M", "F"])
    
    plt.xlabel("True Label")
    plt.ylabel("Predicted Label")
    plt.show()
    

    Here is the result: [image: heatmap of the 2x4 matrix]
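If you would rather let scikit-learn do the counting, one option is to build the full square matrix with an explicit `labels` order and then keep only the columns for labels the classifier can predict. This is a sketch under assumptions: `rectangular_confusion_matrix` is a hypothetical helper name, not a scikit-learn function, and it assumes every predicted value is also one of the true labels. Unlike the manual loop, which uses the label values directly as array indices, it also works with labels that are not 0..n-1 integers, such as strings.

```python
from sklearn.metrics import confusion_matrix


def rectangular_confusion_matrix(y_true, y_pred, true_labels, pred_labels):
    """Hypothetical helper: a len(true_labels) x len(pred_labels) count matrix.

    Rows follow true_labels, columns follow pred_labels. Assumes every
    predicted value (and every entry of pred_labels) also appears in
    true_labels: scikit-learn leaves out of the counts any sample whose
    true or predicted value is not in `labels`.
    """
    true_labels = list(true_labels)
    # Square matrix over all true labels, in exactly this order
    # (rows = true, columns = predicted).
    full = confusion_matrix(y_true, y_pred, labels=true_labels)
    # Positions of the predictable labels inside true_labels.
    cols = [true_labels.index(label) for label in pred_labels]
    # Keep only those columns.
    return full[:, cols]


# Integer labels, as in the question: 0="M", 1="F", 2="M?", 3="F?".
y_test = [0, 1, 1, 1, 2, 2, 3]
predictions = [0, 1, 1, 1, 0, 1, 0]
cm_4x2 = rectangular_confusion_matrix(y_test, predictions, [0, 1, 2, 3], [0, 1])
cm_2x4 = cm_4x2.T  # predicted labels as rows, true labels as columns

# The same data with string labels, without any manual index mapping.
y_test_str = ["M", "F", "F", "F", "M?", "M?", "F?"]
pred_str = ["M", "F", "F", "F", "M", "F", "M"]
cm_str = rectangular_confusion_matrix(
    y_test_str, pred_str, ["M", "F", "M?", "F?"], ["M", "F"]
)
```

Passing `labels=[0, 1]` straight to `confusion_matrix` would not give the 4x2 result: `labels` controls both axes, so the "M?" and "F?" rows would be dropped together with the columns. The resulting `cm_4x2` or `cm_2x4` can be passed to the same `sns.heatmap` calls shown above.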