python scikit-learn yellowbrick

Can we use a NumPy array confusion matrix in a Yellowbrick visualization?


I was excited by Yellowbrick, the visualization library for machine learning models, and wanted to use it to visualize a confusion matrix.

I obtained the confusion matrix by running the LOF (Local Outlier Factor) algorithm in scikit-learn (LOF is not implemented in Yellowbrick).

Apparently Yellowbrick needs a model: it fits the model itself on the training data, uses the test data to get the outputs, and then draws the plot.

Now, my question is: if I already have the output, can I still use Yellowbrick for its awesome visualization?

Example:
Let's say I already have a confusion matrix:

cm = np.array([[56750,   114],
              [   95,     3]])

Can I do something like:

from yellowbrick.classifier import ConfusionMatrix
cm1 = ConfusionMatrix(cm)
cm1.show()

Here is the official example: https://www.scikit-yb.org/en/latest/api/classifier/confusion_matrix.html

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ConfusionMatrix

iris = load_iris()
X = iris.data
y = iris.target
classes = iris.target_names

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)

model = LogisticRegression(multi_class="auto", solver="liblinear")

iris_cm = ConfusionMatrix(
    model, classes=classes,
    label_encoder={0: 'setosa', 1: 'versicolor', 2: 'virginica'}
)

iris_cm.fit(X_train, y_train)
iris_cm.score(X_test, y_test)

iris_cm.show()

I do not want to fit the model with Yellowbrick just to get the confusion matrix when I already have it from scikit-learn.

Is there a way to do this using yellowbrick?


Solution

  • You can pass an already fitted model into Yellowbrick. Recent versions of Yellowbrick check whether the model is already fitted and, if it is, the visualizer does not refit it. Note that this still requires a model and test data; it does not accept a precomputed confusion matrix array. Modify your code as follows:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split as tts
    from sklearn.linear_model import LogisticRegression
    from yellowbrick.classifier import ConfusionMatrix
    
    iris = load_iris()
    X = iris.data
    y = iris.target
    classes = iris.target_names
    
    X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2)
    
    model = LogisticRegression(multi_class="auto", solver="liblinear")
    model.fit(X_train, y_train)
    
    iris_cm = ConfusionMatrix(
        model, classes=classes,
        label_encoder={0: 'setosa', 1: 'versicolor', 2: 'virginica'}
    )
    
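    # With an already fitted estimator, fit() does not retrain the model;
    # score() computes the confusion matrix on the test data.
    iris_cm.fit(X_train, y_train)
    iris_cm.score(X_test, y_test)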
    
    iris_cm.show()
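
If all you have is the confusion matrix array itself (as in the question), the approach above does not apply directly, since it still needs a model and test data. One possible workaround, outside Yellowbrick, is scikit-learn's own `ConfusionMatrixDisplay`, which plots a precomputed matrix. The following is only a minimal sketch: the class names `inlier` and `outlier`, the row/column interpretation, the colormap, and the output file name are assumptions, not something stated in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay

# Confusion matrix from the question.
# Assumed layout: rows are true labels, columns are predicted labels.
cm = np.array([[56750, 114],
               [95, 3]])

# Hypothetical class names; the question does not say which row is which.
labels = ["inlier", "outlier"]

# Build the display from the precomputed matrix; no model or data is needed.
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)

fig, ax = plt.subplots(figsize=(5, 4))
disp.plot(ax=ax, cmap="YlOrRd", values_format="d")  # "d" prints plain integers
ax.set_title("LOF confusion matrix")

# Assumed output file name; use plt.show() instead in an interactive session.
fig.savefig("lof_confusion_matrix.png", bbox_inches="tight")
```

The result is a heatmap similar in spirit to Yellowbrick's, although the styling differs. If you specifically want Yellowbrick's look, you would still need to pass it a fitted estimator and test data as shown above.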