
How do I get the area under the ROC curve in Python with pyod?


I have data for 5,000 observations. I split the dataset in two: the variables (X_train) and the labeled target (y_train). I am using pyod because it seems to be the most popular Python library for anomaly detection.

I fit the model to the data with the following code:

from pyod.models.knn import KNN
from pyod.utils import evaluate_print

clf = KNN(n_neighbors=10, method='mean', metric='euclidean')
clf.fit(X_train)
scores = clf.decision_scores_

The model is now fitted, and scores holds the raw outlier score of each observation (for KNN these are distances, so higher means more anomalous; they are not probabilities). I calculated the area under the ROC curve manually and got 0.69.

I noticed this is the same result when using:

evaluate_print('KNN with k=10', y=y_train, y_pred=scores)

This prints: KNN with k=10 ROC:0.69, precision @ rank n:0.1618.

Is there a function in pyod that returns only the ROC AUC value (0.69), instead of printing it as part of a string?


Solution

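  • I do not know pyod, but sklearn does this job in two ways: roc_auc_score takes the labels and scores and returns the area directly, while auc integrates a curve you have already built with roc_curve. Both are easy to use, and it should take only a line or two in your project.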

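    from sklearn import metrics
    
    # Build the ROC curve from the true labels and the outlier scores,
    # then integrate it to get the area under the curve.
    fpr, tpr, thresholds = metrics.roc_curve(y_true=y_train, y_score=scores)
    roc_auc = metrics.auc(fpr, tpr)
    # One-call equivalent: roc_auc = metrics.roc_auc_score(y_train, scores)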
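
If you want to cross-check the number without any library, the area under the ROC curve equals the probability that a randomly chosen outlier gets a higher score than a randomly chosen inlier, with ties counted as half. The sketch below is only an illustration: roc_auc_pairwise is a hypothetical helper name (not part of pyod or sklearn), and it assumes labels are 1 for outliers and 0 for inliers.

```python
def roc_auc_pairwise(y_true, scores):
    """ROC AUC as the fraction of (outlier, inlier) pairs ranked correctly.

    Hypothetical helper for illustration; assumes labels are 1 (outlier)
    and 0 (inlier), and that a higher score means more anomalous.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # outlier correctly ranked above inlier
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Tiny made-up example: 3 of the 4 (outlier, inlier) pairs are ordered correctly.
y_demo = [0, 0, 1, 1]
scores_demo = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_pairwise(y_demo, scores_demo))  # 0.75
```

The double loop is quadratic in the worst case, which is fine for a few thousand observations. On your data, roc_auc_pairwise(y_train, scores) should agree with what sklearn gives, up to floating-point rounding.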