I have data for 5,000 observations. I split the dataset in two: the feature variables (X_train) and the labeled target (y_train). I am using pyod because it seems to be the most popular Python library for anomaly detection.
I fit the model to the data with the following code:
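from pyod.models.knn import KNN
from pyod.utils.data import evaluate_print
# k-NN detector: each point's outlier score is the mean distance to its 10 nearest neighbours
clf = KNN(n_neighbors=10, method='mean', metric='euclidean')
clf.fit(X_train)               # unsupervised: the labels are not used for fitting
scores = clf.decision_scores_  # raw outlier scores of the training data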
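For reference, here is a minimal stdlib sketch of what the training scores represent, assuming that pyod's method='mean' averages the Euclidean distances from each training point to its k nearest other training points. knn_mean_scores is a hypothetical helper written for illustration (brute force, not pyod's implementation), and the toy data is made up.

```python
import math
from typing import List, Sequence


def knn_mean_scores(X: Sequence[Sequence[float]], k: int = 10) -> List[float]:
    """Hypothetical helper: outlier score per row = mean Euclidean distance
    to its k nearest *other* rows (brute force, illustration only)."""
    scores = []
    for i, xi in enumerate(X):
        # Distances from row i to every other row; the point itself is excluded.
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy data: four clustered points and one far-away point.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
print(knn_mean_scores(X, k=2))  # first four scores are 1.0; the far point scores about 13.09
```

The scores are distances, so they are unbounded and only meaningful as a ranking (higher means more anomalous), which is exactly what a ROC AUC evaluates.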
The model is now fitted and I have the outlier score of each training observation stored in scores (higher means more anomalous; for KNN these are distances, not probabilities). I calculated the area under the ROC curve manually and got 0.69.
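A manual ROC AUC calculation can be sketched in plain Python using the rank interpretation of the AUC: the probability that a randomly chosen outlier (label 1) gets a higher score than a randomly chosen inlier (label 0), with ties counting half. This equals the trapezoidal area under the ROC curve. The function name roc_auc and the 0/1 label encoding are assumptions for illustration.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random outlier > score of a random inlier),
    counting ties as half a win. Labels are assumed to be 1 = outlier, 0 = inlier."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:          # compare every outlier ...
        for n in neg:      # ... against every inlier
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of observations, which is fine for checking a result on 5,000 rows but not how a library would compute it.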
I get the same value from:
evaluate_print('KNN with k=10', y=y_train, y_pred=scores)
This prints: KNN with k=10 ROC:0.69, precision @ rank n:0.1618
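The second number in that output, precision @ rank n, can be sketched as follows, assuming n defaults to the number of true outliers: rank all observations by score and report the fraction of true outliers among the top n. precision_at_n is a hypothetical helper; pyod's own implementation may handle tied scores differently.

```python
from typing import Optional, Sequence


def precision_at_n(y_true: Sequence[int], scores: Sequence[float],
                   n: Optional[int] = None) -> float:
    """Hypothetical helper: fraction of true outliers (label 1) among the
    n highest-scoring observations; n defaults to the number of true outliers."""
    if n is None:
        n = sum(1 for y in y_true if y == 1)
    if n <= 0:
        raise ValueError("n must be positive")
    # Indices sorted from most to least anomalous.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:n]
    return sum(1 for i in top if y_true[i] == 1) / n


print(precision_at_n([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5
```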
Is there a specific function in pyod that returns only the 0.69?
I don't know pyod well, but sklearn has roc_auc_score (or roc_curve combined with auc), which does exactly that job. It is easy to use and should only take a line or two in your project:
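from sklearn import metrics
# ROC curve points from the true labels and the pyod outlier scores, then the area under it
fpr, tpr, thresholds = metrics.roc_curve(y_true=y_train, y_score=scores)
roc_auc = metrics.auc(fpr, tpr)
# Equivalent one-liner
roc_auc = metrics.roc_auc_score(y_true=y_train, y_score=scores)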