Can I run LOF with a range of k values in ELKI, so that it is easy to compare which k is best?
Normally you choose a single k and then look at, for example, the ROC AUC. I want to find the best k for a data set, so I need to compare multiple runs. Is there an easier way than manually changing the value of k and rerunning each time? For example, I would like to compare all k in the range 1 to 100.
Thanks
The Greedy Ensemble example shows how to run outlier detection methods for a whole range of k at once, using the ComputeKNNOutlierScores application included with ELKI. This is efficient: because the nearest neighbors are computed only once, it will be a lot faster than separate runs.
The EvaluatePrecomputedOutlierScores application can then be used to bulk-evaluate these results with multiple measures.
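To illustrate what "comparing k" boils down to once the scores exist, here is a minimal Python sketch. It is not ELKI code and does not read ELKI's output format: it assumes you have already loaded the outlier scores into one list per k (the `scores_by_k` dictionary, the toy numbers, and the function names are all hypothetical) along with ground-truth labels where 1 marks an outlier. ROC AUC is computed with the pairwise (Mann-Whitney) formulation, which is fine for small data but quadratic in the worst case.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (outlier, inlier) pairs ranked correctly.

    Higher score means "more outlying". Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # outlier scores
    neg = [s for s, y in zip(scores, labels) if y == 0]  # inlier scores
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_k(scores_by_k: Dict[int, List[float]],
           labels: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Return the k with the highest ROC AUC (smallest k on ties) and all AUCs."""
    aucs = {k: roc_auc(s, labels) for k, s in scores_by_k.items()}
    best = max(aucs, key=lambda k: (aucs[k], -k))
    return best, aucs


# Toy data (made up for illustration): 4 inliers followed by 2 outliers.
labels = [0, 0, 0, 0, 1, 1]
scores_by_k = {
    2: [1.0, 1.1, 0.9, 1.8, 1.5, 2.0],
    5: [1.0, 1.2, 1.1, 1.3, 2.5, 3.0],
    10: [1.4, 1.0, 1.2, 1.6, 1.5, 1.1],
}

best, aucs = best_k(scores_by_k, labels)
for k in sorted(aucs):
    print(f"k={k}: ROC AUC = {aucs[k]:.3f}")
print(f"best k = {best}")
```

Note that picking k this way requires labels, so it tells you which k would have been best in hindsight on a labeled benchmark, not how to choose k on unlabeled data.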
This is what we used for the publication:
G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent and M. E. Houle
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study
Data Mining and Knowledge Discovery 30(4): 891-927, 2016, DOI: 10.1007/s10618-015-0444-8
On the supplementary material website, you can look up the best results for many standard data sets, as well as download the raw results.
But beware that outlier detection quality results tend to be inconclusive: one method performs best on one data set, and a different method performs best on another. There is no clear winner, because data sets are very diverse.