Does anybody know what the KNNDistancesSampler in ELKI calculates? I can see the Java code for the function: https://github.com/elki-project/elki/blob/master/elki/src/main/java/de/lmu/ifi/dbs/elki/algorithm/KNNDistancesSampler.java, but I am really bad at Java. I can see that it should get the distance of its neighbors via getKNNDistance()... Is it returning the average distance (Euclidean by default) of the k nearest neighbors of each point? I know it should be used for epsilon estimation for DBSCAN etc., but I'd also like to know what it is doing... Thank you
References for this are given in the class documentation:
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD '96)

Erich Schubert, Jörg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu
DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN
ACM Trans. Database Systems (TODS)
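The class does not return an average. It returns a sample of the kNN distances: for each sampled point, the distance to its k-th nearest neighbor (that is what getKNNDistance() gives you), sorted so that the values can be plotted. You then choose the epsilon parameter using the "elbow" method on that plot. The class does not automate this choice; it only produces the plot.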
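To make this concrete for someone who does not read the ELKI source easily, here is a minimal stand-alone Java sketch of the same idea. It is not ELKI's code and does not use ELKI's API; the class and method names (`KDistanceSketch`, `kDistance`, `kDistancePlot`, `sampleRate`) are hypothetical. It assumes Euclidean distance, assumes the query point counts as its own first neighbor (distance 0), and samples each point independently with probability `sampleRate`. ELKI's exact conventions for these details may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of a k-distance plot (NOT the ELKI implementation).
 * Assumptions: Euclidean distance, the point itself is its own first
 * neighbour, and points are sampled independently with probability sampleRate.
 */
public class KDistanceSketch {

    /** Plain Euclidean distance between two vectors of equal length. */
    public static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Distance from p to its k-th nearest neighbour in data (p itself counts as neighbour 1). */
    public static double kDistance(double[][] data, double[] p, int k) {
        double[] dists = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            dists[i] = euclidean(p, data[i]);
        }
        Arrays.sort(dists); // ascending: dists[0] == 0 is the point itself
        return dists[k - 1];
    }

    /** k-distances of a random sample of points, sorted in descending order for plotting. */
    public static double[] kDistancePlot(double[][] data, int k, double sampleRate, long seed) {
        Random rnd = new Random(seed);
        List<Double> sample = new ArrayList<>();
        for (double[] p : data) {
            if (rnd.nextDouble() < sampleRate) { // keep this point in the sample
                sample.add(kDistance(data, p, k));
            }
        }
        sample.sort(Collections.reverseOrder()); // largest k-distances first
        double[] out = new double[sample.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = sample.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10} };
        double[] curve = kDistancePlot(data, 2, 1.0, 42L);
        System.out.println(Arrays.toString(curve));
    }
}
```

In the example, the four points of the unit square each have a k-distance of 1.0, while the isolated point at (10, 10) produces the single large value at the start of the curve. On real data you would plot this descending curve and pick epsilon near the "elbow", where the values stop dropping sharply.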