knn elki

ELKI KNNDistancesSampler


Does anybody know what the KNNDistancesSampler in ELKI calculates? I can see the Java code for the class: https://github.com/elki-project/elki/blob/master/elki/src/main/java/de/lmu/ifi/dbs/elki/algorithm/KNNDistancesSampler.java, but I am really bad at Java. I can see that it gets the distance to the neighbors via getKNNDistance(). Does it return the average distance (Euclidean by default) of the k nearest neighbors of each point? I know it is meant for estimating epsilon for DBSCAN and similar algorithms, but I'd also like to know what it actually does. Thank you.


Solution

  • References for this are given in the class documentation:

    Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu
    A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
    Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD '96)

    Erich Schubert, Jörg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu
    DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN
    ACM Trans. Database Systems (TODS)

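    The class returns a sample of the kNN distances, not just their average. Each value is the distance from one sampled point to its k-th nearest neighbor (that is what getKNNDistance() returns), not an average over the k neighbors. The distances are meant to be plotted so that you can choose the epsilon parameter with the "elbow" method. The class does not automate this choice; it only produces the plot.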
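
    If the Java is hard to follow, the idea of a sampled k-distance curve can be sketched in a few lines of Python. This is only an illustration, not ELKI's code: the function name `knn_distance_sample`, the `sample_rate` and `seed` parameters, the brute-force search, and the choice to exclude the query point from its own neighbors are all assumptions made for the example.

    ```python
    import math
    import random


    def knn_distance_sample(points, k, sample_rate=1.0, seed=0):
        """Return the k-th nearest neighbor distance for a random sample of
        points, sorted in descending order (the usual k-distance plot layout).

        Hypothetical helper for illustration; not part of ELKI.
        """
        n = len(points)
        if not 1 <= k < n:
            raise ValueError("k must satisfy 1 <= k < len(points)")
        rng = random.Random(seed)
        sample_size = max(1, round(sample_rate * n))
        sampled = rng.sample(range(n), sample_size)

        result = []
        for i in sampled:
            # Brute-force Euclidean distances from point i to every other point
            # (assumption: the query point itself is not counted as a neighbor).
            dists = sorted(
                math.dist(points[i], points[j]) for j in range(n) if j != i
            )
            # Keep only the distance to the k-th nearest neighbor, not an average.
            result.append(dists[k - 1])

        result.sort(reverse=True)
        return result


    # Four points in a tight square plus one far-away outlier.
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    curve = knn_distance_sample(points, k=2)
    ```

    Plotting `curve` against its index gives the k-distance plot. In this toy data the outlier produces one large value at the start and the clustered points produce small values after it; the "elbow" between the two is the candidate epsilon, and reading it off is still a manual step.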