machine-learning | cluster-analysis | nearest-neighbor | hyperparameters | dbscan

Elbow method for tuning DBSCAN when minPts=1


The elbow method calls for setting k=MinPts, but what do you do when MinPts=1? Is the elbow method still usable in this situation, and if so, how do you determine k?

I tried the elbow method with k=1, which results in all the distances equalling zero.


Solution

  • For the elbow method for DBSCAN you set k based on minPts, which helps you choose a good value for eps. The original DBSCAN paper suggests setting minPts to the dimensionality of the data plus one, or higher, so minPts < 3 typically makes little sense.

    This is from the man page of dbscan() in the R package dbscan, but the same procedure applies to any other implementation (a short worked sketch follows the quote):

    Setting parameters for DBSCAN

    The parameters minPts and eps depend on each other and changing one typically requires changing the other one as well. The original DBSCAN paper suggests to start by setting minPts to the dimensionality of the data plus one or higher. minPts defines the minimum density around a core point (i.e., the minimum density for non-noise areas). Increase the parameter to suppress more noise in the data and require more points to form a cluster. A suitable neighborhood size parameter eps given a fixed value for minPts can be found visually by inspecting the kNNdistplot() of the data using k = minPts - 1 (minPts includes the point itself, while the k-nearest neighbor distance does not). The k-nearest neighbor distance plot sorts all data points by their k-nearest neighbor distance. A sudden increase of the kNN distance (a knee) indicates that the points to the right are most likely outliers. Choose eps for DBSCAN where the knee is.
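
    A minimal sketch in R of the procedure described in the man page, using the built-in iris measurements as stand-in data; the eps value used here (0.5) is only illustrative and should be read off wherever the knee appears for your data:

    library(dbscan)

    # Example data: the four numeric iris measurements (4 dimensions)
    x <- as.matrix(iris[, 1:4])

    # Man page suggestion: minPts = dimensionality of the data + 1
    minPts <- ncol(x) + 1          # 5 for this data set

    # Inspect the sorted k-nearest-neighbor distances with k = minPts - 1
    # and look for the knee (sudden increase) in the curve
    kNNdistplot(x, k = minPts - 1)
    abline(h = 0.5, lty = 2)       # illustrative eps read off at the knee

    # Run DBSCAN with the chosen eps and minPts
    db <- dbscan(x, eps = 0.5, minPts = minPts)
    db

    With k = 1 (i.e. minPts = 2) the plotted distance is to the nearest other point, so the curve is not all zeros; distances of zero usually indicate that the kNN search counted each point as its own neighbor, which is exactly why the man page uses k = minPts - 1.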