The elbow method calls for setting k=MinPts, but what do you do when MinPts=1? Is the elbow method still usable in this situation, and if so, how do you determine k?
I tried the elbow method with k=1, which results in all the distances equalling zero.
For the elbow method with DBSCAN, you first fix k (that is, minPts); the resulting plot then helps you choose a good value for eps. The original DBSCAN paper suggests setting minPts to the dimensionality of the data plus one or higher, so MinPts < 3 typically does not make much sense.
The following is from the man page of dbscan() in the R package dbscan, but it applies similarly to other implementations:
Setting parameters for DBSCAN
The parameters minPts and eps depend on each other and changing one typically requires changing the other one as well. The original DBSCAN paper suggests to start by setting minPts to the dimensionality of the data plus one or higher. minPts defines the minimum density around a core point (i.e., the minimum density for non-noise areas). Increase the parameter to suppress more noise in the data and require more points to form a cluster. A suitable neighborhood size parameter eps given a fixed value for minPts can be found visually by inspecting the kNNdistplot() of the data using k = minPts - 1 (minPts includes the point itself, while the k-nearest neighbor distance does not). The k-nearest neighbor distance plot sorts all data points by their k-nearest neighbor distance. A sudden increase of the kNN distance (a knee) indicates that the points to the right are most likely outliers. Choose eps for DBSCAN where the knee is.
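To make the k = minPts - 1 rule concrete, here is a minimal sketch in plain Python (standard library only). It is not the R package's implementation: the helper knn_distances, its include_self flag, the toy data, and the "largest gap" knee heuristic are all assumptions made up for illustration. The man page suggests reading the knee off the plot visually. The last line also shows one likely reason for the all-zero distances in the question: if an implementation counts each point as its own nearest neighbor, the 1-NN distance is always zero.

```python
import math

def knn_distances(points, k, include_self=False):
    """Return, sorted ascending, each point's distance to its k-th nearest neighbor.

    include_self=True mimics an implementation that counts the query point
    as its own first neighbor (hypothetical flag, for illustration only).
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q)
            for j, q in enumerate(points)
            if include_self or j != i
        )
        result.append(dists[k - 1])
    return sorted(result)

# Toy 2-D data (made up): two tight groups plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
          (20.0, 20.0)]

min_pts = 3       # dimensionality (2) plus one
k = min_pts - 1   # minPts counts the point itself; the k-NN distance does not

# Sorted k-NN distances: this is the curve a k-NN distance plot would show.
curve = knn_distances(points, k)

# Crude knee heuristic (an assumption, not part of any library):
# take the value just before the largest jump in the sorted curve.
gaps = [b - a for a, b in zip(curve, curve[1:])]
knee = gaps.index(max(gaps))
eps_candidate = curve[knee]

print("sorted k-NN distances:", curve)
print("eps candidate:", eps_candidate)

# If the point itself is counted as a neighbor, k=1 yields only zeros.
print("k=1 with self included:", knn_distances(points, 1, include_self=True))
```

On real data the curve is rarely this clean, so treat the heuristic as a starting point and check the plot by eye.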