parameters, dbscan, expectation-maximization

How can EM algorithms be used to determine the parameters (eps, minPts) of DBSCAN on a dataset?


I recently used DBSCAN to cluster a public data set. However, the results are so sensitive to the parameters eps and minPts that it is hard to find values that perform well over the whole data set, and tuning them seems to lead to over-fitting. I know that EM algorithms can be used to fit the parameters of GMMs. I wonder whether an EM algorithm can be applied to DBSCAN as well. I'd appreciate any ideas or suggestions. Has anyone tried this before?


Solution

  • EM algorithms work well with Gaussian Mixture Models because a GMM is a probabilistic model. It yields a probability for each point, and you know how to infer the model parameters that maximize these probabilities (the likelihood).

    I don't think you can meaningfully apply this to DBSCAN. There is no "probability" in this connectedness model. Being connected is a binary property, and if you try to maximize it, you will just make everything connected, i.e. epsilon = infinity.
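
To illustrate the last point, here is a minimal sketch using only the Python standard library. It is not DBSCAN itself: the `connected_fraction` helper, the two synthetic blobs, and the eps grid are all assumptions made up for this illustration. It treats "fraction of point pairs within eps of each other" as a stand-in objective and shows that this score can only grow with eps, so maximizing it says nothing about the cluster structure.

```python
import itertools
import math
import random


def connected_fraction(points, eps):
    """Fraction of point pairs whose Euclidean distance is at most eps.

    Hypothetical stand-in for "how connected is the data"; this is an
    illustration only, not part of DBSCAN or of any library.
    """
    pairs = list(itertools.combinations(points, 2))
    linked = sum(1 for p, q in pairs if math.dist(p, q) <= eps)
    return linked / len(pairs)


random.seed(0)
# Two well-separated 2-D blobs (synthetic, illustrative data only).
blob_a = [(random.gauss(0.0, 0.3), random.gauss(0.0, 0.3)) for _ in range(30)]
blob_b = [(random.gauss(5.0, 0.3), random.gauss(5.0, 0.3)) for _ in range(30)]
points = blob_a + blob_b

# Largest pairwise distance: any eps at or above it links every pair.
max_dist = max(math.dist(p, q) for p, q in itertools.combinations(points, 2))

eps_grid = [0.1, 0.5, 1.0, 3.0, max_dist]
scores = [connected_fraction(points, eps) for eps in eps_grid]

for eps, score in zip(eps_grid, scores):
    print(f"eps={eps:.2f}  connected fraction={score:.3f}")
```

The score never decreases as eps grows and reaches its maximum once eps covers the largest pairwise distance, at which point the two blobs are merged into one. A GMM likelihood, by contrast, penalizes a model that smears one component over both blobs, which is what gives EM something meaningful to optimize.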