python cluster-analysis dbscan

How to plot a k-distance graph in Python


How do I plot, in Python, the k-distance graph for a given value of min-points in DBSCAN?

I am looking for the knee and corresponding epsilon value.

I do not see any method in sklearn that returns such distances. Am I missing something?


Solution

  • You probably want to use the matrix operations provided by numpy to speed up your distance matrix calculation.

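    import numpy as np
    import matplotlib.pyplot as plt

    def k_distances2(x, k):
        # Pairwise squared distances: |a-b|^2 = |a|^2 + |b|^2 - 2*a.b
        sq = np.sum(x**2, axis=1)
        p = sq[:, None] + sq[None, :] - 2 * x.dot(x.T)
        # Rounding error can leave tiny negatives, which sqrt turns into NaN
        p = np.sqrt(np.maximum(p, 0))
        p.sort(axis=1)
        # Column 0 is each point's distance to itself (0)
        p = p[:, :k]
        # k-distance: distance to the k-th closest point, counting the point itself
        pm = np.sort(p[:, -1])
        return p, pm

    # X is the (n_samples, n_features) array you cluster with DBSCAN
    m, m2 = k_distances2(X, 2)
    plt.plot(m2)
    plt.ylabel("k-distances")
    plt.grid(True)
    plt.show()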
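
On the "am I missing something" part: sklearn can return these distances through `NearestNeighbors.kneighbors`, which also avoids building the full n-by-n matrix in memory. Below is a minimal sketch of that route, not a definitive recipe. The helper name `k_distance_curve` and the toy array `X_demo` are made up for illustration. It assumes `k` plays the role of DBSCAN's `min_samples`, which in sklearn counts the point itself.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distance_curve(X, k):
    """Sorted distance from each sample to its k-th closest point (itself included)."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    # Querying with the same data that was fitted returns every point as
    # its own nearest neighbour, so column 0 holds zeros.
    distances, _ = nn.kneighbors(X)
    # The last column is the distance to the k-th closest point.
    return np.sort(distances[:, -1])


# Toy data for illustration only: four points on a line.
X_demo = np.array([[0.0], [1.0], [3.0], [7.0]])
curve = k_distance_curve(X_demo, 2)
```

Passing `curve` to `plt.plot` in place of `m2` should give the same kind of graph; the y-value at the knee is a candidate for `eps`.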