python cluster-analysis embedding spectral-clustering

Spectral embedding - spectral clustering


I'm trying to perform spectral embedding/clustering using Normalized Cuts. I wrote the following code, but I am stuck at a conceptual step: what do I have to do after clustering the eigenvectors? I don't know how to form the clusters on my original dataset. (A is my affinity matrix.)

import numpy as np
from scipy.cluster.vq import kmeans2

d = np.sum(A, axis=0)  # degree of each node
D = np.diag(d)
D_half_inv = np.diag(1.0 / np.sqrt(d))
M = np.dot(D_half_inv, np.dot((D - A), D_half_inv))
# compute eigenvalues and eigenvectors
# (eigh returns eigenvalues in ascending order and eigenvectors as columns of v)
(w, v) = np.linalg.eigh(M)
# scale entry i of every eigenvector by 1/sqrt(d_i), then renorm each column to have norm 1
v = v / np.sqrt(d)[:, np.newaxis]
v = v / np.linalg.norm(v, axis=0)
# omit the eigenvector of the smallest eigenvalue, which is 0
# (note that the slice 1:45 keeps 44 columns; row i is the embedding of data point i)
v_trailing = v[:, 1:45]
k = 20  # number of clusters
centroids, idx = kmeans2(v_trailing, k)

After that, I get labels for each eigenvector. But how can I link these labels to my original dataset?


Solution

  • The labels map back to the original dataset by index. Row i of the eigenvector matrix (call it yi) is the embedding of the ith data point, i.e. of row/column i of A, so the label assigned to yi is the label of the ith data point.

    So if yi is in Cm, then the ith data point is in Am.

    Or, to put it another way:

    Let C1, ..., CM be the clusters generated by clustering the rows yi of the eigenvector matrix. The clusters you want are A1, ..., AM, where Am = { j | yj is an element of Cm }.
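The index mapping above can be sketched in a few lines. This is only an illustration, not the asker's actual pipeline: X and the label array idx are small hypothetical stand-ins (in the question, idx would be the second value returned by kmeans2), and clusters_from_labels is a helper name made up for this example.

```python
import numpy as np

def clusters_from_labels(idx, k):
    """Return a list whose entry m holds the indices j with idx[j] == m."""
    idx = np.asarray(idx)
    return [np.flatnonzero(idx == m) for m in range(k)]

# Hypothetical stand-ins: 6 data points and the labels k-means gave their embeddings.
X = np.array([[0.0, 0.1], [5.0, 5.1], [0.2, 0.0],
              [9.0, 9.2], [5.1, 4.9], [0.1, 0.2]])
idx = np.array([0, 1, 0, 2, 1, 0])  # idx[i] is the cluster of data point i

groups = clusters_from_labels(idx, k=3)
for m, members in enumerate(groups):
    # members are positions in the original dataset, so X[members] is cluster m
    print(m, members.tolist(), X[members].tolist())
```

Because row i of the embedding and row i of the original data refer to the same point, no extra bookkeeping is needed: idx can be used directly to index or mask the original dataset.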