python adjacency-matrix dbscan

DBSCAN but grouping points that are mutually close?


I have an adjacency matrix which I am using as my predetermined distance matrix. Instead of finding all the nearest points of all the nearest points, I only want to group points that are all near each other.

For example:

import numpy as np
from sklearn.cluster import DBSCAN

# adjacency matrix (1 = close), calculated from cosine distance
adj = np.array([
    [1,1,0,1,0,1,1],
    [1,1,0,0,1,1,1],
    [0,0,1,1,1,0,0],
    [1,0,1,1,1,0,0],
    [0,1,1,1,1,1,1],
    [1,1,0,0,1,1,1],
    [1,1,0,0,1,1,1]])

# run through DBSCAN
D_fit = DBSCAN(eps=0.99, min_samples=2, metric='precomputed').fit(adj)
print(D_fit.labels_)

Normally DBSCAN would group everything together: [0 0 0 0 0 0 0]. However, if we were to only group points that are all mutually close, the result would be [0 0 1 1 1 2 2] or [0 0 1 1 2 2 2] or [0 0 1 1 1 0 0]... This grouping method is what I am looking for. Is there a tool, package, or some other way to group points that are all mutually close, instead of grouping by chains of neighbours?
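To make "mutually close" concrete: a labelling is acceptable when every pair of points that share a label is adjacent. Below is a minimal sketch of such a check, assuming the matrix above with 1 meaning "close" and ones on the diagonal; `is_mutual_grouping` is a hypothetical helper name, not part of any library.

```python
import numpy as np

adj = np.array([
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1]])


def is_mutual_grouping(adj, labels):
    """Return True if every pair of points sharing a label is adjacent."""
    adj = np.asarray(adj)
    labels = np.asarray(labels)
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        # A valid group's sub-matrix holds only ones (diagonal assumed to be 1).
        if not adj[np.ix_(members, members)].all():
            return False
    return True


print(is_mutual_grouping(adj, [0, 0, 1, 1, 1, 2, 2]))  # True
print(is_mutual_grouping(adj, [0, 0, 0, 0, 0, 0, 0]))  # False
```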


Solution

  • This isn't the most efficient way, but this is how I got it to work:

    import random
    
    import networkx as nx
    import numpy as np
    
    
    def clique_clusters(adj):
        """Label each point with a randomly chosen clique of mutually adjacent points."""
        # set default / starting values
        d = len(adj[0])
        graph = nx.from_numpy_array(np.asarray(adj))
        # every clique in the graph, including single nodes,
        # so there is always at least one candidate to choose from
        combos = list(nx.enumerate_all_cliques(graph))
        cluster = [-1] * d
        c = 0
    
        # continue as long as there are unassigned points
        while -1 in cluster:
            # everything that is still ungrouped
            ind = {i for i, x in enumerate(cluster) if x == -1}
            # keep the cliques from combos that don't contain any grouped points
            possible = [j for j in combos if all(ele in ind for ele in j)]
            # select at random from the possible cliques
            select = random.choice(possible)
            # update cluster list
            for i in select:
                cluster[i] = c
            c += 1
        return cluster
    
    
    print(clique_clusters(adj))  # adj is the matrix from the question
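
Since the random choice above can pick very small cliques, one possible variation is to always take the largest remaining clique first. The sketch below is an assumption on my part, not the method above: it swaps `enumerate_all_cliques` for `nx.find_cliques` (maximal cliques only) and removes assigned nodes from the graph. `greedy_clique_clusters` is a hypothetical name, and the result depends on how ties between equally large cliques are broken.

```python
import networkx as nx
import numpy as np

adj = np.array([
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1]])


def greedy_clique_clusters(adj):
    """Greedy partition: repeatedly peel off the largest remaining clique."""
    a = np.array(adj)          # np.array copies, so the caller's matrix is untouched
    np.fill_diagonal(a, 0)     # drop self-loops before building the graph
    graph = nx.from_numpy_array(a)
    labels = [-1] * graph.number_of_nodes()
    c = 0
    while graph.number_of_nodes() > 0:
        # largest maximal clique among the still-unassigned nodes
        # (isolated nodes show up as cliques of size one)
        best = max(nx.find_cliques(graph), key=len)
        for node in best:
            labels[node] = c
        graph.remove_nodes_from(best)
        c += 1
    return labels


labels = greedy_clique_clusters(adj)
print(labels)
```

This is a heuristic: it does not guarantee the fewest possible groups, only that every group is a clique. It also avoids enumerating every sub-clique, although listing maximal cliques can still be expensive on large, dense graphs.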