I have an adjacency matrix that I am using as a precomputed distance matrix. Instead of chaining neighbors together (the nearest points of the nearest points, and so on), I only want to group points that are all near each other.
For example:
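```python
import numpy as np
from sklearn.cluster import DBSCAN

# adjacency matrix (from cosine similarity): 1 = the two points are close, 0 = not close
adj = np.array([
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1]])

# metric='precomputed' expects distances, so flip the matrix: 0 = close, 1 = far
D_fit = DBSCAN(eps=0.99, min_samples=2, metric='precomputed').fit(1 - adj)
print(D_fit.labels_)
```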
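DBSCAN groups everything together, [0 0 0 0 0 0 0], because it links points through chains of neighbors: point 0 is close to point 3 and point 3 is close to point 2, so 0 and 2 end up in the same cluster even though they are not close to each other.
However, if we only group points that are all mutually close, valid results would be:
[0 0 1 1 1 2 2]
or [0 0 1 1 2 2 2]
or [0 0 1 1 1 0 0]
This grouping is what I am looking for. Is there a tool, package, or other way to group only points that are all mutually close, instead of grouping them through chains of neighbors?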
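A small sketch makes the two notions concrete. It assumes SciPy is available and uses `scipy.sparse.csgraph.connected_components` to reproduce the chained grouping; `is_mutually_close` is a hypothetical helper (not part of any library) that checks whether every group in a labeling is fully connected in `adj`.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

# adjacency matrix from the question: 1 = the two points are close
adj = np.array([
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1]])


def is_mutually_close(adj, labels):
    """Return True if every pair of points sharing a label is adjacent in adj.

    Assumes the diagonal of adj is 1 (each point is close to itself).
    """
    labels = np.asarray(labels)
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # a valid group's sub-matrix must contain only ones
        if not adj[np.ix_(members, members)].all():
            return False
    return True


# chained (transitive) grouping: connected components of the adjacency graph
n_groups, chained = connected_components(adj, directed=False)
print(n_groups, chained)  # 1 [0 0 0 0 0 0 0]

for labels in ([0, 0, 1, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2, 2], [0, 0, 1, 1, 1, 0, 0]):
    print(labels, is_mutually_close(adj, labels))  # True for all three

print(is_mutually_close(adj, chained))  # False
```

Any labeling for which `is_mutually_close` returns True is the kind of grouping described above; there are usually several, which is why more than one result is listed.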
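This isn't the most efficient approach, but this is how I got it to work: treat the matrix as a graph, list every clique (every set of mutually adjacent points), then repeatedly pick a random clique made only of ungrouped points and give it the next cluster label.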
```python
import random

import networkx as nx
import numpy as np


def mutual_clusters(adj):
    """Assign every point to a group whose members are all mutually adjacent."""
    d = len(adj)
    # build the graph and drop the self-loops created by the 1s on the diagonal
    G = nx.from_numpy_array(np.asarray(adj))
    G.remove_edges_from(list(nx.selfloop_edges(G)))
    # every clique (including single points) is a candidate group
    combos = list(nx.enumerate_all_cliques(G))

    cluster = [-1] * d
    c = 0
    # continue as long as there are unassigned points
    while -1 in cluster:
        # set of every point that is still ungrouped
        ind = {i for i, x in enumerate(cluster) if x == -1}
        # keep only the cliques that contain no grouped points
        possible = [j for j in combos if all(ele in ind for ele in j)]
        # pick one at random and give its members the next cluster label
        select = random.choice(possible)
        for i in select:
            cluster[i] = c
        c += 1
    return cluster


# adj is the matrix from the question; the grouping can differ between runs
print(mutual_clusters(adj))
```
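If a reproducible result is preferred over a random one, a different heuristic (not the method above) is to repeatedly take the largest maximal clique among the ungrouped points, using `networkx.find_cliques`. This is a sketch that assumes fewer, larger groups are desirable; `greedy_clique_partition` is a hypothetical name. Finding the smallest possible number of groups is the minimum clique cover problem, which is NP-hard, so this greedy choice is not guaranteed to be optimal.

```python
import networkx as nx
import numpy as np


def greedy_clique_partition(adj):
    """Split points into mutually adjacent groups, taking the largest group first.

    Heuristic: repeatedly takes the largest maximal clique among the points
    that are still unassigned. It does not guarantee the fewest groups.
    """
    a = np.array(adj)
    np.fill_diagonal(a, 0)  # ignore the 1s on the diagonal (no self-loops)
    G = nx.from_numpy_array(a)

    labels = [-1] * len(a)
    remaining = set(G.nodes)
    label = 0
    while remaining:
        # maximal cliques of the graph restricted to the unassigned points
        cliques = nx.find_cliques(G.subgraph(remaining))
        # largest clique wins; ties go to the clique whose sorted indices come first
        best = max(cliques, key=lambda q: (len(q), [-v for v in sorted(q)]))
        for v in best:
            labels[v] = label
        remaining -= set(best)
        label += 1
    return labels


adj = np.array([
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 1]])

print(greedy_clique_partition(adj))  # [0, 0, 1, 1, 1, 0, 0]
```

On the example matrix this returns one of the groupings listed in the question, and it returns the same result on every run.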