My ultimate goal is to replace some methods that use kneighbors_graph with transformers from the sklearn-ann package. All the methods in sklearn-ann are implemented as sklearn-compatible transformer objects. However, the function I'm trying to replace uses kneighbors_graph(mode="connectivity", include_self=True), and I'm having a hard time converting the distance output produced with include_self=False into that type of connectivity matrix. Not all of the transformer objects support connectivity mode with self included, but all of them provide access to distance calculations without self.
I'm able to reproduce kneighbors_graph(mode="connectivity", include_self=True) from kneighbors_graph(mode="distance", include_self=True) (referred to below as nn_with_self). However, I'm unable to reproduce it from kneighbors_graph(mode="distance", include_self=False) (referred to as nn_without_self), which is the same output as KNeighborsTransformer(mode="distance").fit_transform.
I can see that nn_without_self is a superset of nn_with_self, but I don't know how the backend algorithm selects which entries are kept. How can I recreate nn_with_self from the nn_without_self matrix below? Further, how can I operate on sparse matrices the whole time, without converting to dense matrices? I tried reading the backend code, but it's an inception of class inheritance, and I find myself poring over several files at once and losing track on GitHub.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer

X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors = 3

# Nearest neighbors
nn_with_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=True, n_jobs=-1).todense()
nn_without_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=False, n_jobs=-1).todense()
nn_from_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors, metric="euclidean", n_jobs=-1).fit_transform(X).todense()

np.all(nn_from_transformer == nn_without_self)
# True
np.all(nn_with_self == nn_without_self)
# False

# Is `nn_with_self` symmetric?
np.allclose(nn_with_self, nn_with_self.T)
# False
# Is `nn_without_self` symmetric?
np.allclose(nn_without_self, nn_without_self.T)
# False
Here are the actual arrays:
nn_with_self
# matrix([[0. , 0.70550439, 0. , 0.20463097, 0. ,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0. , 0.51947869, 0. ,
# 0. , 0. , 0. , 0. , 0.44145655],
# [0. , 0. , 0. , 0. , 0.50025504,
# 0. , 0. , 0. , 0.49481662, 0. ],
# [0.20463097, 0.51947869, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0.50025504, 0. , 0. ,
# 0. , 0. , 0. , 0.34132965, 0. ],
# [0. , 0.88867318, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0.44956691],
# [0. , 0. , 1.10390699, 0. , 1.52953542,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0. , 0. , 0. , 0. ,
# 3.62670755, 0. , 0. , 0. , 3.83571739],
# [0. , 0. , 0.49481662, 0. , 0.34132965,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0.44145655, 0. , 0. , 0. ,
# 0.44956691, 0. , 0. , 0. , 0. ]])
nn_without_self
# matrix([[0. , 0.70550439, 0. , 0.20463097, 1.02852831,
# 0. , 0. , 0. , 0. , 0. ],
# [0.70550439, 0. , 0. , 0.51947869, 0. ,
# 0. , 0. , 0. , 0. , 0.44145655],
# [0. , 0. , 0. , 0. , 0.50025504,
# 0. , 1.10390699, 0. , 0.49481662, 0. ],
# [0.20463097, 0.51947869, 0. , 0. , 0. ,
# 0. , 0. , 0. , 0. , 0.95611187],
# [1.02852831, 0. , 0.50025504, 0. , 0. ,
# 0. , 0. , 0. , 0.34132965, 0. ],
# [0. , 0.88867318, 0. , 1.40547465, 0. ,
# 0. , 0. , 0. , 0. , 0.44956691],
# [0. , 0. , 1.10390699, 0. , 1.52953542,
# 0. , 0. , 0. , 1.59848513, 0. ],
# [0. , 4.1280709 , 0. , 0. , 0. ,
# 3.62670755, 0. , 0. , 0. , 3.83571739],
# [1.36553076, 0. , 0.49481662, 0. , 0.34132965,
# 0. , 0. , 0. , 0. , 0. ],
# [0. , 0.44145655, 0. , 0.95611187, 0. ,
# 0.44956691, 0. , 0. , 0. , 0. ]])
Just use n_neighbors - 1 for include_self=True. In mode="distance", KNeighborsTransformer considers each sample to be its own neighbor, so it computes one extra neighbor: every row of its graph stores the self edge (at distance 0) plus the requested n_neighbors nearest other points. Requesting n_neighbors - 1 therefore yields exactly the same neighbor set as kneighbors_graph(..., include_self=True) with n_neighbors, namely the self edge plus the n_neighbors - 1 nearest other points.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer

X, _ = make_classification(n_samples=10, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
n_neighbors = 3

# Nearest neighbors
nn_with_self = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance", metric="euclidean", include_self=True, n_jobs=-1).todense()
nn_from_transformer = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors - 1, metric="euclidean", n_jobs=-1).fit_transform(X).todense()

np.allclose(nn_from_transformer, nn_with_self)
# True
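Since the original goal was mode="connectivity" with include_self=True, the distance graph can also be turned into a connectivity matrix without ever leaving sparse land. This is a sketch, assuming the transformer's CSR output keeps the self edges as explicit zeros (so they survive in `.data`): overwriting the `.data` array with ones flips every stored entry, self edges included, to 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer

X, _ = make_classification(n_samples=10, n_features=4, n_classes=2,
                           n_clusters_per_class=1, random_state=0)
n_neighbors = 3

# Sparse distance graph that includes each sample as its own neighbor:
# n_neighbors - 1 requested, plus the implicit self edge = n_neighbors
# stored entries per row (the self edge is an explicit zero).
dist = KNeighborsTransformer(mode="distance", n_neighbors=n_neighbors - 1,
                             metric="euclidean", n_jobs=-1).fit_transform(X)

# Connectivity: set every stored value (including the explicit-zero self
# edges) to 1. Mutating .data keeps the matrix sparse the whole time.
conn = dist.copy()
conn.data[:] = 1.0

# Compare against the target graph from the question.
reference = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity",
                             metric="euclidean", include_self=True, n_jobs=-1)
print(np.array_equal(conn.toarray(), reference.toarray()))  # True
```

The `.toarray()` calls exist only for the final check; the conversion itself touches just the flat `.data` array of stored values, so it costs O(nnz) and never materializes a dense matrix.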