I want to calculate the k-nearest neighbors using either sklearn, scipy, or numpy but from a rectangular distance matrix that is output from scipy.spatial.distance.cdist
.
I have tried inputting into the kneighbors_graph
and KNeighborsTransformer
with metric="precomputed"
but have not been successful.
How can I achieve this?
from scipy.spatial.distance import cdist
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer
X, _ = make_classification(n_samples=15, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
A = X[:10,:]
B = X[10:,:]
A.shape, B.shape
# ((10, 4), (5, 4))
# Rectangular distance matrix
dist = cdist(A,B)
dist.shape
# (10, 5)
n_neighbors=3
kneighbors_graph(dist, n_neighbors=n_neighbors, metric="precomputed")
# ---------------------------------------------------------------------------
# ValueError Traceback (most recent call last)
# Cell In[165], line 17
# 14 # (10, 5)
# 16 n_neighbors=3
# ---> 17 kneighbors_graph(dist, n_neighbors=n_neighbors, metric="precomputed")
# File ~/miniconda3/envs/soothsayer_env/lib/python3.9/site-packages/sklearn/neighbors/_graph.py:117, in kneighbors_graph(X, n_neighbors, mode, metric, p, metric_params, include_self, n_jobs)
# 50 """Compute the (weighted) graph of k-Neighbors for points in X.
# 51
# 52 Read more in the :ref:`User Guide <unsupervised_neighbors>`.
# (...)
# 114 [1., 0., 1.]])
# 115 """
# 116 if not isinstance(X, KNeighborsMixin):
# --> 117 X = NearestNeighbors(
# 118 n_neighbors=n_neighbors,
# 119 metric=metric,
# 120 p=p,
# 121 metric_params=metric_params,
# 122 n_jobs=n_jobs,
# 123 ).fit(X)
# 124 else:
# 125 _check_params(X, metric, p, metric_params)
# File ~/miniconda3/envs/soothsayer_env/lib/python3.9/site-packages/sklearn/neighbors/_unsupervised.py:176, in NearestNeighbors.fit(self, X, y)
# 159 """Fit the nearest neighbors estimator from the training dataset.
# 160
# 161 Parameters
# (...)
# 173 The fitted nearest neighbors estimator.
# 174 """
# 175 self._validate_params()
# --> 176 return self._fit(X)
# File ~/miniconda3/envs/soothsayer_env/lib/python3.9/site-packages/sklearn/neighbors/_base.py:545, in NeighborsBase._fit(self, X, y)
# 543 # Precomputed matrix X must be squared
# 544 if X.shape[0] != X.shape[1]:
# --> 545 raise ValueError(
# 546 "Precomputed matrix must be square."
# 547 " Input is a {}x{} matrix.".format(X.shape[0], X.shape[1])
# 548 )
# 549 self.n_features_in_ = X.shape[1]
# 551 n_samples = X.shape[0]
# ValueError: Precomputed matrix must be square. Input is a 10x5 matrix.
Both kneighbors_graph
and KNeighborsTransformer
are intended to work for graphs within a single set of points; you are computing neighbors between two distinct arrays.
I'm not aware of any built-in utility for this, but you could create such a graph manually using numpy
and scipy
routines. For example:
import numpy as np
from scipy import sparse
indices = np.argpartition(dist, n_neighbors, axis=1)[:, :n_neighbors]
data = np.ones(dist.shape[0] * n_neighbors)
row = np.repeat(np.arange(dist.shape[0]), n_neighbors)
col = indices.ravel()
graph = sparse.coo_matrix((data, (row, col)), shape=dist.shape).tocsr()
print(graph.todense())
[[1. 0. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[1. 0. 1. 1. 0.]
[1. 0. 1. 1. 0.]
[1. 0. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[1. 0. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[1. 0. 1. 1. 0.]]
Alternatively, if you want to compute this graph more easily without reference to the pre-computed pairwise distance matrix, you can use KNeighborsTransformer
directly:
from sklearn.neighbors import KNeighborsTransformer
knn = KNeighborsTransformer(n_neighbors=n_neighbors).fit(B)
graph = knn.kneighbors_graph(A)