pythonnumpyscipydistancenearest-neighbor

How to computing k-nearest neighbors from rectangular distance matrix (i.e., scipy.spatial.distance.cdist) in Python?


I want to calculate the k-nearest neighbors using either sklearn, scipy, or numpy but from a rectangular distance matrix that is output from scipy.spatial.distance.cdist.

I have tried inputting into the kneighbors_graph and KNeighborsTransformer with metric="precomputed" but have not been successful.

How can I achieve this?

from scipy.spatial.distance import cdist
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph, KNeighborsTransformer

X, _ = make_classification(n_samples=15, n_features=4, n_classes=2, n_clusters_per_class=1, random_state=0)
A = X[:10,:]
B = X[10:,:]
A.shape, B.shape
# ((10, 4), (5, 4))

# Rectangular distance matrix
dist = cdist(A,B)
dist.shape
# (10, 5)

n_neighbors=3
kneighbors_graph(dist, n_neighbors=n_neighbors, metric="precomputed")

# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)
# Cell In[165], line 17
#      14 # (10, 5)
#      16 n_neighbors=3
# ---> 17 kneighbors_graph(dist, n_neighbors=n_neighbors, metric="precomputed")

# File ~/miniconda3/envs/soothsayer_env/lib/python3.9/site-packages/sklearn/neighbors/_graph.py:117, in kneighbors_graph(X, n_neighbors, mode, metric, p, metric_params, include_self, n_jobs)
#      50 """Compute the (weighted) graph of k-Neighbors for points in X.
#      51 
#      52 Read more in the :ref:`User Guide <unsupervised_neighbors>`.
#    (...)
#     114        [1., 0., 1.]])
#     115 """
#     116 if not isinstance(X, KNeighborsMixin):
# --> 117     X = NearestNeighbors(
#     118         n_neighbors=n_neighbors,
#     119         metric=metric,
#     120         p=p,
#     121         metric_params=metric_params,
#     122         n_jobs=n_jobs,
#     123     ).fit(X)
#     124 else:
#     125     _check_params(X, metric, p, metric_params)

# File ~/miniconda3/envs/soothsayer_env/lib/python3.9/site-packages/sklearn/neighbors/_unsupervised.py:176, in NearestNeighbors.fit(self, X, y)
#     159 """Fit the nearest neighbors estimator from the training dataset.
#     160 
#     161 Parameters
#    (...)
#     173     The fitted nearest neighbors estimator.
#     174 """
#     175 self._validate_params()
# --> 176 return self._fit(X)

# File ~/miniconda3/envs/soothsayer_env/lib/python3.9/site-packages/sklearn/neighbors/_base.py:545, in NeighborsBase._fit(self, X, y)
#     543     # Precomputed matrix X must be squared
#     544     if X.shape[0] != X.shape[1]:
# --> 545         raise ValueError(
#     546             "Precomputed matrix must be square."
#     547             " Input is a {}x{} matrix.".format(X.shape[0], X.shape[1])
#     548         )
#     549     self.n_features_in_ = X.shape[1]
#     551 n_samples = X.shape[0]

# ValueError: Precomputed matrix must be square. Input is a 10x5 matrix.

Solution

  • Both kneighbors_graph and KNeighborsTransformer are intended to work for graphs within a single set of points; you are computing neighbors between two distinct arrays.

    I'm not aware of any built-in utility for this, but you could create such a graph manually using numpy and scipy routines. For example:

    import numpy as np
    from scipy import sparse
    
    indices = np.argpartition(dist, n_neighbors, axis=1)[:, :n_neighbors]
    data = np.ones(dist.shape[0] * n_neighbors)
    row = np.repeat(np.arange(dist.shape[0]), n_neighbors)
    col = indices.ravel()
    graph = sparse.coo_matrix((data, (row, col)), shape=dist.shape).tocsr()
    
    print(graph.todense())
    
    [[1. 0. 1. 1. 0.]
     [0. 1. 1. 1. 0.]
     [1. 0. 1. 1. 0.]
     [1. 0. 1. 1. 0.]
     [1. 0. 1. 1. 0.]
     [0. 1. 1. 1. 0.]
     [0. 1. 1. 1. 0.]
     [1. 0. 1. 1. 0.]
     [0. 1. 1. 1. 0.]
     [1. 0. 1. 1. 0.]]
    

    Alternatively, if you want to compute this graph more easily without reference to the pre-computed pairwise distance matrix, you can use KNeighborsTransformer directly:

    from sklearn.neighbors import KNeighborsTransformer
    knn = KNeighborsTransformer(n_neighbors=n_neighbors).fit(B)
    graph = knn.kneighbors_graph(A)