solr nearest-neighbor n-dimensional

Find n nearest points with Solr in multi-dimensional space


Solr experts, I'd really appreciate some advice on my problem.

I want to build a multi-dimensional space using Solr, let's say with 5 dimensions. In this space, there should be points, e.g.

P1 (0.3, 0.3, 0.3, 0.3, 0.3)
P2 (0.5, 0.5, 0.5, 0.5, 0.1)
P3 (0.5, 0.1, 0.1, 0.1, 0.1)

Now I'd like to find the point that is nearest to a given point, e.g.

Px (0.5, 0.5, 0.5, 0.5, 0.5)

I've tried to find reliable information about multi-dimensional spatial search, but I could not find anything helpful.

The Solr Wiki has an article about Spatial Search, but it only covers 2 dimensions.

So my question is: Does Solr provide the functionality for a multi-dimensional spatial search?


Solution

  • You can use either principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE) to reduce your 5-dimensional space to a 2-dimensional representation, and then use Solr to find the nearest neighbors of any point in your dataset.

    According to this question, it seems that t-SNE is the most suitable option for your problem.

    There is a Python t-SNE tutorial here, but I think this would be enough to solve your problem:

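    import numpy as np
    from sklearn.manifold import TSNE
    # Rows are P1, P2, P3 and the query point Px from the question
    X = np.array([[0.3, 0.3, 0.3, 0.3, 0.3], [0.5, 0.5, 0.5, 0.5, 0.1], [0.5, 0.1, 0.1, 0.1, 0.1], [0.5, 0.5, 0.5, 0.5, 0.5]])
    # Newer scikit-learn releases require perplexity < n_samples; with 4 points you may need e.g. perplexity=2, which changes the output
    reduced_points = TSNE(n_components=2, random_state=0, angle=.99, init='pca').fit_transform(X)
    # Scale by 100 and truncate to integer coordinates
    reduced_points = [[int(x[0]*100), int(x[1]*100)] for x in reduced_points]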
    

    You then get your points in two-dimensional space:

    >>> reduced_points
    [[-21020, 2023], [-12745, -16097], [-2899, 10298], [5375, -7822]]
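
Because any reduction to two dimensions can distort distances, it may help to have a brute-force baseline in the original 5-dimensional space to compare against whatever the reduced points return. The following is only a minimal sketch using the points from the question; the helper name `nearest` and the dictionary layout are assumptions made for illustration, not part of Solr or scikit-learn.

```python
import math

# Points from the question; the labels are used only to identify results.
points = {
    "P1": (0.3, 0.3, 0.3, 0.3, 0.3),
    "P2": (0.5, 0.5, 0.5, 0.5, 0.1),
    "P3": (0.5, 0.1, 0.1, 0.1, 0.1),
}
px = (0.5, 0.5, 0.5, 0.5, 0.5)


def nearest(points, query, n=1):
    """Hypothetical helper: return the n (label, distance) pairs closest to query."""
    ranked = sorted(
        ((label, math.dist(coords, query)) for label, coords in points.items()),
        key=lambda pair: pair[1],  # sort by Euclidean distance, ascending
    )
    return ranked[:n]


baseline = nearest(points, px)
print(baseline)
```

The same helper can be run on the reduced 2-dimensional coordinates to check whether the reduction kept the same neighbor, since t-SNE is not designed to preserve exact distances.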