I am looking for an efficient implementation of locality-sensitive hashing (LSH) in Python 3 that uses Euclidean distance.
There is the "in-python" LSHForest implementation, but it uses cosine distance.
Also, even with this implementation, I didn't find a way to see the contents of each bucket (e.g., when using LSH for clustering): it only returns a certain number of approximate neighbors within a certain radius. If I want to see all neighbors, I don't see how it can be done. I do not want to use an arbitrary search radius, and I am not sure what a very large or infinite radius means in this implementation.
I would appreciate any insight. Many thanks.
For software recommendations, please ask here: Software Recommendations.
For how this works, first read my answer, and then assume that you ask the package (I haven't used it) for a big k (k should be the number of neighbors that the software returns) within a big radius r. That should return many neighbors. Set k = N, where N is the number of points in your dataset, and you will get all the neighbors.
If you want to see all the neighbors within a certain bucket, then you have to find out how many points a bucket can contain and set k to that number.
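To make "seeing what is in a bucket" concrete, here is a minimal sketch of a single-table LSH for Euclidean distance, using the standard p-stable scheme h(x) = floor((a . x + b) / w) with Gaussian a. It is not the LSHForest API or any package's API: the class `EuclideanLSH`, its methods, and the parameters `num_hashes` and `bucket_width` are hypothetical names chosen for illustration, and a practical setup would normally use several tables and tuned parameters.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """One hash table of p-stable (Gaussian) LSH for Euclidean distance.

    Illustrative sketch only; all names here are hypothetical.
    """

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Each hash is floor((a . x + b) / w): a has standard normal entries,
        # b is a random offset drawn uniformly between 0 and w.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)  # bucket key -> indices of stored points
        self.points = []

    def key(self, x):
        """Combine the num_hashes hash values into one bucket key."""
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def add(self, x):
        """Store x and record its index in the bucket it hashes to."""
        self.points.append(tuple(x))
        self.buckets[self.key(x)].append(len(self.points) - 1)

    def bucket_of(self, x):
        """Indices of every stored point that shares a bucket with x."""
        return list(self.buckets.get(self.key(x), []))

    def query(self, x, k):
        """Up to k candidates from x's bucket, nearest first (true distance)."""
        candidates = self.bucket_of(x)
        candidates.sort(key=lambda i: math.dist(self.points[i], x))
        return candidates[:k]


# Usage: index a few 2-D points, then list every bucket and its members.
lsh = EuclideanLSH(dim=2)
for p in [(0.0, 0.0), (0.1, 0.0), (50.0, 50.0)]:
    lsh.add(p)
for bucket_key, members in lsh.buckets.items():
    print(bucket_key, members)
```

Because the buckets live in an ordinary dictionary, they can be enumerated directly, with no radius involved. In this sketch, calling `query(x, k)` with `k` equal to the number of stored points returns the whole bucket of `x`, which mirrors the "set k large enough" advice above.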