apache-spark, locality-sensitive-hash

Spark implementation for Locality Sensitive Hashing


As part of a project I'm doing for my studies, I'm looking for a way to use the hashing functions of LSH with Spark. Is there any way to do so?


Solution

  • Try this implementation:

    https://github.com/mrsqueeze/spark-hash

    Quoting from the README, "this implementation was largely based on the algorithm described in chapter 3 of Mining of Massive Datasets", a book with a great description of LSH and minhashing.
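
    In case it helps to see the idea in code, below is a minimal, self-contained sketch of the minhash-plus-banding scheme described in that chapter, written against plain Spark RDDs in Scala. It is not the spark-hash API; all names (numHashes, numBands, the toy document sets, and so on) are illustrative.

        // A minimal sketch (not the spark-hash API) of minhash signatures plus
        // LSH banding, as described in chapter 3 of Mining of Massive Datasets,
        // expressed with plain Spark RDDs. All parameters are illustrative.
        import org.apache.spark.sql.SparkSession
        import scala.util.Random

        object MinHashLshSketch {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder.appName("minhash-lsh-sketch").getOrCreate()
            val sc = spark.sparkContext

            // Toy input: (docId, set of shingle/token ids)
            val docs = sc.parallelize(Seq(
              (0L, Set(1, 2, 3, 5)),
              (1L, Set(2, 3, 5, 7)),
              (2L, Set(10, 11, 12))
            ))

            // Random hash functions of the form h(x) = (a*x + b) mod prime,
            // where the prime is larger than any token id.
            val numHashes = 20
            val prime = 2147483647L
            val rng = new Random(42)
            val coeffs = Array.fill(numHashes)(
              (1L + rng.nextInt(Int.MaxValue - 1), rng.nextInt(Int.MaxValue).toLong)
            )

            // Minhash signature: for each hash function, the minimum hash value
            // over the document's token set.
            val signatures = docs.mapValues { tokens =>
              coeffs.map { case (a, b) =>
                tokens.map(t => (a * t + b) % prime).min
              }
            }

            // LSH banding: split each signature into bands; documents that agree
            // on any whole band land in the same bucket and become candidates.
            val numBands = 5
            val rowsPerBand = numHashes / numBands
            val candidates = signatures
              .flatMap { case (docId, sig) =>
                sig.grouped(rowsPerBand).zipWithIndex.map { case (band, bandIdx) =>
                  ((bandIdx, band.toSeq), docId)
                }
              }
              .groupByKey()
              .values
              .filter(_.size > 1) // buckets holding more than one doc are candidate groups

            candidates.collect().foreach(group => println(group.mkString(", ")))
            spark.stop()
          }
        }

    Note that the banding step only narrows down which pairs are worth comparing; candidate pairs still need a full Jaccard-similarity check afterwards, which is also how the chapter presents the technique.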