algorithmlocality-sensitive-hashminhash

Implementation of locality-sensitive hashing with min-hash


I have read a lot of tutorials, documents, and pieces of code implementing LSH (locality-sensitive hashing) with min-hash.

LSH tries to find the Jaccard coefficient of two sets by hashing random subsets and aggregating over those. I have looked at implementations in code.google.com but was not able to understand their method as well. I understand the paper Google news personalization: scalable online collaborative filtering, but I fail to understand any of the implementations out there.

Can someone please explain me in simple words how to implement LSH with MinHash?


Solution

  • You want to implement the min-hash algorithm but not LSH per se. Min-hashing is an LSH technique. Thus, LSH, in general, does not approximate the Jaccard coefficient, the particular method of min-hashing does.

    An introduction is given in Mining of Massive Datasets, Chapter 3 by Anand Rajaraman and Jeff Ullman.