cmachine-learninghashmapnearest-neighborlocality-sensitive-hash

How to understand Locality Sensitive Hashing?


I noticed that LSH seems a good way to find similar items with high-dimension properties.

After reading the paper http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf, I'm still confused with those formulas.

Does anyone know a blog or article that explains that the easy way?


Solution

  • The best tutorial I have seen for LSH is in the book: Mining of Massive Datasets. Check Chapter 3 - Finding Similar Items http://infolab.stanford.edu/~ullman/mmds/ch3a.pdf

    Also I recommend the below slide: http://www.cs.jhu.edu/%7Evandurme/papers/VanDurmeLallACL10-slides.pdf . The example in the slide helps me a lot in understanding the hashing for cosine similarity.

    I borrow two slides from Benjamin Van Durme & Ashwin Lall, ACL2010 and try to explain the intuitions of LSH Families for Cosine Distance a bit. enter image description here

    enter image description here

    I have some sample code (just 50 lines) in python here which is using cosine similarity. https://gist.github.com/94a3d425009be0f94751