
How to understand Locality Sensitive Hashing?

I noticed that LSH seems to be a good way to find similar items among data with high-dimensional features.

After reading the paper, I'm still confused by the formulas.

Does anyone know of a blog post or article that explains it in an easy way?


  • The best tutorial I have seen for LSH is in the book Mining of Massive Datasets. See Chapter 3, "Finding Similar Items".

    I also recommend the slide below: . The example in the slide helped me a lot in understanding the hashing for cosine similarity.

    I borrowed two slides from Benjamin Van Durme & Ashwin Lall (ACL 2010) to try to explain the intuition behind LSH families for cosine distance.


    I have some sample code (just 50 lines) in Python here that uses cosine similarity.
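
To make the cosine-similarity intuition concrete, here is a minimal sketch of the random-hyperplane idea. It is not the sample code referred to above, and it is not taken from the slides; the function names (`make_hyperplanes`, `signature`, `estimated_cosine`, `exact_cosine`) and the test vectors are made up for illustration. The only fact it relies on is that, for a random hyperplane through the origin, two vectors with angle theta between them fall on opposite sides with probability theta / pi, so the fraction of differing signature bits gives an estimate of the angle.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec is on the non-negative side, else 0."""
    return [1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
            for h in hyperplanes]


def estimated_cosine(sig_a, sig_b):
    """Estimate cosine similarity from the fraction of differing bits."""
    differing = sum(a != b for a, b in zip(sig_a, sig_b))
    theta = math.pi * differing / len(sig_a)  # P[bits differ] = theta / pi
    return math.cos(theta)


def exact_cosine(a, b):
    """Exact cosine similarity, for comparison with the estimate."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical example vectors (not from the original post).
planes = make_hyperplanes(num_bits=1024, dim=5)
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.1, 2.1, 2.9, 4.2, 4.8]     # nearly parallel to u
w = [-5.0, 1.0, -3.0, 0.5, -2.0]  # points in a very different direction

sig_u, sig_v, sig_w = (signature(x, planes) for x in (u, v, w))
print("u vs v:", exact_cosine(u, v), estimated_cosine(sig_u, sig_v))
print("u vs w:", exact_cosine(u, w), estimated_cosine(sig_u, sig_w))
```

The signatures are short bit strings, so comparing them is much cheaper than comparing the original vectors. A full LSH index would additionally split each signature into bands and use every band as a hash-bucket key, so that only items sharing a bucket are compared; that step is left out here to keep the sketch short.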