java python minhash

LSH implementation for finding clusters


Hi guys. I am very new to Stack Exchange and I am currently doing research on graph theory.

The questions I'm going to ask are very introductory, since I'm a beginner-level programmer (not acquainted with data structures such as hash tables, buckets, vectors, etc.).

My idea is to take in a dataset of the form (timestamp t, node i, node j), where each record says that there is an edge between i and j at time t. The idea is to take the neighborhood set of each node and hash it. If two nodes' "vectors" (I don't understand what that is) hash into the same bucket, they are candidates for cluster formation.

But the problem is that I want to run experiments, and I have no idea how to implement a hash function and then bucket the nodes together.

I'm not asking you to write the code for me, but a pointer (pseudocode) would be very helpful, such as telling me to initialize a hash table, etc.


Solution

  • A hash code is an integer which is calculated from the properties of whatever it is you want to hash. That number is then used as an index into an array.

    In this case it seems that you want to use the N dimensions of your vector to calculate this hash code. It's up to you to write a function that calculates the hash code in such a way that vectors that should be clustered together all get the same hash code.

    Language-specific details about hash tables in Java or Python are very easy to find with a web search.
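
To make the idea of "vectors that should be clustered get the same hash code" concrete, here is a minimal Python sketch. It rests on several assumptions that are not in the question or the answer: node IDs are integers, timestamps are ignored, and the hash function is MinHash with banding (suggested only by the question's tag). All function names and parameters (`build_neighborhoods`, `num_hashes`, `bands`, and so on) are hypothetical, chosen for illustration. The "vector" of a node is its MinHash signature, and a Python dict plays the role of the hash table of buckets.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # large prime (2**31 - 1); assumes node IDs are smaller than this


def build_neighborhoods(edges):
    """Map each node to the set of its neighbors (timestamps are ignored here)."""
    neighbors = defaultdict(set)
    for _t, i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return neighbors


def make_hash_functions(num_hashes, seed=0):
    """Random (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(node_set, hash_funcs):
    """The signature ("vector") of a set: its minimum hash value under each function."""
    return tuple(min((a * x + b) % PRIME for x in node_set)
                 for a, b in hash_funcs)


def lsh_buckets(neighbors, hash_funcs, bands):
    """Split each signature into bands; nodes with an identical band share a bucket."""
    rows = len(hash_funcs) // bands
    buckets = defaultdict(set)
    for node, nbrs in neighbors.items():
        sig = minhash_signature(nbrs, hash_funcs)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(node)
    return buckets


def candidate_pairs(buckets):
    """All pairs of nodes that landed together in at least one bucket."""
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for idx, u in enumerate(ordered):
            for v in ordered[idx + 1:]:
                pairs.add((u, v))
    return pairs


# Toy data: (timestamp, node i, node j). Nodes 1 and 2 have the same neighbors.
edges = [(0, 1, 3), (1, 1, 4), (2, 1, 5), (3, 2, 3), (4, 2, 4), (5, 2, 5)]
neighbors = build_neighborhoods(edges)
hash_funcs = make_hash_functions(num_hashes=20)
pairs = candidate_pairs(lsh_buckets(neighbors, hash_funcs, bands=5))
print(sorted(pairs))  # [(1, 2), (3, 4), (3, 5), (4, 5)]
```

Nodes with identical neighborhoods always collide, and nodes with similar neighborhoods collide with a probability that grows with their Jaccard similarity. More bands (with fewer rows each) make the bucketing more permissive; fewer bands make it stricter. The candidate pairs are only candidates, so a real experiment would probably still compare the actual neighbor sets before forming clusters.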