I'm looking for the optimal way to compute a hashcode for a set of two-dimensional points (so that I can store polygons in a hashtable).
There are some obvious ways to do that, such as concatenating all the points' coordinates into a string and taking its hashcode, but this would be very slow.
At the other end of the speed/collision spectrum, I could, for example, sum up all the coordinates. That would be very fast, but it would also create a lot of collisions.
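To make the collision problem with summing concrete, here is a minimal C sketch. The `Point` type and the `sum_hash` function are hypothetical names used only for illustration. Any permutation of the points, swapping x and y, or simply a different set of points with the same total yields the same hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical point type, for illustration only. */
typedef struct { int32_t x, y; } Point;

/* Naive hash: add up every coordinate (unsigned arithmetic, so overflow wraps).
   Very fast, but (1,2),(3,4) and (2,1),(4,3) and (0,0),(5,5) all give 10. */
uint32_t sum_hash(const Point *pts, size_t n)
{
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h += (uint32_t)pts[i].x + (uint32_t)pts[i].y;
    return h;
}
```

All three example sets above hash to 10, even though they are different polygons.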
What's the optimal way to compute a hashcode for a set of points?
Is the optimal solution different if the coordinates are integers rather than real numbers?
Edit: I'm using .NET, so the hashcode should be 32 bits long.
There is no optimal way to do this job. It all depends on how big a hash you can afford, and you have to make trade-offs between speed and diffusion. Keep in mind that there is no such thing as an optimal solution unless you know exactly what you are going to hash. In some cases XOR can be good enough.
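As a minimal sketch of the XOR approach, assuming integer coordinates: each point is first hashed on its own (mixing x and y asymmetrically so that (x, y) and (y, x) differ), and the per-point hashes are then XORed together. The `Point` type and the function names are hypothetical; the golden-ratio multiplier is an arbitrary common choice, and the final mixing steps are MurmurHash3's 32-bit finalizer (fmix32). Because XOR is commutative, the result does not depend on the order in which the points are listed. Note, however, that a point appearing twice cancels itself out, and any two polygons made of the same vertices in a different order will collide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical point type, for illustration only. */
typedef struct { int32_t x, y; } Point;

/* Hash a single point. x and y enter asymmetrically, then the value is
   scrambled with MurmurHash3's fmix32 finalizer (a bijection on 32 bits). */
static uint32_t point_hash(Point p)
{
    uint32_t h = ((uint32_t)p.x * 0x9E3779B1u) ^ (uint32_t)p.y;
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

/* Hash a set of points: XOR is commutative, so point order does not matter. */
uint32_t xor_set_hash(const Point *pts, size_t n)
{
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h ^= point_hash(pts[i]);
    return h;
}
```

Whether order independence is desirable depends on how you define polygon equality; if vertex order matters, an order-sensitive combination (like the one in the function below) is the better fit.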
Take, for instance, this code:
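```c
unsigned int JSHash(const char* str, unsigned int len)
{
    unsigned int hash = 1315423911;
    unsigned int i = 0;

    for (i = 0; i < len; str++, i++)
    {
        /* mix each byte into the state; the cast avoids sign extension where char is signed */
        hash ^= ((hash << 5) + (unsigned char)(*str) + (hash >> 2));
    }
    return hash;
}
/* End Of JS Hash Function */
```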
You said that aggregating the points together is too slow. If you adapt the code above, it does not need any kind of aggregation: it just makes a single pass over the coordinates, which is not much more work than summing them. And if you are hashing integers or floats, you would probably adjust the shifts to fit your data type (<< and >> are shift operations; here they together mix the high and low bits, somewhat like a bitwise rotation).
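Here is a minimal sketch of that adaptation, under a few assumptions. The JS-hash mixing step is applied to whole 32-bit words instead of single bytes (this word-at-a-time variant is my adaptation, and its distribution quality has not been measured), and doubles are assumed to be 64-bit IEEE 754 values. `IPoint`, `DPoint`, `js_step` and the other function names are hypothetical. For real coordinates, the bit pattern of each double is hashed as two 32-bit halves, and -0.0 is folded into +0.0 so that values that compare equal also hash equally (NaN payloads are not normalized).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical point types, for illustration only. */
typedef struct { int32_t x, y; } IPoint;
typedef struct { double  x, y; } DPoint;

/* One JSHash-style mixing step, fed a 32-bit word instead of a byte. */
static uint32_t js_step(uint32_t hash, uint32_t word)
{
    return hash ^ ((hash << 5) + word + (hash >> 2));
}

/* Integer coordinates: feed each coordinate straight in, no string building. */
uint32_t js_hash_ipoints(const IPoint *pts, size_t n)
{
    uint32_t hash = 1315423911u;
    for (size_t i = 0; i < n; i++) {
        hash = js_step(hash, (uint32_t)pts[i].x);
        hash = js_step(hash, (uint32_t)pts[i].y);
    }
    return hash;
}

/* Real coordinates: hash the 64-bit pattern of the double (assumes 8-byte
   IEEE 754 doubles) as low and high 32-bit halves. */
static uint32_t js_step_double(uint32_t hash, double v)
{
    uint64_t bits;
    if (v == 0.0)
        v = 0.0;                      /* fold -0.0 into +0.0 */
    memcpy(&bits, &v, sizeof bits);   /* read the bits without aliasing issues */
    hash = js_step(hash, (uint32_t)bits);
    return js_step(hash, (uint32_t)(bits >> 32));
}

uint32_t js_hash_dpoints(const DPoint *pts, size_t n)
{
    uint32_t hash = 1315423911u;
    for (size_t i = 0; i < n; i++) {
        hash = js_step_double(hash, pts[i].x);
        hash = js_step_double(hash, pts[i].y);
    }
    return hash;
}
```

The same arithmetic should carry over to C# using `uint` in an unchecked context, with `BitConverter.DoubleToInt64Bits` supplying the bit pattern of a double. Unlike the XOR version, this hash depends on the order of the points.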
Check for other hash functions here: http://www.partow.net/programming/hashfunctions/
xxHash is among the fastest families of non-cryptographic hash functions.