hashtypesinteger-hashing

Is it possible to implement universal hashing for the complete range of integers?


I am reading about Universal hashing on integers. The prerequisite and mandatory precondition seems to be that we choose a prime number p greater than the set of all possible keys.

I am not clear on this point.

If our set of keys are of type int then this means that the prime number needs to be of the next bigger data type e.g. long.

But eventually whatever we get as the hash would need to be down-casted to an int to index the hash table. Doesn't this down-casting affect the quality of the Universal Hashing (I am referring to the distribution of the keys over the buckets) somehow?


Solution

  • If our set of keys are integers then this means that the prime number needs to be of the next bigger data type e.g. long.

    That is not a problem. Sometimes it is necessary otherwise the hash family cannot be universal. See below for more information.

    But eventually whatever we get as the hash would need to be down-casted to an int to index the hash table.

    Doesn't this down-casting affect the quality of the Universal Hashing (I am referring to the distribution of the keys over the buckets) somehow?

    The answer is no. I will try to explain.

    Whether p has another data type or not is not important for the hash family to be universal. Important is that p is equal or larger than u (the maximum integer of the universe of integers). It is important that p is big enough (i.e. >= u).

    A hash family is universal when the collision probability is equal or smaller than 1/m.

    So the idea is to hold that constraint.

    The value of p, in theory, can be as big as a long or more. It just needs to be an integer and prime.

    In other terms answering your question: p being a bigger data type is not a problem here and can be even required. p has to be equal or greater than u and a and b have to be randomly chosen integers modulo p, no matter the number of buckets m. With these constraints you can construct a universal hash family.


    Maybe a mathematical example could help

    We could have chosen m = u = 256 buckets. Then we would have a collision probability of 0.003891050584, which is smaller than 1/256 = 0,00390625. Hash family is still universal.

    Let's try with m being bigger than p, e.g. m = 300. Collision probability is 0, which is smaller than 1/300 ~= 0.003333333333. Trivial, we had more buckets than keys. Still universal, no collisions.


    Implementation detail example

    We have the following: