algorithmhashuniversal-hashing

Universal hashing misunderstanding


I am trying to understand how universal hashing works. It is defined h(x) = [(a*x + b) mod p] mod m where a,b - random numbers, m - size of hash table, x - key, and p - prime number. For example, I have several different keys:

92333
23347
20313

And in order to create a universal hash function I have to to following:

Let a = 10, b = 22, p = 313, m = 100
h(92333) = [(10 * 92333 + 22) mod 313] mod 100 = 2 mod 100 = 2
h(23347) = [(10 * 23347 + 22) mod 313] mod 100 = 307 mod 100 = 7
...

But probably every time I will get number in range from 0 to 99 and there might be lots of collisions.

So my question is: I understood and applied universal hashing correctly?


Solution

  • Assuming that the numbers you're hashing have a uniform distribution, your function is biased toward buckets 0 through 12.

    Assume that the hash operation up to and including the mod 313 operation occurs. The result of that operation gets you a value in the range 0..312. Again, if the result of this operation is even distributed, then take the mod 100 you get the following effect:

     result of       Occurs for these
      mod 100        mod 313 results
    -----------     ------------------
         0           0, 100, 200, 300
         1           1, 101, 201, 301
         2           2, 102, 202, 302
         3           3, 103, 203, 303
         4           4, 104, 204, 304
         5           5, 105, 205, 305
         6           6, 106, 206, 306
         7           7, 107, 207, 307
         8           8, 108, 208, 308
         9           9, 109, 209, 309
        10          10, 110, 210, 310
        11          11, 111, 211, 311
        12          12, 112, 212, 312
        13          13, 113, 213
        14          14, 114, 214
        15          15, 115, 215
    

    Notice how the number of opportunities to get a particular result drop after 12? There's your bias. Here's more evidence of that effect taken from counting the results of hashing the numbers 0 through 5,000,000:

    counts[0]: 63898
    counts[1]: 63896
    counts[2]: 63899
    counts[3]: 63900
    counts[4]: 63896
    counts[5]: 63896
    counts[6]: 63900
    counts[7]: 63896
    counts[8]: 63896
    counts[9]: 63900
    counts[10]: 63898
    counts[11]: 63896
    counts[12]: 63899
    counts[13]: 47925
    counts[14]: 47922
    counts[15]: 47922
    counts[16]: 47925
    
    ... elided similar counts ...
    
    counts[97]: 47922
    counts[98]: 47922
    counts[99]: 47925