algorithmhashprimesrabin-karp

How does the size of the prime number affect Rabin Karp’s runtime?


From my understanding, the size of P prime to be used for the modulo should be 32\64 bits so the final hash key can be compared in O(1). if we decide to use a prime, larger than m(size of pattern), would that restore us back to naïve run time of O(mn) since wed be comparing in O(m) for n-m iterations?


Solution

  • It might help to take things to extremes.

    Suppose your prime P is the smallest possible prime, 2. Then there are only two possible hash codes that could be computed in the rolling hash, so you’d expect to see matching hashes at about 50% of the indices in the string. That would take your runtime up to Θ(mn) on average if the pattern never exists.

    On the other hand, imagine picking a prime P that is for all intents and purposes infinite (say, a prime with more than 2300 bits in it). That would mean that modding by P would essentially be a no-op, because the rolling hash would never exceed P. In that case, the number of bits in the rolling hash would be roughly on par with the length of the pattern, so the work to update the rolling hash at each step would be Ω(m) for a net runtime of Ω(mn) on average.