stringmontecarlorolling-computationrabin-karp

Rabin-Karp: rolling hash computation adds a large prime number to previously computed hash


I think I conceptually understand RabinKarp pattern matching algorithm using rolling hash. While going through a sample implementation here, I find that a large prime number q is added to the previously computed rolling hash.

for (int i = m; i < n; i++) {
            // Remove leading digit, add trailing digit, check for match. 
            txtHash = (txtHash + q - RM*txt.charAt(i-m) % q) % q; //Why +q here?
            txtHash = (txtHash*R + txt.charAt(i)) % q; 

            // match
            int offset = i - m + 1;
            if ((patHash == txtHash) && check(txt, offset))
                return offset;
        }

I am not sure why this is needed. Can I get some help with this?

In my limited testing, I get the same result whether or not q term in included.

Does this have something to do with which version of algorithm (Monte Carlo/Las Vegas) is being implemented?


Solution

  • The +q term is there to avoid dealing with negative numbers.

    We want txtHashto always lie in the interval [0;q[, without this +q it could also be in ]-q;0[.

    This could lead to missing the pattern. e.g if patHash = 0xdead but you compute txtHash = -q+0xdead. Those two values are equal mathematically mod q but different with Java taken % q.