I am trying to implement the Rabin-Karp for finding the substring; and I got stuck at the rolling hash(trying to use the formula suggested in Wikipedia).
#define MOD 1000000007
unsigned long long rolling_hash(const char *str)
{
unsigned long long hash = 0;
size_t str_len = strlen(str);
for(int i = 0, k = str_len -1; i < str_len; i++, k--) {
hash = hash + str[i] * pow(257, k);
// hash = hash % MOD;
}
return hash;
}
int main(void)
{
printf("%llu\n", rolling_hash("TestString"));
printf("%llu\n", rolling_hash("estStringh"));
unsigned long long old = rolling_hash("TestString");
// Add a character to the end
// since the last char in old was multiplied by 1, now multiply it by
// the base and then add the _new_ character to the end
old = old * 257 + 'h';
//old = old % MOD;
// Remove a char from the start
// Simply, remove the hash value of the first character
old = old - 'T' * pow(257, 10);;
printf("\n%llu\n", old);
return 0;
}
The code above works perfectly fine as long as I do not introduce any remainder operations; once I uncomment my %
operations, things break down and the answer I get from the changes over the rolling hash won't equal that which's being printed by the second print.
janisz's answer:
The suggestion to change the hash generator as in janisz's answer got the remainder to work when adding new characters but NOT when removing the old ones.
Note: I am using my own pow
function to work with unsigned long long
Hash genrator code is wrong. It should be
hash = (hash*257 + str[i]) % MOD;
and unncoment old_hash = old_hash % MOD;
. Also change the way you generate new hash from previous
(old_hash - to_delete_char * pow(257, str_len-1)) % MOD;
Take a look at your code. First 2 lines are perfectly good. What happen in the loop. First of all you are doing as much multiplies as you can. In my approach I use Horner scheme of computing hash becouse hash is a polynomial.
Why it works when without modulus and with not. I think it's a coincidence becouse you overflow integer with 8 characters (log(2^64)/log(257) = 8).
Now what's wrong with removing characters. to_delete_char * pow(257, str_len);
should be to_delete_char * pow(257, str_len-1);
index should start from 0 not 1 to mach your generator.
EDIT: I think problem was in pow function. As I wrote above it overflow just with 8 characters. In your example you have 10 so it can't work.
EDIT: It turns out that adding and removing character must be done as a one operation. Probably due to equivalents but I'm not sure.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define MOD 787
unsigned long long pow(int x, int y)
{
unsigned long long ret = 1;
for (int i=0;i<y;i++)
ret = (ret*x)%MOD;
return ret;
}
unsigned long long rolling_hash(const char *str)
{
unsigned long long hash = 0;
size_t str_len = strlen(str);
for(int i = 0, k = str_len -1; i < str_len; i++, k--) {
hash = hash + (str[i] * pow(257, k))%MOD;
hash = hash % MOD;
}
return hash;
}
int main(void)
{
char input[] = "TestString";
printf("Input: %llu\n", rolling_hash(input));
printf("Expected: %llu\n", rolling_hash("estStringh"));
unsigned long long old = rolling_hash(input);
// Add a character to the end
// and Remove a char from the start
unsigned long long h = (input[0] * pow(257, strlen(input)))%MOD;
old = ((old * 257) + 'h' - h) % MOD;
printf("Actual: %llu\n", old);
return 0;
}