calgorithmrabin-karp

c- Karp-Rabin rolling hash - skip and append parts


I need a little help with a particular part of my Karp-Rabin algo. What I am trying to do is to implement both the version with fixed sliding window and with separate append and skip parts. Sliding window works perfectly fine. The problem occurs when I try to split the monolith sliding window into append and skip parts. Append seems to work fine, but skip is the thing that have been causing me a massive headache last couple of days.

PROBLEM - I am sliding through the string that holds a few instances of a pattern subscribing in it. Sliding window detects it, but not the other two.

The idea is that the RH struct holds the pre-computed value of (base ^ window size) mod prime number (b2wmod) so I can delete the leading character of a string. This value changes after all append and skip as window size changes. To decrease the value of b2wmod, the multiplicative inverse is used to not be in a situation of mod deletion (inverse of base mod modulus value) . It is also pre-computed.

Below are the parts of the code I am interested in. I don't post the whole code to not make you read everything, but can upload it if necessary. Multiplicative inverse seems to be computed correctly, but I can upload the code also.

Would appreciate any help! Thank you in advance!

void
append_to_rh(RH rh)
{
    uint64_t hash    = rh->hash;
    uint64_t base    = rh->base;
    uint64_t mod     = rh->mod;
    uint64_t b2wmodm = rh->b2wmodm;
    char     new     = rh->new;
    
    hash             = ( hash * base + new ) % mod;    
    b2wmodm          = ( b2wmodm * base ) % mod;
    
    rh->hash         = hash;
    rh->b2wmodm      = b2wmodm;
}

void
skip(RH rh)
{
    uint64_t hash       = rh->hash;
    uint64_t base       = rh->base;
    uint64_t mod        = rh->mod;
    uint64_t b2wmodm    = rh->b2wmodm;
    uint64_t m_inv      = rh->m_inv;
    char     old        = rh->old;
    uint64_t correction = old * mod;
   
    b2wmodm = ( b2wmodm * (m_inv % mod) ) % mod;
    hash    = ( hash - old * b2wmodm + correction ) % mod;

    rh->hash    = hash;
    rh->b2wmodm = b2wmodm;
}

void
slide_window(RH rh)
{
    uint64_t base    = rh->base;
    uint64_t mod     = rh->mod;
    uint64_t hash    = rh->hash;
    uint64_t b2wmodm = rh->b2wmodm;
    char old = rh->old;
    char new = rh->new;

    hash     = ( hash * base - old * b2wmodm + new ) % mod;
    rh->hash = hash;
}

Solution

  • Your append and skip functions work fine. Here is a sample code I used for testing

    #include <string>
    #include <cassert>
    
    typedef long long ll;
    
    ll hash, b2wmodm, base, inv_base, mod;
    
    // fast exponentiation, for calculating inv_base
    ll exp(ll a, ll b){
        ll ans = 1;
        while (b){
            if (b&1){
                ans *= a;
                ans %= mod;
            }
            a *= a;
            a %= mod;
            b >>= 1;
        }
        return ans;
    }
    
    // calculates expected hash of the string
    ll expected_hash(std::string s) {
        ll result = 0;
        ll multiplier = 1;
        for (int i = s.length()-1; i >= 0; i--) {
            result += s[i] * multiplier % mod;
            result %= mod;
            multiplier = multiplier * base % mod;
        }
        return result;
    }
    
    // same as your append and skip functions
    void append_to_rh(ll newc) {
        hash = (hash * base + newc) % mod;
        b2wmodm = (b2wmodm * base) % mod;
    }
    
    void skip(ll old) {
        ll correction = old * mod;
    
        b2wmodm = (b2wmodm * (inv_base % mod)) % mod;
        hash = (hash - old * b2wmodm + correction) % mod;
    }
    
    int main() {
    
        base = 29;
        mod = 1000000007;
    
        hash = 0;
        b2wmodm = 1;
        inv_base = exp(base, mod-2);
    
        srand(time(nullptr));
        std::string s;
        for (int i = 0; i < 2000; i++) {
            if (i < 1000 || rand()%2) {
                char newchar = rand()%26 + 'a';
                s += newchar;
                append_to_rh(newchar);
                assert(expected_hash(s) == hash);
            } else {
                char oldchar = s[0];
                s = s.substr(1, s.length());
                skip(oldchar);
                assert(expected_hash(s) == hash);
            }
        }
    }
    

    I guess other parts of your code cause problems. Maybe you are trying to skip an empty window, or maybe you use a non-prime mod.