c++calgorithmhashmodular

Is it possible to get the original value of a number, after several multiplications **with overflow**?


Summary: Suppose I have an unsigned int number. Then I multiply it several times(and there's overflow, which is expected). Then is it possible to "revert" the original value back?


In details:

It's all about Rabin-Karp rolling hash. What I need to do is: I have the hash of a long string - for example: "abcd". Then I have the hash for a shorter substring - for example "cd". How to calculate the "ab" hash with O(1), using the two given hashes?

What I have now as an algorithm:

So this is:

a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 - abcd

c * p ^ 1 + d * p ^ 0 - cd

ab gets:

( 
  ( a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 ) -
  ( c * p ^ 1 + d * p ^ 0 ) 
)
/ ( p ^ 2 )
= a * p ^ 1 + b * p ^ 0

And this works, if I don't have an overflow (if p is small number). But if it's not - it's not working.

Is there any trick or something?

P.S. The c++ tag is because of the number's overflow, as it is specific (and different from python, scheme or sth)


Solution

  • Extended Euclidean algorithm is a good solution for this, but it's too complicated and hard to implement. There's a better one.


    And there's another way to do this (thanks to e friend of mine (: )

    There's a nice article in wikipedia - modular multiplicative inverse using Euler's theorem in the case, when m and a are coprime:

    Euler's theorem for coprime number and modulo

    where φ(m) is Euler's totient function.

    In my case, the m (modulo) is the size of the hash type - 2^32, 2^64, etc. (64bit in my case).
    Well, this means, that we should only find the value of φ(m). But think about that - m == 2 ^ 64 so, that gives us the guarantee that m will be coprime with all odd numbers and will NOT be coprime any even number. So, what we need to do is to get the number of all values and divide them by 2.

    Also, we know that m will be unsigned, as otherwise we will have some issues. Than this gives us the chance to do this:

    hash_t x = -1;
    x /= 2;
    hash_t a_reverse = fast_pow( a, x );
    

    Well, about 64bit numbers, x is really big number ( 19 digits: 9 223 372 036 854 775 807), but fast_pow is really fast and we could cache the reverse number, in case that we need for more than one query.

    fast_pow is a well-known algorithm:

    hash_t fast_pow( hash_t source, hash_t pow )
    {
        if( 0 == pow )
        {
            return 1;
        }
    
        if( 0 != pow % 2 )
        {
            return source * fast_pow( source, pow - 1 );
        }
        else
        {
            return fast_pow( source * source, pow / 2  );    
        }
    
    }
    

    Addition: for example:

        hash_t base = 2305843009213693951;  // 9th mersenne prime
        hash_t x = 1234567890987654321;
    
        x *= fast_pow( base, 123456789 );   // x * ( base ^ 123456789 )
    
        hash_t y = -1;
        y /= 2;
        hash_t base_reverse = fast_pow( base, y );
    
        x *= fast_pow( base_reverse, 123456789 );   // x * ( base_reverse ^ 123456789 )
        assert( x == 1234567890987654321 ) ;
    

    works perfect and very fast.