pythonadler32

Differences in calculation of adler32 rolling checksum - python


Need a clarification while looking at calculating a running checksum.

Assume I have data like this.

data = 'helloworld'

Assuming a blocksize of 5, I need to calculate running checksum.

>>> zlib.adler32('hello')
103547413
>>> zlib.adler32('ellow')
105316900

According to Python documentation (python version 2.7.2)

zlib.adler32(data[, value])

"Computes a Adler-32 checksum of data. (An Adler-32 checksum is almost as reliable as a CRC32 but can be computed much more quickly.) If value is present, it is used as the starting value of the checksum; otherwise, a fixed default value is used. This allows computing a running checksum over the concatenation of several inputs."

But when I provide something like this,

>>> zlib.adler32('ellow', zlib.adler32('hello'))
383190072

The output is entirely different.

I tried creating a custom function to generate the rolling checksum as defined in the rsync algorithm.

def weakchecksum(data):
    a = 1
    b = 0

    for char in data:
        a += (ord(char)) % MOD_VALUE
        b += a % MOD_VALUE



    return (b << 16) | a



def rolling(checksum, removed, added, block_size):
    a = checksum
    b = (a >> 16) & 0xffff
    a &= 0xffff

    a = (a - ord(removed) + ord(added)) % MOD_VALUE
    b = (b - (block_size * ord(removed)) + a) % MOD_VALUE

    return (b << 16) | a

Here is the values that I get from running these functions

Weak for hello: 103547413
Rolling for ellow: 105382436
Weak for ellow: 105316900

As you can see there is some huge difference in my implementation of rolling checksum and python's, in terms of value.

Where am I going wrong in calculating the rolling checksum? Am I making use of the rolling property of python's adler32 function correctly?


Solution

  • The adler32() function does not provide "rolling". The documentation correctly uses the word "running" (not "rolling"), which means simply that it can compute the adler32 in chunks as opposed to all at once. You need to write your own code to do compute a "rolling" adler32 value, which would be the adler32 of a sliding window over the data.