
CRC32 In Python (vs CRC32b)


I am trying to generate some CRC32 hashes, but it seems that zlib and binascii both use the crc32b algorithm, even though their functions are named simply zlib.crc32 and binascii.crc32. Are there any other Python resources for hash generation that I can try? Interestingly, I previously found that R's 'digest' package also implements crc32b, with no mention of crc32.

Some examples of what I mean by CRC32 and CRC32b:

Here you can see both in the dropdown: http://www.md5calc.com/crc32

Here, CRC32b is on the right side: https://hash.online-convert.com/crc32-generator

Here is a PHP-centered discussion on the distinction: What is the difference between crc32 and crc32b?

Here we can see that Python is implementing CRC32b: How to calculate CRC32 with Python to match online results?

Thank you


Solution

  • What they are calling "crc32" is the CRC-32/BZIP2 in this catalog. What they are calling "crc32b" is the PKZip CRC-32 (ITU V.42), commonly referred to as simply CRC-32, as it is in that catalog. This use of "crc32" and "crc32b" is apparently a notation invented by the PHP authors.

    You can find a set of example hashes on the PHP documentation page for hash(). There the hashes of the string "hello" are calculated, and can be checked against implementations. The catalog I linked uses "123456789" for the checks.
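As a quick sanity check, the catalog's check input can be run through Python's standard library. This sketch assumes the catalog's published check values for "123456789": 0xcbf43926 for CRC-32 (the PKZip variant) and 0xfc891918 for CRC-32/BZIP2. The constant and variable names are only illustrative.

```python
import binascii
import zlib

CHECK_INPUT = b"123456789"   # check string used by the catalog
PKZIP_CHECK = 0xCBF43926     # assumed catalog check value for CRC-32 (PHP's "crc32b")
BZIP2_CHECK = 0xFC891918     # assumed catalog check value for CRC-32/BZIP2 (PHP's "crc32")

zlib_crc = zlib.crc32(CHECK_INPUT)
binascii_crc = binascii.crc32(CHECK_INPUT)

# Compare each standard-library result against both check values
# to see which variant it implements.
print(hex(zlib_crc), zlib_crc == PKZIP_CHECK, zlib_crc == BZIP2_CHECK)
print(hex(binascii_crc), binascii_crc == PKZIP_CHECK, binascii_crc == BZIP2_CHECK)
```

If both functions match the PKZip value and not the BZIP2 one, that confirms the behaviour described in the question.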

    You can easily calculate the BZIP2 CRC yourself. Here is some C code as an example:

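    #include <stddef.h>
    #include <stdint.h>

    /* Update a CRC-32/BZIP2 with len bytes at mem. Pass mem == NULL to get
       the initial CRC value. */
    uint32_t crc32bzip2(uint32_t crc, void const *mem, size_t len) {
        unsigned char const *data = mem;
        if (data == NULL)
            return 0;
        crc = ~crc;                             /* undo the previous final inversion */
        while (len--) {
            crc ^= (uint32_t)(*data++) << 24;   /* next byte into the top 8 bits */
            for (unsigned k = 0; k < 8; k++)
                crc = crc & 0x80000000 ? (crc << 1) ^ 0x04c11db7 : crc << 1;
        }
        crc = ~crc;                             /* final inversion */
        return crc;
    }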
    

    If you call that with NULL for the data pointer, it will return the initial value of the CRC, which in this case is zero. Then you can call it with the current CRC and the bytes to update the CRC with, and it will return the resulting CRC.
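The same calling convention can be sketched in Python. This is a minimal sketch, not part of the C code above: the name crc32_bzip2 and its argument order are assumptions chosen to mirror the C function, with None standing in for the NULL pointer.

```python
from typing import Optional


def crc32_bzip2(crc: int, data: Optional[bytes]) -> int:
    """Update a CRC-32/BZIP2 value with data; pass None to get the initial CRC."""
    if data is None:
        return 0
    crc = ~crc & 0xFFFFFFFF          # undo the final inversion of the previous call
    for byte in data:
        crc ^= byte << 24            # feed the next byte into the top 8 bits
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return ~crc & 0xFFFFFFFF         # final inversion


crc = crc32_bzip2(0, None)           # initial value (zero)
crc = crc32_bzip2(crc, b"1234")      # first chunk
crc = crc32_bzip2(crc, b"56789")     # second chunk
print(hex(crc))
```

Feeding the data in chunks should give the same result as a single call over the whole input, which for "123456789" can be compared against the catalog's check value for CRC-32/BZIP2.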

    A Python version that computes the CRC-32/BZIP2 of the bytes from stdin:

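    #!/usr/local/bin/python3
    import sys

    data = sys.stdin.buffer.read()  # raw bytes from stdin
    crc = 0xffffffff                # initial register value
    for x in data:
        crc ^= x << 24              # next byte into the top 8 bits
        for k in range(8):
            # Shift out one bit; XOR in the polynomial if that bit was set.
            # Masking keeps crc at 32 bits so the integer does not keep growing.
            crc = ((crc << 1) ^ 0x04c11db7 if crc & 0x80000000 else crc << 1) & 0xffffffff
    crc = ~crc & 0xffffffff         # final inversion
    print(hex(crc))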
    

    crcany will generate more efficient table-based versions (in C) if desired.