python file crc32 ogg opus

Reading an Ogg Opus header to check the CRC


I decided to experiment with file formats, and I'm using Python to read the files. Every field I have extracted from the Ogg page header is correct, except that the CRC check fails.
The documentation says the CRC must be computed over the entire page (header and data) with the CRC field itself set to 0.
I'm wondering what steps I'm missing to get the expected result.

import zlib
import struct

with open("sample3.opus", "rb") as f_:
    file_data = f_.read()


cp, ssv, htf, agp, ssn, psn, pc, ps = struct.unpack_from("<4sBBQIIIB", file_data, 0)
offset = struct.calcsize("<4sBBQIIIB")
segments = struct.unpack_from(f"<{ps}B", file_data, offset)

packet_size = 0

for num in segments:
    packet_size += num

header_size = offset + len(segments) + packet_size


# Copy the entire page, then set the CRC field to 0.
header_copy = bytearray()
header_copy.extend(file_data[0:header_size])
struct.pack_into("<I", header_copy, struct.calcsize("<4sBBQII"), 0)

print(pc)
print(zlib.crc32(header_copy))

This script results in:

277013243
752049619

The audio file I'm using:
https://filesamples.com/formats/opus


Solution

  • zlib.crc32() is not the CRC that the documentation specifies. It says the initial value and the final exclusive-or are both zero, whereas zlib.crc32() uses 0xffffffff for both. The documentation does not say whether the CRC is reflected, so you would need to try both to see which it is.

    Update:

    I checked, and it's a forward (non-reflected) CRC. zlib.crc32() is a reflected CRC, so unfortunately you can't use it to calculate this one. You can compute it with this:

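    def crc32ogg(seq):
        # Bitwise CRC-32: polynomial 0x04c11db7, initial value 0,
        # no reflection, no final exclusive-or.
        crc = 0
        for b in seq:
            crc ^= b << 24  # feed the next byte into the top of the register
            for _ in range(8):
                # 0x104c11db7 includes the x^32 term, so the XOR also clears bit 32
                crc = (crc << 1) ^ 0x104c11db7 if crc & 0x80000000 else crc << 1
        return crc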
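
To tie this back to the script in the question, here is a minimal sketch of a page check using the same CRC parameters (polynomial 0x04c11db7, initial value 0, no reflection, no final exclusive-or). The names `crc32ogg_table` and `check_page` are made up for this example; the table lookup is just a faster equivalent of the bitwise `crc32ogg` above, and the header layout is taken from the `struct` format string in the question.

```python
import struct

OGG_POLY = 0x04C11DB7  # generator polynomial; the x^32 term is implied


def _make_table():
    """Precompute the CRC contribution of every possible top byte."""
    table = []
    for byte in range(256):
        crc = byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ OGG_POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32ogg_table(data):
    """Table-driven forward CRC: initial value 0, no final exclusive-or."""
    crc = 0
    for b in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ b]
    return crc


def check_page(file_data, start=0):
    """Return (stored_crc, computed_crc, page_length) for the page at `start`."""
    fixed = "<4sBBQIIIB"  # same layout as in the question
    fixed_size = struct.calcsize(fixed)  # 27 bytes
    magic, _ver, _flags, _granule, _serial, _seq, stored, nsegs = \
        struct.unpack_from(fixed, file_data, start)
    if magic != b"OggS":
        raise ValueError("no Ogg page at this offset")

    # The segment table gives the length of each data segment on the page.
    seg_off = start + fixed_size
    lacing = file_data[seg_off:seg_off + nsegs]
    page_len = fixed_size + nsegs + sum(lacing)

    # Copy the whole page and zero the 4-byte CRC field (byte offset 22).
    page = bytearray(file_data[start:start + page_len])
    page[22:26] = b"\x00\x00\x00\x00"
    return stored, crc32ogg_table(page), page_len
```

With the file contents from the question in `file_data`, `stored, computed, page_len = check_page(file_data)` should give two equal CRC values for an undamaged first page, and calling `check_page(file_data, page_len)` moves on to the next page.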