c++ io compression zlib miniz

DATA_ERROR when reading zlib/miniz deflated data


I am writing a simple C++ wrapper for miniz-cpp's implementation of zlib compression. I got deflation to work, but now I have a problem with inflating the data again.

Code

I have a test case which (grossly simplified) boils down to:

ByteArray randomData = createRandomData(1024 * 1024);
ByteArray deflatedBytes = deflate(randomData);
writeToTmpFile(deflatedBytes); // for manual review
ByteArray inflatedBytes = inflate(deflatedBytes);

assert(randomData == inflatedBytes);

I am stuck on a DATA_ERROR (-3) which is returned when I inflate my data again. Here is the function where the problem occurs:

// inflates the next <size> bytes and stores them in <out[]>
// stores the actually written amount in <written>
ResultCode Inflator::inflate(uint8_t out[], size_t size, size_t& written)
{
    zStream.next_out = out;
    zStream.avail_out = static_cast<unsigned int>(size);

    // loop until output buffer is completely filled
    while (zStream.avail_out != 0) {
        if (zStream.avail_in == 0) {
            // our Inflator stores a ByteArrayInputStream from which
            // we request more data
            size_t read = iStream.read(in, BUFFER_SIZE);
            if (iStream.err()) {
                return ResultCode::STREAM_ERROR;
            }
            zStream.next_in = in;
            zStream.avail_in = static_cast<unsigned int>(read);
        }

        // THIS IS WHERE WE ACTUALLY CALL INFLATE.
        // RESULT CODE -3 (DATA_ERROR) IS RETURNED AFTER READING
        // ONLY 13 BYTES.
        ResultCode result{mz_inflate(&zStream, Flushing::NONE)};

        if (result == ResultCode::STREAM_END) {
            written = size - zStream.avail_out;
            this->eof_ = true;
            return ResultCode::OK;
        }
        else if (result != ResultCode::OK) {
            return result;
        }
    }

    written = size - zStream.avail_out;
    return ResultCode::OK;
}

My Data

I have verified in a debugger that the data I read is correct. The zStream, which is an mz_stream, has data in next_in that looks like valid zlib-encoded data; at least it starts with 0x78. As shown in the simplified test case, I also dump the data to disk. That file can be decompressed just fine using:

# this command is included in the qpdf package and uncompresses zlib streams
zlib-flate -uncompress < 'mve_deflOutput.zlib' > 'mve_deflOutput.bin'

Here is also a hex dump of the first bytes:

00000000: 7801 a4dd fb7f cfe5 1b07 7072 c8a9 9632  x.........pr...2
00000010: 49ac 9043 9a4c 392c 462c 4d88 8ab0 4a7c  I..C.L9,F,M...J|
00000020: 55d6 2c6d 6921 0931 34ad 4db5 8898 1c26  U.,mi!.14.M....&
00000030: 3a88 ce69 51d9 6a52 94a8 302d d252 6ba6  :..iQ.jR..0-.Rk.
00000040: 84a2 b2ef 9ff0 fce1 be7f dd63 dbe7 f37e  ...........c...~
00000050: dff7 75bd aed7 eb75 5df7 11ac 4358 3763  ..u....u]...CX7c
00000060: b5c0 ea88 550f ab33 d62e aca7 b132 b116  ....U..3.....2..
00000070: 611d c73a 8935 096b 0d56 05d6 8758 a3b1  a..:.5.k.V...X..
00000080: f0ef d7b4 c5c2 bfff f027 2c3c be93 b5b1  .........',<....
00000090: cec2 1a80 7531 5612 d68d 583b b0d6 622d  ....u1V...X;..b-
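
The first bytes of the dump can be checked by hand against the zlib header layout from RFC 1950 and the DEFLATE block header from RFC 1951. Below is a minimal sketch of such a check; `ZlibHeaderInfo` and `parseZlibHeader` are hypothetical names made up for illustration and are not part of miniz or zlib. It assumes no preset dictionary is in use, so that the third byte of the stream is the first DEFLATE byte.

```cpp
#include <cstdint>

// Fields of the 2-byte zlib header (RFC 1950) plus the first
// 3 bits of the DEFLATE stream that follows it (RFC 1951).
// Hypothetical type, for illustration only.
struct ZlibHeaderInfo {
    bool valid;          // CM == 8, window <= 32 KiB, (CMF*256 + FLG) % 31 == 0
    unsigned method;     // CM: 8 means DEFLATE
    unsigned windowBits; // CINFO + 8; 15 means a 32 KiB window
    bool presetDict;     // FDICT: a 4-byte dictionary id follows the header
    unsigned level;      // FLEVEL: 0 (fastest) to 3 (maximum compression)
    bool finalBlock;     // BFINAL of the first DEFLATE block
    unsigned blockType;  // BTYPE: 0 stored, 1 fixed, 2 dynamic, 3 reserved
};

// Hypothetical helper, not part of miniz or zlib.
// Assumption: FDICT is not set, so firstDeflateByte is the byte at offset 2.
ZlibHeaderInfo parseZlibHeader(std::uint8_t cmf, std::uint8_t flg,
                               std::uint8_t firstDeflateByte)
{
    ZlibHeaderInfo info{};
    info.method = cmf & 0x0Fu;
    info.windowBits = (cmf >> 4) + 8u;
    info.presetDict = (flg & 0x20u) != 0;
    info.level = (flg >> 6) & 0x03u;
    // The two header bytes, read as a big-endian 16-bit number,
    // must be a multiple of 31.
    info.valid = info.method == 8u
              && info.windowBits <= 15u
              && (cmf * 256u + flg) % 31u == 0u;
    // DEFLATE packs bits starting at the least significant bit.
    info.finalBlock = (firstDeflateByte & 0x01u) != 0;
    info.blockType = (firstDeflateByte >> 1) & 0x03u;
    return info;
}
```

For the first three bytes of the dump, `parseZlibHeader(0x78, 0x01, 0xa4)` reports a valid header (method 8, windowBits 15, no preset dictionary, level 0) followed by a non-final block of type 2, i.e. dynamic Huffman codes. The header itself is therefore well-formed, which suggests that the failure happens while decoding the first DEFLATE block rather than while parsing the header.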

The Error

For whatever reason, the call to mz_inflate, which emulates zlib's inflate, returns DATA_ERROR (-3). The total_in field of the zStream is set to 13 at that point, so it looks like only 13 bytes were consumed before the error occurred.

To summarize: if the deflated data is OK and can be extracted using zlib-flate, why can't miniz read it? It literally wrote the data itself. If there is something wrong with those first 13 bytes, I don't see what it could be.

For reference, here is the full code of the Inflator and the test.


Solution

  • This is an anticlimactic solution: as it turns out, I should have used the actual miniz instead of miniz-cpp, which is a single-header mirror of it. miniz-cpp bundles a severely outdated version of the library from 2017, and that version simply couldn't read the data correctly.

    After switching to the actual miniz, the test passed and everything worked. My code was correct; it was just using the wrong library.