c++ xz

LZMA_BUF_ERROR while decompressing a byte array using LibLZMA (or xz-utils)


I'm using LibLZMA (aka xz-utils) to decompress a binary file produced by the apache-commons-compression library. I made sure the Java program outputs XZ data, and I've also tried decompressing the file with 7zip, which reports its format as lzma2:23 crc64.

This is the decompressor function:

    void decompress_content(const uint8_t *compressed_data, int compressed_length)
    {
        lzma_stream strm = LZMA_STREAM_INIT;
        lzma_ret ret = lzma_stream_decoder(&strm, UINT32_MAX, LZMA_CONCATENATED);

        if (ret != LZMA_OK)
            throw std::runtime_error("Failed to initialize XZ decoder.");

        std::vector<uint8_t> decompressed_data;
        uint8_t outbuf[65536];

        strm.next_in = compressed_data;
        strm.avail_in = compressed_length;

        strm.next_out = outbuf;
        strm.avail_out = sizeof(outbuf);

        try
        {
            while (true)
            {
                ret = lzma_code(&strm, LZMA_RUN);

                // Check if output data was produced
                if (strm.avail_out < sizeof(outbuf))
                {
                    size_t write_size = sizeof(outbuf) - strm.avail_out;
                    decompressed_data.insert(decompressed_data.end(), outbuf, outbuf + write_size);
                    strm.next_out = outbuf;
                    strm.avail_out = sizeof(outbuf);
                }

                if (ret == LZMA_STREAM_END)
                {
                    break;
                }

                if (ret == LZMA_OK)
                {
                    continue;
                }

                if (ret == LZMA_BUF_ERROR && strm.avail_in == 0)
                {
                    // No more input data, and decoder cannot make progress
                    lzma_end(&strm);
                    throw std::runtime_error("Compressed data is truncated or corrupted.");
                }
                // Other errors
                lzma_end(&strm);
                handle_lzma_error(ret);
            }
        }
        catch (...)
        {
            lzma_end(&strm);
            throw;
        }

        lzma_end(&strm);
        parse_entries(decompressed_data);
    }

Running it throws "Compressed data is truncated or corrupted.", which means lzma_code() returned LZMA_BUF_ERROR with no input left.

XZ binary, if needed:

        unsigned char rawData[80] = {
            0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00, 0x00, 0x04, 0xE6, 0xD6, 0xB4, 0x46,
            0x02, 0x00, 0x21, 0x01, 0x16, 0x00, 0x00, 0x00, 0x74, 0x2F, 0xE5, 0xA3,
            0x01, 0x00, 0x17, 0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x5F, 0x77, 0x6F, 0x72,
            0x6C, 0x64, 0x2E, 0x74, 0x78, 0x74, 0x00, 0x00, 0x00, 0x00, 0x04, 0x74,
            0x65, 0x73, 0x74, 0x00, 0x78, 0x83, 0x76, 0x08, 0x82, 0x49, 0x3C, 0x9C,
            0x00, 0x01, 0x30, 0x18, 0x8E, 0x1B, 0xAC, 0xEC, 0x1F, 0xB6, 0xF3, 0x7D,
            0x01, 0x00, 0x00, 0x00, 0x00, 0x04, 0x59, 0x5A
        };

Solution

  • Stop the loop when lzma_code() returns LZMA_OK and there is no input left (strm.avail_in == 0). Also note that LZMA_BUF_ERROR can be returned even after earlier calls have decompressed data successfully.

    // ...
    
    if (ret == LZMA_OK && strm.avail_in == 0) {
        break;
    } else if (ret == LZMA_OK) {
        continue;
    }
    
    if (ret == LZMA_BUF_ERROR)
    {
        // decoder cannot make progress
        lzma_end(&strm);
        throw std::runtime_error("Compressed data is truncated or corrupted.");
    }
                
    // ...  
    

    To watch the loop run through several iterations, set the output buffer size to 8.

    The liblzma documentation says this about LZMA_BUF_ERROR:

    This error is not fatal. Coding can be continued normally by providing more input and/or more output space, if possible.

    Typically the first call to lzma_code() that can do no progress returns LZMA_OK instead of LZMA_BUF_ERROR. Only the second consecutive call doing no progress will return LZMA_BUF_ERROR. This is intentional.
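The rule quoted above explains the behaviour in the question: after all input is consumed, one more call returns LZMA_OK without doing anything, and only the call after that returns LZMA_BUF_ERROR. The following is a toy model of that rule, written only to make the sequence visible. It is not liblzma's implementation, and ToyRet and NoProgressTracker are hypothetical names.

```cpp
#include <cstddef>

// Toy stand-ins for two liblzma return codes (hypothetical, not lzma_ret).
enum class ToyRet { Ok, BufError };

// Hypothetical model of the documented rule: the first call that makes no
// progress still reports Ok; a second consecutive one reports BufError.
class NoProgressTracker {
public:
    // consumed / produced: bytes of input read and output written by one call.
    ToyRet report(std::size_t consumed, std::size_t produced)
    {
        if (consumed != 0 || produced != 0) {
            stalled_once_ = false; // any progress resets the counter
            return ToyRet::Ok;
        }
        if (stalled_once_)
            return ToyRet::BufError; // second consecutive stall
        stalled_once_ = true;        // first stall is tolerated
        return ToyRet::Ok;
    }

private:
    bool stalled_once_ = false;
};
```

In this model, a call that consumes the last input reports Ok, the next call (nothing to read or write) also reports Ok, and the one after that reports BufError. That matches the question's loop, which keeps calling with LZMA_RUN after the input is exhausted.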