compression lz4 zstd

Is calculating the max bounds in a compression algorithm necessary?


I've been using a few compression algorithms, and before compressing you're apparently supposed to get the maximum possible size of the compressed result with calls such as:

ZSTD_compressBound(source_length);
LZ4_compressBound(source_length);

I'm wondering: since you pass the maximum bound to the compression function when you actually call it, so that it doesn't overflow the buffer you provide, is it possible to forego this compressBound call, simply pass in a buffer that's the same size as the original uncompressed size, and pass that size in as the maximum? Nobody's interested in anything compressed past a certain size as you don't get any benefits, and the extra call is skipped, and better for performance. Can I do this? I know the answer probably depends on the algorithm, but I don't know anything about compression algorithms, and I was wondering whether someone with knowledge of them could explain whether I should be able to do this.

Would the calculation of the max bound be CPU intensive?


Solution

  • Those functions are a convenience for when you want to compress all of the data in a single call. You can then allocate an output buffer that is guaranteed to be large enough to hold all of the compressed data.

    For streaming compression, with multiple calls, you don't need the bound. You just keep providing more input and consuming more output until the compression is done.

    > Nobody's interested in anything compressed past a certain size as you don't get any benefits

    I have no idea what you're trying to say there.

    > is it possible to forego this compressBound call, simply pass in a buffer that's the same size as the original uncompressed size

    Yes, you can pass whatever you like. However, if you're trying to complete the compression in a single call, then sometimes that will fail because there is not enough output space. This is not at all uncommon, as you might be providing data that is already compressed and so cannot be compressed further. It will instead expand slightly.

    > and the extra call is skipped, and better for performance

    Do you mean the *compressBound() call? There would be no performance gain from skipping that. That function is a very simple calculation on the input size, typically just a few shifts and additions. You might as well get the bound and allocate a buffer of the correct size. Furthermore, the largest possible expansion for incompressible data is very small, usually just a fraction of a percent plus a few bytes.
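
To make the "few shifts and additions" point concrete, here is a minimal sketch of the kind of arithmetic involved. The helper names `lz4_bound_sketch` and `zstd_bound_sketch` are hypothetical, and the formulas are modeled on the `LZ4_COMPRESSBOUND` and `ZSTD_COMPRESSBOUND` macros as they appear in the library headers to the best of my knowledge; the exact expressions may differ between library versions, so real code should call `LZ4_compressBound()` or `ZSTD_compressBound()` rather than copy these.

```c
#include <stddef.h>

/* Hypothetical helper modeled on the LZ4_COMPRESSBOUND macro in lz4.h:
   input size + input size / 255 + 16.
   The real macro also returns 0 when the input exceeds LZ4_MAX_INPUT_SIZE;
   that check is left out of this sketch. */
size_t lz4_bound_sketch(size_t src_size)
{
    return src_size + src_size / 255 + 16;
}

/* Hypothetical helper modeled on the ZSTD_COMPRESSBOUND macro in zstd.h:
   input size + input size / 256, plus a small extra margin that only
   applies to inputs shorter than 128 KiB. Newer library versions may add
   further checks that are not reproduced here. */
size_t zstd_bound_sketch(size_t src_size)
{
    const size_t small_limit = (size_t)128 << 10; /* 128 KiB */
    size_t margin = 0;

    if (src_size < small_limit) {
        margin = (small_limit - src_size) >> 11;
    }
    return src_size + (src_size >> 8) + margin;
}

/* Worst-case growth in bytes for a given bound and input size. */
size_t worst_case_overhead(size_t bound, size_t src_size)
{
    return bound - src_size;
}
```

Under these formulas a 1000-byte input gets a bound of 1019 bytes (LZ4 style) or 1066 bytes (zstd style), and a 1 MiB input grows by roughly 0.4% in the worst case. That is the whole cost of the call: a handful of integer operations, which is negligible next to the compression itself.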