Tags: io, compression, gzip, zlib, lz4

Purpose of high-performance compression algorithms besides storage efficiency


While trying to learn from the source code of U++, a C++-based RAD framework, I noticed that it makes heavy use of compression/decompression to read and write data. As far as I understood, compression provides the advantage of storing data in a more compact form while still maintaining integrity.

But as I looked more into the LZ4 algorithm in general, some sources mentioned that it provides faster reads/writes than direct reads and writes (unfortunately I am unable to locate those sources any longer). I am wondering why this would be the case, because no matter what, the main data still has to be processed; the compression/decompression is just an extra step. Even with a basic compression algorithm like Huffman coding, we still have to examine the original data either way: a regular read, for example, would do just that, whereas a compression scheme would not only perform that step but then also have to process the data further.

How could the presence of extra steps yield faster processing, given that both a regular I/O operation and a compression/decompression operation seem to perform the same initial read of the data?

U++ mainly seems to use the zlib library heavily for writing/retrieving app-related resources. Is this done simply to use space efficiently, or for other reasons as well, like the one mentioned above?


Solution

  • Code written for and running on a (C)PU operates* on the original data space

    (* exceptions apply: code can take the compression method into account and work on the compressed form directly, e.g. RLE data does not need to be decompressed for every purpose; see the RLE sketch at the end of this answer)

    but between persistent storage and the processing circuit quite a few intermediate storage stages exist,

    and bandwidth jumps considerably from stage to stage, growing roughly exponentially along the way depending on the "pipeline":

    SD/HDD/SSD -> (RAM ->) cache memory -> CPU/GPU registers (the latter holding very little data but offering extreme throughput)

    and depending on whether the processing units are those of a PS5 or of a "generic" PC, from 2010 or from 2022.

    I currently cannot find sources or data that would support a generalisation of this, but from what I remember: moving compressed data from a slow web server (connection) or HDD into the RAM of a client PC, and then having the CPU decompress it (entirely or range by range) and put the result back in RAM or cache, can in most cases mean a significantly shorter delay until the data can be processed than transferring it uncompressed.

    Of course this depends on the compression ratio, the decompression effort, the bandwidth of the pipeline stages in between, the overall amount of data transported, and the intended processing on it.

    I didn't read anything about compression on the U++ website.

    Decompression is an extra step that takes time (think total time *= 1.01), but much more time is saved beforehand on transport (think total time *= 0.9); see the back-of-the-envelope sketch at the end of this answer.
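
To make the transport argument concrete, here is a back-of-the-envelope sketch in C++. All numbers (disk bandwidth, compression ratio, decompression throughput) are illustrative assumptions, not measurements of any particular system or of U++.

```cpp
#include <cstdio>

int main() {
    // Illustrative assumptions, not measurements.
    const double size_mb     = 1000.0; // uncompressed data
    const double disk_mb_s   = 200.0;  // slow stage (HDD / network)
    const double ratio       = 0.4;    // compressed size / original size
    const double decomp_mb_s = 3000.0; // LZ4-class decompression speed

    const double plain_s      = size_mb / disk_mb_s;
    const double compressed_s = (size_mb * ratio) / disk_mb_s // transfer
                              + size_mb / decomp_mb_s;        // decompress

    std::printf("uncompressed path: %.2f s\n", plain_s);      // ~5.00 s
    std::printf("compressed path:   %.2f s\n", compressed_s); // ~2.33 s
}
```

The compressed path only wins when decompression throughput is much higher than the bandwidth of the slow stage, which is exactly the niche that LZ4-class codecs aim for: a modest compression ratio traded for very fast decompression.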
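
And to illustrate the RLE exception from the footnote above, a minimal sketch that works on the compressed form directly: it counts occurrences of a byte in a run-length-encoded stream without ever materialising the decoded data. The `Run` struct and `CountByte` function are hypothetical, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical RLE format: a sequence of (count, value) runs.
struct Run {
    uint32_t count;
    uint8_t  value;
};

// Counts how often `needle` appears in the *decoded* stream,
// but only ever touches the compressed runs.
std::size_t CountByte(const std::vector<Run>& rle, uint8_t needle) {
    std::size_t total = 0;
    for (const Run& r : rle)
        if (r.value == needle)
            total += r.count; // no decompression needed
    return total;
}

int main() {
    // 5x 'a', 3x 'b', 7x 'a' -> a decoded stream of 15 bytes
    std::vector<Run> data = { {5, 'a'}, {3, 'b'}, {7, 'a'} };
    std::cout << CountByte(data, 'a') << "\n"; // prints 12
}
```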