I am looking for the fastest way to read a sequential file from disk. I read in some posts that if I compressed the file using, for example, lz4, I could achieve better performance than read the flat file, because I will minimize the i/o operations.
But when I try this approach, scanning a lz4 compressed file gives me a poor performance than scanning the flat file. I didn't try the lz4demo above, but looking for it, my code is very similar.
I have found this benchmarks: http://skipperkongen.dk/2012/02/28/uncompressed-versus-compressed-read/ http://code.google.com/p/lz4/source/browse/trunk/lz4demo.c?r=75
Is it really possible to improve performance reading a compressed sequential file over an uncompressed one? What am I doing wrong?
Yes, it is possible to improve disk read by using compression.
This effect is most likely to happen if you use a multi-threaded reader : while one thread reads compressed data from disk, the other one decode the previous compressed block within memory.
Considering the speed of LZ4, the decoding operation is likely to finish before the other thread complete reading the next block. This way, you'll achieved a bandwidth improvement, proportional to the compression ratio of the tested file.
Obviously, there are other effects to consider when benchmarking. For example, seek times of HDD are several order of magnitude larger than SSD, and under bad circumstances, it can become the dominant part of the timing, reducing any bandwidth advantage to zero.