
Compression ratio of LZW, LZ77 and other easy-to-implement algorithms


I want to compress .txt files that contain dates in yyyy-mm-dd hh:mm:ss format and English words that tend to be repeated across different lines.
I read some articles about compression algorithms and found that, in my case, dictionary-based encoding is better than entropy-based encoding. Since I want to implement the algorithm myself, I need something that isn't very complicated. So I looked at LZW and LZ77, but I can't choose between them because the conclusions of the articles I found are contradictory: according to some, LZW has the better compression ratio, and according to others, LZ77 does. So the question is: which one is most likely to be better in my case? Are there other easy-to-implement algorithms that would suit my purpose?


Solution

  • LZW is obsolete. Modern, and even pretty old, LZ77 compressors outperform LZW.

    In any case, you are the only one who can answer your question, since only you have examples of the data you want to compress. Simply experiment with various compression methods (zstd, xz, lz4, etc.) on your data and see what combination of compression ratio and speed meets your needs.
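
    As a rough illustration of that kind of experiment, here is a minimal Python sketch that compares compression ratios using only standard-library codecs. It is a sketch under assumptions, not a benchmark: the `compression_ratios` helper and the generated `sample` text are made up for illustration, and zlib, lzma and bz2 stand in for the tools named above (zstd and lz4 are omitted because they typically require third-party packages). Note that bz2 is not an LZ77 compressor; it is included only as another point of comparison.

    ```python
    import bz2
    import lzma
    import zlib


    def compression_ratios(data: bytes) -> dict[str, float]:
        """Return original_size / compressed_size for a few stdlib codecs."""
        compressed_sizes = {
            "zlib (DEFLATE: LZ77 + Huffman)": len(zlib.compress(data, 9)),
            "lzma (xz format, LZ77 family)": len(lzma.compress(data)),
            "bz2 (Burrows-Wheeler, not LZ77)": len(bz2.compress(data)),
        }
        return {name: len(data) / size for name, size in compressed_sizes.items()}


    # Hypothetical stand-in for the real files: timestamped lines with
    # repeated English words. Replace this with the bytes of an actual file.
    sample = "".join(
        f"2024-01-{day:02d} 12:{minute:02d}:00 connection established by user\n"
        for day in range(1, 29)
        for minute in range(60)
    ).encode("ascii")

    for name, ratio in compression_ratios(sample).items():
        print(f"{name}: {ratio:.1f}x")
    ```

    To try it on real data, read a file with `open(path, "rb").read()` and pass the bytes to `compression_ratios`. The numbers from the synthetic sample say little about real files, so only the results on your own data are worth acting on.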