compressionstoring-datalz77

LZ77: storing format


I started to write a little program that allow to compress a single file using LZ77 compression algorithm. It works fine. Now I'm thinking how to store the data. In LZ77, compressed data consists in a series of triplets. Each triplet has the following format:
<"start reading at n. positions backwards", "go ahead for n. positions", "next character">
What could be a right way to store these triplets? I thought about: <11, 5, 8> bits, then:

This format works quite well in text compression, but it sucks for my purpose (video made of binary images), it also increase size if compared to the original filesize. Do you have any suggestions?


Solution

  • What I think you mean is more like: <go back n, copy k, insert literal byte>.

    You need to look at the statistics of your matches. You are likely getting many literal bytes with zero-length matches. For that case, a good start would be to use a single bit to decide between a match and no match. If the bit is a one, then it is followed by a distance, length, and literal byte. If it is a zero, it is followed by only a literal bytes.

    You can do better still by Huffman coding the literals, lengths, and distances. The lengths and literal could be combined into a single code, as deflate does, to remove even the one bit.