
(Deflate) Is using a code of 284 for a distance pair with a length of 258 allowed in a Huffman compressed Deflate block?


I've been working on a program that aims to optimise deflate-compressed files by trying possibly more efficient ways of representing the compressed data, and I recently found a file which uses an uncommon way to represent a length/distance pair with a length of 258. According to RFC 1951, code 284 is used for lengths 227-257, and code 285 is used for length 258. However, code 284 carries 5 extra bits, so its encoding can in principle reach 227 + 31 = 258, which is what this file does. Is this legal according to the spec, and is it commonly supported by decoders?

I've checked zlib to see if there are notes about this, and I found a comment in trees.c which notes that this is a possible encoding, stating "Note that the length 255 (match length 258) can be represented in two different ways: code 284 + 5 bits or code 285...". puff.c also seems to imply this should be legal, stating "Note that 258 can also be coded as the base value 227 plus the maximum extra value of 31. While a good deflate should never do this, it is not an error, and should be decoded properly." I'm not sure if these comments mean this should be spec legal, or well supported outside of zlib.


Solution

  • I suppose a strict reading of the PKWARE Appnote or RFC 1951, which has the row "284 | 5 extra bits | lengths 227-257" for that code, could be interpreted to mean that code 284 may not be used to represent length 258. However, my decompression code has always allowed it, since it is not explicitly disallowed by the specification.

    In any case, a self-respecting deflate compressor should never produce it.
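
If you want to check how a particular decoder treats this case, one option is to hand-build a tiny fixed-Huffman block that encodes the match both ways and compare the results. The following is a minimal sketch, assuming Python's standard `zlib` module (a raw stream is selected by passing -15 as `wbits`); `BitWriter`, `fixed_litlen_code` and `build_stream` are hypothetical helper names made up for this illustration, not part of any library. The stream is the literal "a" followed by one match of length 258 at distance 1, so both variants should decode to the same 259 bytes if the decoder accepts code 284 with extra value 31.

```python
import zlib


class BitWriter:
    """Collects bits in the order Deflate packs them into bytes (LSB first)."""

    def __init__(self):
        self.bits = []

    def write_lsb(self, value, nbits):
        # Header fields and extra bits are stored least-significant bit first.
        for i in range(nbits):
            self.bits.append((value >> i) & 1)

    def write_huffman(self, code, nbits):
        # Huffman codes are stored most-significant bit first.
        for i in reversed(range(nbits)):
            self.bits.append((code >> i) & 1)

    def to_bytes(self):
        out = bytearray()
        for start in range(0, len(self.bits), 8):
            byte = 0
            for offset, bit in enumerate(self.bits[start:start + 8]):
                byte |= bit << offset
            out.append(byte)  # a short final chunk is padded with zero bits
        return bytes(out)


def fixed_litlen_code(symbol):
    """Return (code, bit length) from the fixed Huffman table in RFC 1951, 3.2.6."""
    if symbol <= 143:
        return 0x30 + symbol, 8
    if symbol <= 255:
        return 0x190 + (symbol - 144), 9
    if symbol <= 279:
        return symbol - 256, 7
    return 0xC0 + (symbol - 280), 8


def build_stream(use_code_284):
    """Build a raw Deflate stream: literal 'a', then a length-258 match at distance 1."""
    w = BitWriter()
    w.write_lsb(1, 1)  # BFINAL = 1 (last block)
    w.write_lsb(1, 2)  # BTYPE = 01 (fixed Huffman codes)
    w.write_huffman(*fixed_litlen_code(ord("a")))
    if use_code_284:
        w.write_huffman(*fixed_litlen_code(284))
        w.write_lsb(31, 5)  # base 227 + extra 31 = 258
    else:
        w.write_huffman(*fixed_litlen_code(285))  # length 258, no extra bits
    w.write_huffman(0, 5)  # distance code 0 = distance 1, no extra bits
    w.write_huffman(*fixed_litlen_code(256))  # end of block
    return w.to_bytes()


out_284 = zlib.decompress(build_stream(True), -15)
out_285 = zlib.decompress(build_stream(False), -15)
print(len(out_284), len(out_285), out_284 == out_285)
```

This only shows what one decoder does; it says nothing about other implementations, so a stricter decoder could still reject the code 284 variant. Note also that the code 284 form costs 5 more bits under the fixed table, which is why a compressor has no reason to emit it.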