[SOLVED] What's special about TIFF 5.0 style LZW compression


I am in the middle of writing a TIFF decoder. The LZW decoder I am using works fine with all the LZW-compressed GIF and TIFF images I have tried except one, which overflows the buffer that holds the decoded code string. I tested that image with TIFFLZWDecompressor from the com.sun.media.imageioimpl.plugins.tiff package, and it throws the following exception: "java.lang.UnsupportedOperationException: TIFF 5.0-style LZW codes are not supported".

I have been trying to find what is special about the 5.0-style LZW without success. Does anyone have any idea about this?

Note: judging from the TIFFLZWDecompressor source code, it treats the data as TIFF 5.0-style LZW when the first two bytes of the compressed data are {0x00, 0x01}.

Solution

The TIFF 6.0 spec says:

It is also possible to implement a version of LZW in which the LZW character depth equals BitsPerSample, as described in Draft 2 of Revision 5.0. But there is a major problem with this approach. If BitsPerSample is greater than 11, we can not use 12-bit-maximum codes and the resulting LZW table is unacceptably large.

(TIFF6.pdf, pages 58-59)

This could be what the exception message is referring to.

On the other hand... In my own reader I found:

NOTE: This is a spec violation. However, libTiff reads such files. TIFF 6.0 Specification, Section 13: "LZW Compression"/"The Algorithm", page 61, says: "LZW compression codes are stored into bytes in high-to-low-order fashion, i.e., FillOrder is assumed to be 1. The compressed codes are written as bytes (not words) so that the compressed data will be identical whether it is an ‘II’ or ‘MM’ file."

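The 0x00, 0x01 prefix is actually the 9-bit "clear code" (256) written in "reverse", i.e. packed with its low-order bits first, the way a writer that follows the byte order would do it, rather than high-to-low regardless of byte order, as the spec says. A conforming stream would start with 0x80 instead.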
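To make the bit-order difference concrete, here is a minimal Java sketch that reads the first 9-bit code of a strip both ways. The class and method names are hypothetical (they are not taken from TIFFLZWDecompressor or libTiff), and it assumes the stream begins with a clear code and uses 9-bit initial codes.

```java
// Illustrative sketch only; LzwBitOrder and its methods are hypothetical names.
final class LzwBitOrder {
    static final int CLEAR_CODE = 256; // clear code for 8-bit LZW characters

    // First 9-bit code when codes are packed high-order bits first (TIFF 6.0).
    static int firstCodeMsbFirst(byte[] data) {
        return ((data[0] & 0xFF) << 1) | ((data[1] & 0xFF) >>> 7);
    }

    // First 9-bit code when codes are packed low-order bits first ("reversed").
    static int firstCodeLsbFirst(byte[] data) {
        return (data[0] & 0xFF) | ((data[1] & 0x01) << 8);
    }

    // True when the stream opens with a clear code in the low-bits-first reading,
    // i.e. first byte is 0x00 and the lowest bit of the second byte is set.
    static boolean looksLikeOldStyleLzw(byte[] data) {
        return data.length >= 2 && firstCodeLsbFirst(data) == CLEAR_CODE;
    }
}
```

With the bytes {0x00, 0x01} the low-bits-first reading gives 256 (the clear code) while the high-bits-first reading gives 0, which is why a spec-conforming decoder goes wrong on such a file. Note that only the lowest bit of the second byte belongs to the clear code; the remaining seven bits are the start of the next code, so the second byte is not necessarily exactly 0x01. A decoder that wants to accept these files would presumably switch to a low-bits-first code reader once this check succeeds.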