
Code length of 15 never reached in zlib for 4 KiB of raw data


I am verifying a hardware design block that performs decompression (inflate). Its decompressed output is always exactly 4 KiB. To generate test data, I compress 4 KiB chunks of data one at a time with zlib's deflate and feed the resulting streams to my test. Across multiple regressions I have never observed a case where a Huffman code length is 15. Do you have any suggestions on how to produce such a case, or an explanation of why it is not possible?


Solution

  • Here you go:

    eF4F4cGBZdmybbnJijFt7eOR931S/14B////3//7f//v//7v//7vf//73//+97///ffff//9
    999///3333///v379+/fv3///v379+/fv3///v39/f39/f39/f39/f39/f39/f39/f39/f39
    /f39/X6/3+/3+/1+v9/v9/v9fr/f7/f7/X6/3+/3+/1+v9/v9/v9fr/f7/f7vu/7vu/7vu/7
    vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu/7vu9777333nvv
    vffee++9995777333nvvvffee++9995777333nvvvffee++9995777333nvvvffee++99957
    77333nvvvffee++99957793d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d
    3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d3d
    3d3d3d3d3d22bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2
    bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2bdu2
    bdu2bdu2bdu2bdu2bdu2bdu2bdtWVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
    VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
    VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVFQAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAA8P8BVoseLg==
    

    That is a Base64 encoding of a zlib stream that decompresses to 4096 bytes and whose Huffman code includes 15-bit codes. It was constructed by generating the first 15 Lucas numbers: 2, 1, 3, 4, 7, 11, ..., 521, 843. The initial 2 is decremented to 1 to account for the end-of-block symbol in deflate, which occurs once. Then 15 symbols are emitted with those frequencies (I chose the lower-case letters a..o, with a appearing 843 times). Because each Lucas number is the sum of the two before it, the subtree built so far always weighs less than the symbol after next, so every Huffman merge combines that subtree with the next symbol and the tree becomes a chain of maximal depth. That results in a sequence of 2205 bytes which, together with the end-of-block symbol, is the smallest possible input that can result in a 15-bit code. That is less than your 4096 bytes, so it is indeed possible to generate the test vector you are looking for.
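    The construction can be checked with a plain Huffman tree builder. The sketch below is a minimal Python illustration, not zlib's own tree code (zlib breaks ties differently and enforces a 15-bit limit, which this input never exceeds). Which letter gets which count is an assumption beyond "a appears 843 times", and `EOB` is only a hypothetical label for deflate's end-of-block symbol (256).

```python
import heapq
from itertools import count


def lucas(n):
    """Return the first n Lucas numbers: 2, 1, 3, 4, 7, 11, ..."""
    seq = [2, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]


def huffman_code_lengths(freqs):
    """Unlimited-length Huffman code lengths for a {symbol: frequency} dict."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(f, next(tiebreak), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    depth = {s: 0 for s in freqs}
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees moves every symbol in them one level deeper.
        for s in group1 + group2:
            depth[s] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), group1 + group2))
    return depth


counts = lucas(15)          # 2, 1, 3, 4, 7, ..., 521, 843
counts[0] -= 1              # 2 -> 1 to leave room for end-of-block
freqs = dict(zip("onmlkjihgfedcba", counts))  # assumed mapping: 'a' gets 843
freqs["EOB"] = 1            # hypothetical label for the end-of-block symbol

lengths = huffman_code_lengths(freqs)
print(sum(counts), max(lengths.values()))  # 2205 15
```

    Raising the count for a to 843 + 1891 = 2734 leaves every length unchanged, because a is merged last in either case.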

    I then appended another 1891 a's to fill it out to 4096 bytes. That does not change the resulting Huffman code, since it only makes the most frequent symbol more frequent. You then compress that sequence with zlib using the Huffman-only strategy (Z_HUFFMAN_ONLY in zlib, or pigz -zH), in order to avoid LZ77 compression of the long runs of repeated symbols.
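    A Python sketch of this step, using the standard `zlib` module, whose `compressobj` accepts a `strategy` argument. The letter order and the compression level are assumptions; with Huffman-only compression and the whole input in a single block, neither should affect the code lengths.

```python
import zlib

# Counts from the Lucas construction; the letter order is an assumption.
LETTERS = "onmlkjihgfedcba"
COUNTS = [1, 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, 322, 521, 843]


def build_test_vector(size=4096):
    """2205 Lucas-weighted letters, padded with 'a' up to size bytes."""
    data = b"".join(ch.encode() * n for ch, n in zip(LETTERS, COUNTS))
    return data + b"a" * (size - len(data))


def compress_huffman_only(data):
    """Return a zlib stream made with Z_HUFFMAN_ONLY (no LZ77 string matching)."""
    comp = zlib.compressobj(level=6, strategy=zlib.Z_HUFFMAN_ONLY)
    return comp.compress(data) + comp.flush()


data = build_test_vector()
stream = compress_huffman_only(data)
print(len(data), len(stream))
```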

    If you just want a raw deflate stream, remove the first two bytes (the zlib header) and the last four bytes (the Adler-32 check value) of the zlib stream.
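    To confirm that a test vector really exercises 15-bit codes, the header of the raw stream's dynamic block can be parsed. The sketch below follows RFC 1951 section 3.2.7 and reads only the header of the first block, assuming it is a dynamic-Huffman block (BTYPE=2). It regenerates the data (letter order assumed) so that it runs on its own; `BitReader`, `canonical_table`, `read_symbol` and `dynamic_code_lengths` are hypothetical helpers, not part of zlib.

```python
import zlib

# Order in which code-length code lengths are stored (RFC 1951, 3.2.7).
CLEN_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


class BitReader:
    """Hypothetical helper: reads deflate bits, least significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position

    def bits(self, n):
        """Read an n-bit integer field (fields are packed LSB first)."""
        value = 0
        for i in range(n):
            bit = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value


def canonical_table(lengths):
    """Map (length, code) -> symbol for a canonical Huffman code (RFC 1951, 3.2.2)."""
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    next_code, code = [0] * (max_len + 1), 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    table = {}
    for symbol, n in enumerate(lengths):
        if n:
            table[(n, next_code[n])] = symbol
            next_code[n] += 1
    return table


def read_symbol(reader, table):
    """Decode one symbol; Huffman codes are packed MSB first."""
    code = length = 0
    while length < 15:
        code = (code << 1) | reader.bits(1)
        length += 1
        if (length, code) in table:
            return table[(length, code)]
    raise ValueError("invalid Huffman code")


def dynamic_code_lengths(raw):
    """Literal/length and distance code lengths of the first (dynamic) block."""
    r = BitReader(raw)
    r.bits(1)  # BFINAL
    if r.bits(2) != 2:
        raise ValueError("first block is not a dynamic Huffman block")
    hlit = r.bits(5) + 257
    hdist = r.bits(5) + 1
    hclen = r.bits(4) + 4
    clen = [0] * 19
    for i in range(hclen):
        clen[CLEN_ORDER[i]] = r.bits(3)
    table = canonical_table(clen)
    lengths = []
    while len(lengths) < hlit + hdist:
        sym = read_symbol(r, table)
        if sym < 16:
            lengths.append(sym)                          # literal length 0..15
        elif sym == 16:
            lengths += [lengths[-1]] * (3 + r.bits(2))   # repeat previous 3..6 times
        elif sym == 17:
            lengths += [0] * (3 + r.bits(3))             # 3..10 zeros
        else:
            lengths += [0] * (11 + r.bits(7))            # 11..138 zeros
    return lengths[:hlit], lengths[hlit:]


# Rebuild the 4096-byte vector (assumed letter order) and compress it.
counts = [1, 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, 322, 521, 843]
data = b"".join(c.encode() * n for c, n in zip("onmlkjihgfedcba", counts))
data += b"a" * (4096 - len(data))
comp = zlib.compressobj(level=6, strategy=zlib.Z_HUFFMAN_ONLY)
stream = comp.compress(data) + comp.flush()

raw = stream[2:-4]  # drop the 2-byte zlib header and 4-byte Adler-32 trailer
assert zlib.decompress(raw, -15) == data  # negative wbits means raw deflate
lit_lengths, dist_lengths = dynamic_code_lengths(raw)
print("longest literal/length code:", max(lit_lengths))
```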