
How to determine compression method of a ZIP/RAR file


I have a few ZIP and RAR files that I'm working with, and I'm trying to analyze how each file was compressed: the compression level, the compression algorithm (e.g. Deflate, LZMA, BZip2), the dictionary size, the word size, and so on. I haven't figured out a way to do this yet.

Is there any way to analyze the files to determine these properties, with software or otherwise?

Cheers and thanks!


Solution

  • I suggest using hachoir-wx to have a look at these files. See "How to install a Python package" for installation, or try ActivePython with PyPM if you are on Windows. Once the necessary hachoir packages are installed, you can run the GUI like this:

    python C:\Python27\Scripts\hachoir-wx

    It enables you to browse through the data fields of RAR and ZIP files. See this screenshot for an example.

    For RAR files, have a look at the technote.txt file in the WinRAR installation directory. It describes the RAR format in detail. You will probably be interested in these fields:

     HEAD_FLAGS      Bit flags: 2 bytes
                     0x10 - information from previous files is used (solid flag)
                     bits 7 6 5 (for RAR 2.0 and later)
                          0 0 0    - dictionary size   64 KB
                          0 0 1    - dictionary size  128 KB
                          0 1 0    - dictionary size  256 KB
                          0 1 1    - dictionary size  512 KB
                          1 0 0    - dictionary size 1024 KB
                          1 0 1    - dictionary size 2048 KB
                          1 1 0    - dictionary size 4096 KB
                          1 1 1    - file is directory
    

    Dictionary size can be found in the WinRAR GUI too.
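
As a rough illustration of the HEAD_FLAGS table above, here is a minimal Python sketch that decodes the solid flag and the dictionary size from a flags value. It assumes you have already read the 2-byte HEAD_FLAGS field of a file header into an integer (for example with hachoir or the struct module). The function name decode_head_flags and the constant names are made up for this example and are not part of any library.

```python
# Sketch only: decodes the 2-byte HEAD_FLAGS value of a RAR file header,
# following the technote.txt excerpt quoted above (RAR 2.0 and later).
# All names here are hypothetical helpers, not a library API.

SOLID_FLAG = 0x10   # "information from previous files is used"
DICT_MASK = 0xE0    # bits 7 6 5 hold the dictionary size code
DIRECTORY = 0x07    # bit pattern 1 1 1 means "file is directory"


def decode_head_flags(head_flags):
    """Return (is_solid, is_directory, dictionary_size_kb or None)."""
    is_solid = bool(head_flags & SOLID_FLAG)
    dict_bits = (head_flags & DICT_MASK) >> 5
    if dict_bits == DIRECTORY:
        # Directories carry no dictionary size.
        return is_solid, True, None
    # Codes 0..6 map to 64, 128, ..., 4096 KB (each step doubles).
    return is_solid, False, 64 << dict_bits
```

For example, decode_head_flags(0xC0) gives (False, False, 4096), which matches the "1 1 0 - dictionary size 4096 KB" row.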

     METHOD          Packing method 1 byte
                     0x30 - storing
                     0x31 - fastest compression
                     0x32 - fast compression
                     0x33 - normal compression
                     0x34 - good compression
                     0x35 - best compression
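
The METHOD byte can be turned into a readable name with a simple lookup. This is only a sketch based on the table above: RAR_METHODS and describe_method are hypothetical names, and the method byte is assumed to have been read from the file header already.

```python
# Sketch only: maps the 1-byte METHOD field to the names listed in
# the technote.txt excerpt above. The names are hypothetical helpers.

RAR_METHODS = {
    0x30: "storing",
    0x31: "fastest compression",
    0x32: "fast compression",
    0x33: "normal compression",
    0x34: "good compression",
    0x35: "best compression",
}


def describe_method(method_byte):
    """Return the packing method name, or a marker for unlisted values."""
    return RAR_METHODS.get(method_byte, "unknown (0x%02x)" % method_byte)
```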
    

    And Wikipedia also knows this:

    The RAR compression utility is proprietary, with a closed algorithm. RAR is owned by Alexander L. Roshal, the elder brother of Eugene Roshal. Version 3 of RAR is based on Lempel-Ziv (LZSS) and prediction by partial matching (PPM) compression, specifically the PPMd implementation of PPMII by Dmitry Shkarin.

    For ZIP files I would start by having a look at the ZIP specification and the ZIP Wikipedia page. These two header fields are probably the most interesting:

      general purpose bit flag: (2 bytes)
      compression method: (2 bytes)
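
If you would rather not read these two fields by hand, Python's standard zipfile module exposes them as ZipInfo.compress_type and ZipInfo.flag_bits. The sketch below lists the method of every entry in an archive. The helper names (describe_entry, list_methods) and the small METHOD_NAMES table are assumptions made for this example; the table covers only a few common method IDs from the ZIP specification. For Deflate, the specification uses bits 1 and 2 of the general purpose flag as a hint about the compression option, but many writers leave those bits at zero, so treat the result as a hint rather than the real compression level.

```python
import zipfile

# A few common method IDs from the ZIP specification (not exhaustive).
METHOD_NAMES = {0: "stored", 8: "deflate", 9: "deflate64",
                12: "bzip2", 14: "lzma"}

# For methods 8 and 9, bits 2 and 1 of the general purpose bit flag
# record which compression option the writer says it used.
DEFLATE_OPTIONS = {0: "normal", 1: "maximum", 2: "fast", 3: "super fast"}


def describe_entry(compress_type, flag_bits):
    """Describe one entry from its method ID and general purpose flags."""
    name = METHOD_NAMES.get(compress_type, "method %d" % compress_type)
    if compress_type in (8, 9):
        option = DEFLATE_OPTIONS[(flag_bits >> 1) & 0x03]
        return "%s (%s)" % (name, option)
    return name


def list_methods(zip_source):
    """Map each file name in the archive to its method description.

    zip_source may be a path or a seekable file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return {info.filename: describe_entry(info.compress_type,
                                              info.flag_bits)
                for info in archive.infolist()}
```

Calling list_methods("example.zip") (with a file name of your own) returns a dictionary of entry names and descriptions. Note that zipfile only needs to read the headers for this, so it should also report methods it cannot actually decompress.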