I have a few ZIP and RAR files, and I'm trying to determine how each one was compressed: compression level, compression algorithm (e.g. Deflate, LZMA, BZip2), dictionary size, word size, and so on. I haven't figured out a way to do this yet.
Is there any way to analyze the files to determine these properties, with software or otherwise?
Cheers and thanks!
I suggest using hachoir-wx to inspect these files. You can install it like any other Python package, or, on Windows, try ActivePython with PyPM. Once the necessary hachoir packages are installed, you can run the GUI with something like this:
python C:\Python27\Scripts\hachoir-wx
It enables you to browse through the data fields of RAR and ZIP files. See this screenshot for an example.
For RAR files, have a look at the technote.txt file in the WinRAR installation directory, which describes the RAR format in detail. You will probably be interested in these fields:
HEAD_FLAGS    Bit flags: 2 bytes
    0x10 - information from previous files is used (solid flag)
    bits 7 6 5 (for RAR 2.0 and later)
        0 0 0 - dictionary size 64 KB
        0 0 1 - dictionary size 128 KB
        0 1 0 - dictionary size 256 KB
        0 1 1 - dictionary size 512 KB
        1 0 0 - dictionary size 1024 KB
        1 0 1 - dictionary size 2048 KB
        1 1 0 - dictionary size 4096 KB
        1 1 1 - file is directory
Dictionary size can be found in the WinRAR GUI too.
METHOD        Packing method: 1 byte
    0x30 - storing
    0x31 - fastest compression
    0x32 - fast compression
    0x33 - normal compression
    0x34 - good compression
    0x35 - best compression
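As a rough illustration, the HEAD_FLAGS and METHOD values listed above could be decoded like this once you have read them from a file header (for example by copying them out of hachoir-wx). This is only a sketch: the function name `decode_rar_file_header` and the dictionary keys are my own, and it assumes the header layout quoted from technote.txt, so archives in a different RAR format version may not match.

```python
# Packing method byte -> description, as listed in the technote.txt excerpt.
RAR_METHODS = {
    0x30: "storing",
    0x31: "fastest compression",
    0x32: "fast compression",
    0x33: "normal compression",
    0x34: "good compression",
    0x35: "best compression",
}


def decode_rar_file_header(head_flags, method):
    """Hypothetical helper: interpret HEAD_FLAGS (2 bytes) and METHOD (1 byte).

    Both arguments are plain integers that were already read from a
    file header; this function does not parse the archive itself.
    """
    # Bits 7, 6 and 5 form a 3-bit value: 0..6 select the dictionary
    # size (64 KB doubled that many times), 7 marks a directory entry.
    dict_bits = (head_flags >> 5) & 0x07
    is_directory = dict_bits == 0x07
    dictionary_kb = None if is_directory else 64 << dict_bits

    return {
        "solid": bool(head_flags & 0x10),  # 0x10 is the solid flag
        "is_directory": is_directory,
        "dictionary_kb": dictionary_kb,
        "method": RAR_METHODS.get(method, "unknown (0x%02x)" % method),
    }


if __name__ == "__main__":
    # Example values: solid flag set, bits 7 6 5 = 1 0 1, METHOD = 0x33.
    print(decode_rar_file_header(0xB0, 0x33))
```

Note that METHOD only tells you which preset was chosen, not every internal parameter the compressor used.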
Wikipedia adds this about the algorithm itself:
The RAR compression utility is proprietary, with a closed algorithm. RAR is owned by Alexander L. Roshal, the elder brother of Eugene Roshal. Version 3 of RAR is based on Lempel-Ziv (LZSS) and prediction by partial matching (PPM) compression, specifically the PPMd implementation of PPMII by Dmitry Shkarin.
For ZIP files, I would start with the format specification and the ZIP Wikipedia page. These header fields are probably the most relevant:
general purpose bit flag: (2 bytes)
compression method: (2 bytes)
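If you would rather script this than browse the fields by hand, Python's standard zipfile module exposes both values for each entry as `ZipInfo.compress_type` and `ZipInfo.flag_bits`. Below is a minimal sketch; the function name `describe_zip` and the lookup tables are my own, and the method codes and the meaning of flag bits 1 and 2 for Deflate are taken from my reading of the ZIP specification. Many archivers leave those two bits at zero regardless of the level used, so treat the result as a hint rather than proof.

```python
import zipfile

# Compression method codes from the ZIP specification (subset).
ZIP_METHODS = {
    0: "stored",
    8: "deflate",
    9: "deflate64",
    12: "bzip2",
    14: "lzma",
}

# For deflate, bits 1 and 2 of the general purpose bit flag record
# which compression option the archiver says it used.
DEFLATE_OPTIONS = {0: "normal", 1: "maximum", 2: "fast", 3: "super fast"}


def describe_zip(source):
    """Hypothetical helper: list method and flags for every archive entry.

    `source` is a path or a seekable binary file object.
    """
    entries = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            entry = {
                "name": info.filename,
                "method": ZIP_METHODS.get(
                    info.compress_type, "unknown (%d)" % info.compress_type
                ),
                "flags": info.flag_bits,
            }
            if info.compress_type == 8:
                option = (info.flag_bits >> 1) & 0x03
                entry["deflate_option"] = DEFLATE_OPTIONS[option]
            entries.append(entry)
    return entries


if __name__ == "__main__":
    import sys

    for entry in describe_zip(sys.argv[1]):
        print(entry)
```

Save it under any name you like and run it with the archive path as the only argument. The ZIP headers do not store the exact numeric compression level, so the method and these flag bits are about as much as the file itself will tell you.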