python zip time-estimation

Estimating zip size/creation time


I need to create ZIP archives on demand, using either Python's zipfile module or Unix command-line utilities.

The resources to be zipped are often larger than 1 GB and not necessarily compression-friendly.

How can I efficiently estimate an archive's creation time and size?


Solution

  • Extract a number of small chunks from the big file at randomly selected offsets, for example 64 chunks of 64 KB each.

    Concatenate the chunks, compress the result, and measure the time and the compression ratio. Because the chunks were selected at random, chances are that you have compressed a representative subset of the data.

    Now all you have to do is scale the measured time and compressed size by the ratio of the full file size to the sample size to get an estimate for the whole file.
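
    The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation: the function name `estimate_zip` and its parameters are made up for this example, and it uses `zlib.compress` as a stand-in for the DEFLATE compression that zipfile performs with `ZIP_DEFLATED`. It ignores ZIP header overhead and the cost of reading the whole file from disk, so treat the result as a rough figure.

```python
import os
import random
import time
import zlib


def estimate_zip(path, chunks=64, chunk_size=64 * 1024, level=6, seed=None):
    """Return (estimated compressed size in bytes, estimated seconds).

    Hypothetical helper: samples random chunks, compresses them with zlib,
    and scales the result linearly to the full file size.
    """
    total = os.path.getsize(path)
    if total == 0:
        return 0, 0.0

    rng = random.Random(seed)
    parts = []
    with open(path, "rb") as f:
        if total <= chunks * chunk_size:
            # Small file: sampling gains nothing, so use all of it.
            parts.append(f.read())
        else:
            # Random offsets (chunks may overlap); sorted to keep seeks ordered.
            max_offset = total - chunk_size
            offsets = sorted(rng.randint(0, max_offset) for _ in range(chunks))
            for offset in offsets:
                f.seek(offset)
                parts.append(f.read(chunk_size))
    sample = b"".join(parts)

    # Time only the compression of the sample.
    start = time.perf_counter()
    compressed = zlib.compress(sample, level)
    elapsed = time.perf_counter() - start

    # Scale both measurements by full size / sample size.
    scale = total / len(sample)
    return int(len(compressed) * scale), elapsed * scale
```

    Calling `estimate_zip("big.bin")` (with `big.bin` being a placeholder path) returns the two estimates as a tuple. Increasing `chunks` should make the sample more representative at the cost of a slower estimate; files whose content varies strongly from region to region may still be estimated poorly.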