
How to extract a multi-part zip file in Python?


Suppose I have some files that I downloaded from a server, zipped with 7zip in multiple parts and named like myfile.zip.001, myfile.zip.002, ..., myfile.zip.00n. I need to extract their contents into the same folder where they are stored.

I tried using zipfile, patoolib and pyunpack without success, here is what I've done:

file_path = r"C:\Users\user\Documents\myfile.zip.001"  # I also tested with only .zip
extract_path = r"C:\Users\user\Documents"  # a raw string cannot end with a single backslash

import zipfile
with zipfile.ZipFile(file_path, "r") as zip_ref:
  zip_ref.extractall(extract_path) # myfile.zip.001 file isn't zip file.

from pyunpack import Archive
Archive(file_path).extractall(extract_path) # File is not a zip file

import patoolib
patoolib.extract_archive(file_path, outdir=extract_path) # unknown archive format for file `myfile.zip.001'

Another way (that works, but it's very ugly) is this one:

import subprocess

path_7zip = r"C:\Program Files (x86)\7-Zip\7z.exe"

cmd = [path_7zip, 'x', 'myfile.zip.001']
# run() waits for 7-Zip to finish and raises if it exits with an error
sp = subprocess.run(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE, check=True)

But this requires the user to have 7zip installed on their computer, which isn't the approach I'm looking for.

So, the question is: is there a way to extract/unzip multi-part files named like x.zip.001 in Python?


Solution

  • You seem to be on the right track with zipfile, but you most likely need to concatenate the parts into a single zip file before calling extractall.

    import glob
    import shutil
    import zipfile

    zip_prefix = "myfile.zip."

    # Collect the parts; zero-padded suffixes (001, 002, ...) sort correctly as strings
    parts = sorted(glob.glob(zip_prefix + "[0-9]*"))

    # Concatenate the parts, in order, into a single zip file
    with open("myfile.zip", "wb") as outfile:
        for filename in parts:
            with open(filename, "rb") as infile:
                shutil.copyfileobj(infile, outfile)  # copies in chunks

    # Extract the joined archive (not the .001 part);
    # extract_path is the folder defined in the question
    with zipfile.ZipFile("myfile.zip", "r") as zip_ref:
        zip_ref.extractall(extract_path)
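
If you want this as one reusable call, here is a minimal sketch of the same concatenate-then-extract idea. It assumes the parts are plain byte-wise slices of a single zip, which is what the question's x.zip.001 naming suggests; it will not help with other spanned formats. The function name extract_multipart_zip and its parameters are made up for illustration. It joins the parts in a temporary file, so no extra myfile.zip is left next to the downloads.

```python
import re
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_multipart_zip(first_part, extract_dir):
    """Join name.zip.001, name.zip.002, ... and extract the result.

    Hypothetical helper: assumes the parts are raw slices of one zip file.
    Returns the list of member names found in the archive.
    """
    first_part = Path(first_part)
    base_name = first_part.with_suffix("").name  # myfile.zip.001 -> myfile.zip

    # Keep only files named <base>.<digits> and sort them by part number.
    pattern = re.compile(re.escape(base_name) + r"\.(\d+)")
    numbered = []
    for path in first_part.parent.iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    if not numbered:
        raise FileNotFoundError(f"no parts found for {base_name}")
    numbered.sort()

    # Stream every part into a seekable temporary file, then unzip from it.
    with tempfile.TemporaryFile() as joined:
        for _, part in numbered:
            with open(part, "rb") as infile:
                shutil.copyfileobj(infile, joined)
        joined.seek(0)
        with zipfile.ZipFile(joined) as archive:
            archive.extractall(extract_dir)
            return archive.namelist()
```

With the paths from the question, the call would look like extract_multipart_zip(file_path, extract_path). If a part is missing or the parts are not simple slices of one zip, zipfile will most likely raise BadZipFile when it opens the joined data.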