
How to extract a subdir with all of its files using zipfile


Yes, I have read the other posts on this subject, but I am running into a weird problem:

When I extract a certain item from the namelist, it only gives me an empty folder, not the actual files inside.

My zip file has the following hierarchy:

myzip.zip -> FolderA -> FolderB -> FolderC -> FolderIWantA, FolderIWantB, ... FolderIWantN.

So there are a lot of preceding folders I do not wish to extract. I know how to identify the ones I want from the namelist:

import os
import sys
import zipfile

try:
    zip_file_path = sys.argv[1]
except IndexError:
    sys.exit('No zip file provided.')

archive = zipfile.ZipFile(zip_file_path)

for i,file in enumerate(archive.namelist()):
    if os.path.basename(file[:-1]).startswith('ABC-'): # identify relevant folders
        old_name = os.path.basename(file[:-1])
        new_name = 'new_%d'%i # Create a new name
        
        archive.extract(file, new_name)

This does extract the folders I want, but the extracted folders are empty. On top of that, each new folder also contains the preceding folders A, B and C.

I do not know why it does that... Here's a script that builds a test zip for your convenience:

import os
import shutil

prefolders = r'testzip\FolderA\FolderB\FolderC'

try:
    os.makedirs(prefolders)
except FileExistsError:
    pass


for i in 'ABC':
    try:
        new_folder = 'ABC-Folder%s'%i
        os.mkdir(os.path.join(prefolders,new_folder))
    except FileExistsError:
        pass

    for j in range(2):
        file_path = os.path.join(prefolders,new_folder,'somefile%s.txt'%j)
        with open(file_path,'w'): pass

shutil.make_archive('testzip', 'zip', 'testzip')
shutil.rmtree('testzip')

I thought this would take like 10 minutes and I am losing my mind over this...


Solution

  • You're checking whether the basename() starts with ABC-, so you only ever match entries whose own name starts with ABC-. The files in your example are named somefile..., so they are never matched. extract() extracts only the one member you name; it does not recurse into a directory entry. In your case, everything that starts with ABC- is a directory, which is why you end up with empty folders.

    To find the files that have a directory somewhere in their path that starts with ABC-, you could:

        if os.path.basename(file) != '' and ('/ABC-' in os.path.dirname(file) or os.path.dirname(file).startswith('ABC-')):
    

    (Keep the forward slash even on Windows: zipfile reports member names with forward slashes regardless of the operating system.)

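    This will still extract the file together with all of the parent directories named in file, because extract(file, new_name) recreates the member's full archive path (FolderA/FolderB/FolderC/...) underneath new_name. If you want just the file by itself in new_n, you will need to use read() on the entry and then write the data to the desired destination file yourself.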
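
A minimal sketch of that read()-and-write approach, assuming the layout from the question (files sitting somewhere below a folder whose name starts with ABC-). The function name extract_flat and its parameters prefix and dest_root are made up for illustration and are not part of zipfile. Each matched folder gets its own new_<n> directory, numbered in the order the folders are first seen, and the leading FolderA/FolderB/FolderC part is dropped.

```python
import os
import zipfile


def extract_flat(zip_path, prefix='ABC-', dest_root='.'):
    """Copy files found below a folder starting with `prefix` into
    dest_root/new_<n>, without recreating the leading folders.

    Hypothetical helper for illustration only.
    Returns a dict mapping archive folder path -> destination directory.
    """
    targets = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data

            # Member names always use '/' as the separator.
            parts = info.filename.split('/')

            # Look for the first parent folder that starts with the prefix.
            for depth, part in enumerate(parts[:-1]):
                if part.startswith(prefix):
                    break
            else:
                continue  # no matching parent folder, skip this file

            folder_key = '/'.join(parts[:depth + 1])
            if folder_key not in targets:
                targets[folder_key] = os.path.join(
                    dest_root, 'new_%d' % len(targets))

            # Keep only the path below the matched folder.
            dest = os.path.join(targets[folder_key], *parts[depth + 1:])
            os.makedirs(os.path.dirname(dest), exist_ok=True)

            # read() returns the member's bytes; write them out ourselves.
            with open(dest, 'wb') as out:
                out.write(archive.read(info))
    return targets
```

With the test zip from the question, extract_flat('testzip.zip') should leave new_0, new_1 and new_2 in the current directory, each holding the two somefile*.txt files. Note that, unlike extract(), this sketch does not sanitize member names, so it should only be used on archives you trust.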