Yes, I have read the other posts on this subject, but I am running into a weird problem:
When I extract a certain item from the namelist, it only gives me an empty folder, not the actual files inside.
My zip file has the following hierarchy:
myzip.zip -> FolderA -> FolderB -> FolderC -> FolderIWantA, FolderIWantB, ... FolderIWantN.
So there are a lot of preceding folders I do not wish to extract. I know how to identify the ones I want from the namelist:
    import os
    import sys
    import zipfile

    try:
        zip_file_path = sys.argv[1]
    except IndexError:
        sys.exit('No zip file provided.')

    archive = zipfile.ZipFile(zip_file_path)
    for i,file in enumerate(archive.namelist()):
        if os.path.basename(file[:-1]).startswith('ABC-'): # identify relevant folders
            old_name = os.path.basename(file[:-1])
            new_name = 'new_%d'%i # Create a new name
            archive.extract(file, new_name)
This does extract the folders I want, but the extracted folders are empty for some reason. And not just that: when I extract the new folders, they contain the preceding folders A, B and C for some reason.
I do not know why it does that... Here's a test zip for your convenience:
    import os
    import shutil

    prefolders = r'testzip\FolderA\FolderB\FolderC'
    try:
        os.makedirs(prefolders)
    except FileExistsError:
        pass
    for i in 'ABC':
        try:
            new_folder = 'ABC-Folder%s'%i
            os.mkdir(os.path.join(prefolders,new_folder))
        except FileExistsError:
            pass
        for j in range(2):
            file_path = os.path.join(prefolders,new_folder,'somefile%s.txt'%j)
            with open(file_path,'w'): pass
    shutil.make_archive('testzip', 'zip', 'testzip')
    shutil.rmtree('testzip')
I thought this would take like 10 minutes and I am losing my mind over this...
You're looking for the basename() to start with ABC-, which means you never find files that don't start with that. The files in your example start with somefile. extract() will only extract the one thing that is named. In your case, all of the things that start with ABC- are directories.
To find the files that have a directory somewhere in their path that starts with ABC-, you could use:
    if os.path.basename(file) != '' and ('/ABC-' in os.path.dirname(file) or os.path.dirname(file).startswith('ABC-')):
(Names returned by namelist() always use forward slashes, even on Windows, so the slash should not need changing.)
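The same test can be pulled out into a small helper, which is easier to read and to try out on its own. This is only a sketch: is_wanted_file and its prefix parameter are hypothetical names chosen for illustration, not part of zipfile. It treats an entry as wanted when it is a file (no trailing slash) and any of its parent directories starts with the prefix.

```python
def is_wanted_file(name, prefix='ABC-'):
    """Return True if `name` is a file entry below a directory starting with `prefix`.

    `name` is expected to be an entry from ZipFile.namelist(); those names
    use '/' as the separator, and directory entries end with '/'.
    """
    if name.endswith('/'):
        return False  # directory entry, nothing to extract by itself
    parent_dirs = name.split('/')[:-1]  # every path component except the file name
    return any(part.startswith(prefix) for part in parent_dirs)
```

In the loop from the question, the condition would then become: if is_wanted_file(file):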
This will still extract the file and all of the parent directories as named in file. If you want just the file by itself in new_n, then you will need to use read() on the entry, and then write the data to the desired destination file.
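A minimal sketch of that last step, under a few assumptions: extract_flat is a hypothetical helper name, and it uses ZipFile.open() together with shutil.copyfileobj() rather than read(), so large entries are streamed instead of loaded into memory at once. Calling archive.read(member) and writing the returned bytes would work just as well for small files.

```python
import os
import shutil
import zipfile


def extract_flat(archive, member, dest_dir):
    """Write one file entry of an open ZipFile into dest_dir without its parent folders.

    `member` must name a file entry (not a directory) from archive.namelist().
    Returns the path of the file that was written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Keep only the last path component, e.g. 'somefile0.txt'
    target = os.path.join(dest_dir, os.path.basename(member))
    with archive.open(member) as src, open(target, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Note that two entries with the same file name written to the same dest_dir would overwrite each other, since the parent folders are dropped.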