I download a zip file from AWS S3 and unzip it. After unzipping, all of the files are saved in the /tmp/
folder.
import zipfile
import boto3

s3 = boto3.client('s3')
s3.download_file('testunzipping', 'DataPump_10000838.zip', '/tmp/DataPump_10000838.zip')
with zipfile.ZipFile('/tmp/DataPump_10000838.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp/')
    lstNEW = zip_ref.namelist()
The value of lstNEW is something like this:
['DataPump_10000838/', '__MACOSX/._DataPump_10000838', 'DataPump_10000838/DockBooking', '__MACOSX/DataPump_10000838/._DockBooking', 'DataPump_10000838/LoadEquipment', '__MACOSX/DataPump_10000838/._LoadEquipment', ....]
LoadEquipment and DockBooking are real files, but the rest are not. Is it possible to unzip the archive without creating those temporary files? Or is it possible to pick out only the real files? Later, I need to take the correct files and gzip them with names like:
$item_$unixepochtimestamp.csv.gz
Do I use the compress function?
To extract only certain files, pass a list of names to the members parameter of extractall:
with zipfile.ZipFile('/tmp/DataPump_10000838.zip', 'r') as zip_ref:
    # Skip everything under __MACOSX/ and extract only the remaining entries
    lstNEW = [name for name in zip_ref.namelist() if not name.startswith("__MACOSX/")]
    zip_ref.extractall('/tmp/', members=lstNEW)
The __MACOSX entries are not temporary files. They are how macOS stores resource forks and other file metadata inside a zip archive, since the zip format has no native support for them.
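
For the gzip part of the question, here is a minimal sketch rather than a definitive solution. It also drops directory entries such as 'DataPump_10000838/' and streams each real file from the archive straight into a gzip file, so nothing extra is unpacked into /tmp/. The helper names real_members and extract_and_gzip are made up for illustration, and the output name assumes that $item is the file's base name, that $unixepochtimestamp is the current time in whole seconds, and that the contents are already CSV.

```python
import gzip
import os
import shutil
import time
import zipfile


def real_members(zip_ref):
    """Return names of regular files, skipping directories and macOS metadata."""
    return [
        info.filename
        for info in zip_ref.infolist()
        if not info.is_dir() and not info.filename.startswith("__MACOSX/")
    ]


def extract_and_gzip(zip_path, out_dir):
    """Gzip every real file in zip_path into out_dir and return the new paths.

    Output names follow <item>_<unix epoch timestamp>.csv.gz, which is an
    assumed reading of the pattern given in the question.
    """
    timestamp = int(time.time())
    written = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for name in real_members(zip_ref):
            item = os.path.basename(name)
            target = os.path.join(out_dir, f"{item}_{timestamp}.csv.gz")
            # Copy from the archive member directly into the gzip file
            with zip_ref.open(name) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Calling extract_and_gzip('/tmp/DataPump_10000838.zip', '/tmp/') would then write one .csv.gz file per real entry. If the data is already in memory as bytes, gzip.compress(data) also works, but gzip.open avoids loading a whole file at once.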