python macos zip

Remove auto-generated __MACOSX folder from inside a zip file in Python


I have zip files uploaded by clients through a web server that sometimes contain pesky __MACOSX directories inside that gum things up. How can I remove these?

I thought of using ZipFile, but this answer says that isn't possible and gives this suggestion:

Read out the rest of the archive and write it to a new zip file.

How can I do this with ZipFile? Another Python based alternative like shutil or something similar would also be fine.


Solution

  • The examples below look for __MACOSX entries in a zip file. When such entries are present, a new zip archive is created and every entry that is not under __MACOSX/ is written to it. The code can be extended to skip .DS_Store files as well.

    Example One

    from zipfile import ZipFile

    # Copy every entry except those under __MACOSX/ into a new archive.
    with ZipFile('original.zip', 'r') as original_zip, \
         ZipFile('new_archive.zip', 'w') as new_zip:
        for item in original_zip.infolist():
            if not item.filename.startswith('__MACOSX/'):
                # Passing the ZipInfo object keeps the entry's name and timestamp.
                new_zip.writestr(item, original_zip.read(item.filename))
    

    Example Two

    from zipfile import ZipFile

    def check_archive_for_bad_filename(file):
        """Return True if the archive contains any entry under __MACOSX/."""
        with ZipFile(file, 'r') as zip_file:
            return any(name.startswith('__MACOSX/') for name in zip_file.namelist())

    def remove_bad_filename_from_archive(original_file, temporary_file):
        """Copy every entry except those under __MACOSX/ into temporary_file."""
        # Mode 'w' creates the new archive (replacing any stale copy);
        # it is opened once rather than reopened for every entry.
        with ZipFile(original_file, 'r') as zip_file, \
             ZipFile(temporary_file, 'w') as new_zip:
            for item in zip_file.infolist():
                if not item.filename.startswith('__MACOSX/'):
                    new_zip.writestr(item, zip_file.read(item.filename))
    
    
    archive_filename = 'old.zip'
    temp_filename = 'new.zip'
    
    results = check_archive_for_bad_filename(archive_filename)
    if results:
       print('Removing MACOSX file from archive.')
       remove_bad_filename_from_archive(archive_filename, temp_filename)
    else:
       print('No MACOSX file in archive.')
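
The extension to .DS_Store files mentioned above, combined with replacing the uploaded file in place, could look roughly like the sketch below. The names is_mac_metadata and strip_mac_metadata are hypothetical helpers, not part of zipfile. Unlike the examples above, this sketch assumes you want to match __MACOSX at any depth rather than only at the top level of the archive.

```python
import os
import tempfile
from zipfile import ZipFile


def is_mac_metadata(name):
    """Return True for __MACOSX entries and .DS_Store files at any depth."""
    parts = name.split('/')
    return '__MACOSX' in parts or parts[-1] == '.DS_Store'


def strip_mac_metadata(path):
    """Rewrite the zip at `path` without macOS metadata entries.

    Returns the number of entries that were dropped.
    """
    removed = 0
    # Create the temporary file next to the original so that os.replace
    # does not have to move data across filesystems.
    target_dir = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix='.zip', dir=target_dir)
    os.close(fd)
    try:
        with ZipFile(path, 'r') as src, ZipFile(tmp_path, 'w') as dst:
            for item in src.infolist():
                if is_mac_metadata(item.filename):
                    removed += 1
                    continue
                # Passing the ZipInfo keeps the name, timestamp and
                # compression type of the original entry.
                dst.writestr(item, src.read(item.filename))
        # Both archives are closed here, so the swap is safe.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.remove(tmp_path)
        raise
    return removed
```

Calling strip_mac_metadata('old.zip') would then overwrite old.zip with the cleaned archive. Each entry is read fully into memory, so for very large uploads a streaming copy may be a better fit.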