pythonzip

How to zip folders with subfolders containing .yml files using Python


I have a folder "a" which has subfolders "b" with subfolders "c" that contain .yml files. I want to zip all the .yml files in the subfolders "b" or "c" without the nested folder structure, so the .zip would contain all the .yml files. Below is the Python code I have so far. It is not zipping all the .yml files within the subfolders.

import os
from zipfile import ZipFile

# # Adding files that need to be zipped
directory_to_zip = r'filepath'

# Create a ZipFile Object
with ZipFile(r'filepath\helper.zip', 'w') as zip_object:
    # Traverse the directory and add YAML files to the zip
    for root, dirs, files in os.walk(directory_to_zip):
        for file in files:
            if file.endswith('.yml'):
                # Create the complete filepath
                file_path = os.path.join(root, file)
                # Write the file to the zip file with just the filename, ignoring directories
                zip_object.write(file_path, file)

# Check to see if the zip file is created
if os.path.exists(r'filepath\helper.zip'):
    print("ZIP file created")
else:
    print("ZIP file not created")

Solution

  • When you add files to the zip archive using zip_object.write(file_path, file) it stores the files directly into the root of the zip archive. If multiple files have the same name config.yml in different subfolders they will overwrite each other.

    # Path to the directory you want to zip
    directory_to_zip = r'filepath\a'
    
    # Path where you want to save the zip file
    zip_file_path = r'filepath\helper.zip'
    
    # Create a ZipFile Object
    with ZipFile(zip_file_path, 'w') as zip_object:
        # Traverse the directory and add YAML files to the zip
        for root, dirs, files in os.walk(directory_to_zip):
            for file in files:
                if file.endswith('.yml'):
                    # Create the complete filepath
                    file_path = os.path.join(root, file)
                    
                    # Option 1: If you want to keep the filenames unique by adding the parent directory's name
                    archive_name = os.path.join(os.path.basename(root), file)
                    
                    # Option 2: If you're okay with just the filenames and are sure there are no conflicts
                    # archive_name = file
    
                    # Write the file to the zip file with the adjusted filename
                    zip_object.write(file_path, archive_name)
    
    # Check to see if the zip file is created
    if os.path.exists(zip_file_path):
        print("ZIP file created")
    else:
        print("ZIP file not created")
    

    This should stop potential conflicts in file names including parent directory name. The code should also ensure that your .yml files are included in zip file without nesting them.

    EDIT: There has been some incorrect information posted on this answer. The correct information has been outlined by Mark Adler who clarified that "You are incorrect. Files with exactly the same name can be added to the zip file. Python will issue a warning, but will proceed to add the file anyway".