
Read all files in a .zip archive in Python


I'm trying to read all the files in a .zip archive named data1.zip using glob.glob().

import glob
from zipfile import ZipFile

archive = ZipFile('data1.zip','r')
files = archive.read(glob.glob('*.jpg'))

Error Message:

TypeError: unhashable type: 'list'

The workaround I'm currently using is:

files = [archive.read(str(i+1)+'.jpg') for i in range(100)]

This is bad because I'm assuming my files are named 1.jpg, 2.jpg, etc.

Is there a better way to do this that follows Python best practices? It doesn't necessarily need to use glob().


Solution

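  • glob doesn't look inside your archive; it just gives you a list of the .jpg files in your current working directory. That list is what you passed to archive.read(), which expects a single member name, hence the TypeError.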

    ZipFile already has methods for returning information about the files in the archive: namelist() returns the member names, and infolist() returns ZipInfo objects, which include metadata as well.

    Are you just looking for:

    archive = ZipFile('data1.zip', 'r')
    files = archive.namelist()
    

    Or if you only want .jpg files:

    files = [name for name in archive.namelist() if name.endswith('.jpg')]
    

    Or if you want to read all the contents of each file:

    files = [archive.read(name) for name in archive.namelist()]
    

    Although I'd probably rather make a dict mapping names to contents:

    files = {name: archive.read(name) for name in archive.namelist()}
    

    That way you can access contents like so:

    files['1.jpg']
    

    Or get a list of the files present using files.keys(), etc.
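
If you still want glob-style patterns, one option is to match them against the archive's member names with the standard-library fnmatch module. The following is only a minimal sketch of that idea: the function name read_matching and its pattern parameter are made up for illustration, and the case-insensitive comparison is an assumption about what you want (so that 2.JPG is picked up too). It builds a small in-memory archive so it runs without data1.zip.

```python
import fnmatch
import io
from zipfile import ZipFile


def read_matching(archive_path, pattern="*.jpg"):
    """Return {member name: bytes} for members matching a glob-style pattern.

    Hypothetical helper, not part of zipfile. `pattern` should be lowercase
    because member names are lowercased before matching.
    """
    with ZipFile(archive_path, "r") as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            # Skip directory entries and compare in lowercase.
            if not info.is_dir()
            and fnmatch.fnmatchcase(info.filename.lower(), pattern)
        }


# Build a small in-memory archive to stand in for data1.zip.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as demo:
    demo.writestr("1.jpg", b"first")
    demo.writestr("photos/2.JPG", b"second")
    demo.writestr("notes.txt", b"not an image")

files = read_matching(buffer)
print(sorted(files))  # ['1.jpg', 'photos/2.JPG']
```

Note that fnmatch's * also matches the / separator, so '*.jpg' picks up members in subfolders as well. Opening the archive in a with block closes it once the contents have been read.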