I am trying to extract the files inside a zip file and rename them from 1 to n while extracting. But when the actual filenames also run from 1 to n, ZipFile.infolist() returns them in the wrong order.
This is how I am trying to get the result:
from zipfile import ZipFile

with ZipFile(file) as zf:
    for i, file in enumerate(zf.infolist(), 1):
        # Keep the extension, replace the name with the running index
        file.filename = f'{i}.{file.filename.split(".")[-1]}'
        zf.extract(file, file_path)
And this is what the actual file ordering looks like:
When I debug the code, ZipFile.infolist() returns a list of ZipInfo objects like this:
As you can see from the images, the actual ordering is 1, 2, 3, 4, 5, 6, 7, 8, ..., n. But ZipFile.infolist() returns it as 1, 10, 11, 12, 13, ..., n.
Am I doing it wrong? Or is there a workaround? In the worst case I could zero-pad the file names in the zip file (01, 02, 03, 04, ..., n), but that is an unreliable solution.
The program you are using to display the contents of your zip file sorts the filenames numerically before it shows them to you. Python, on the other hand, lists and extracts the members in the order in which they are stored in the zip file.
Here is a worked example that shows the issue.
First create some files
$ touch 1.jpg 2.jpg 3.jpg 4.jpg 7.jpg 8.jpg 10.jpg 21.jpg 100.jpg 205.jpg
Add them to a zip file in a random order
$ zip test.zip 10.jpg 2.jpg 1.jpg 100.jpg 21.jpg 8.jpg 205.jpg 7.jpg 3.jpg
adding: 10.jpg (stored 0%)
adding: 2.jpg (stored 0%)
adding: 1.jpg (stored 0%)
adding: 100.jpg (stored 0%)
adding: 21.jpg (stored 0%)
adding: 8.jpg (stored 0%)
adding: 205.jpg (stored 0%)
adding: 7.jpg (stored 0%)
adding: 3.jpg (stored 0%)
Check what unzip thinks is in the file
$ unzip -l test.zip
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2023-10-01 15:44   10.jpg
        0  2023-10-01 15:44   2.jpg
        0  2023-10-01 15:44   1.jpg
        0  2023-10-01 15:44   100.jpg
        0  2023-10-01 15:44   21.jpg
        0  2023-10-01 15:44   8.jpg
        0  2023-10-01 15:44   205.jpg
        0  2023-10-01 15:44   7.jpg
        0  2023-10-01 15:44   3.jpg
---------                     -------
        0                     9 files
It displays them in the order they were added.
Now print the contents with Python
import zipfile

zipfilename = "test.zip"
with zipfile.ZipFile(zipfilename) as zf:
    for file in zf.namelist():
        print(file)
The code outputs this
$ python try.py
10.jpg
2.jpg
1.jpg
100.jpg
21.jpg
8.jpg
205.jpg
7.jpg
3.jpg
That matches the file insertion order
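The same behaviour can be checked without the zip command line tool by building the archive in memory. The sketch below is illustrative only: the member names are made up, the contents are empty, and it uses nothing beyond the standard zipfile and io modules. It also shows that a plain string sort puts 10.jpg before 2.jpg, which may be how the archive in the question was written (an assumption, since the question does not say how that zip was created).

```python
import io
import zipfile

# Hypothetical member names, written in a deliberately shuffled order.
names = ["10.jpg", "2.jpg", "1.jpg", "100.jpg", "21.jpg"]

# Build the archive in memory so nothing touches the disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in names:
        zf.writestr(name, b"")  # empty placeholder content

# Read it back: infolist() follows the order stored in the archive.
with zipfile.ZipFile(buffer) as zf:
    stored_order = [info.filename for info in zf.infolist()]

print(stored_order)          # same order as `names`
print(sorted(stored_order))  # plain string sort: "10.jpg" comes before "2.jpg"
```

If the first list matches the order in which the names were written, the ordering comes from the archive itself rather than from Python.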
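You can work around this by sorting the contents of the zip file yourself. The key point about your files is that each name is a number followed by an extension (here jpg), so the sort has to ignore the extension and compare the numbers. The Python code can use the sorted function to sort the files by filename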
import zipfile
from pathlib import Path

zipfilename = "test.zip"
with zipfile.ZipFile(zipfilename) as zf:
    # Sort by the numeric value of the name, ignoring the extension
    for file in sorted(zf.namelist(), key=lambda x: int(Path(x).stem)):
        print(file)
This outputs the files in numeric order
$ python try.py
1.jpg
2.jpg
3.jpg
7.jpg
8.jpg
10.jpg
21.jpg
100.jpg
205.jpg
Let's unpick the line with the sorted function
for file in sorted(zf.namelist(), key=lambda x: int(Path(x).stem)):
The sorted function is given two parameters:
- the list of filenames to sort, via zf.namelist()
- a function that works out the key to be used in sorting, lambda x: int(Path(x).stem)
The call to Path(x).stem takes the full filename (e.g. 205.jpg) and returns the filename without the extension (e.g. 205).
The int call converts the string 205 into the integer value 205. That value is then returned to sorted and allows the filenames to be sorted numerically.
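To tie this back to the question, the same sort key can be applied to zf.infolist() before extracting and renumbering. The following is a minimal sketch, not a definitive implementation: extract_renumbered is a hypothetical helper name, and it assumes every file member has a purely numeric stem (otherwise int() raises ValueError). Instead of overwriting ZipInfo.filename as the question does, it copies each member to an explicitly named target file.

```python
import shutil
import zipfile
from pathlib import Path


def extract_renumbered(zip_path, dest_dir):
    """Extract members in numeric filename order, renaming them 1..n.

    Hypothetical helper: assumes each file member's stem is an integer.
    Returns the list of new file names in the order they were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries, then sort by the numeric stem.
        files = [info for info in zf.infolist() if not info.is_dir()]
        files.sort(key=lambda info: int(Path(info.filename).stem))
        for i, info in enumerate(files, 1):
            # Keep the original extension, replace the name with the index.
            target = dest / f"{i}{Path(info.filename).suffix}"
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target.name)
    return written
```

With the test.zip from the worked example, extract_renumbered("test.zip", "out") would write 1.jpg through 9.jpg into out, where 9.jpg holds the contents of the original 205.jpg.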