I am using a loop to read a multi-volume 7z archive with the following code:
```python
import py7zr
import multivolumefile

# ARCHIVE_PATH, PASSWORD and filters are defined elsewhere in my code
zip_path = f"{ARCHIVE_PATH}/test.7z"

with multivolumefile.open(zip_path, mode='rb') as multizip_handler:
    with py7zr.SevenZipFile(multizip_handler, 'r', password=PASSWORD, filters=filters) as zip_handler:
        for fname, fcontent in zip_handler.read(targets=None).items():
            pass
```
The archive is relatively large (73 parts, 700 MB in total). I have noticed that the memory footprint is quite high, even though I do not keep the content of any variable such as `fname` or `fcontent`. The loop works, but if I intentionally fill the memory with a command such as `head -c 7G /dev/zero | tail`, the loop gives me a CRC error, although the archive itself is fine (verified with the `7z` command). The loop is quite simple and uses only library functions, so I cannot make it any lighter than it is.
EDIT: To be more precise, my guess is that one of the two libraries, `multivolumefile` or `py7zr`, is internally consuming a lot of memory.
Is there a way to reduce the memory footprint so that reading a multi-part archive always succeeds, independently of the size of the archive or of the files inside it?
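If `read(targets=None)` materializes every member as an in-memory `BytesIO` object before the loop starts, then reading the members one at a time should bound the footprint to roughly one member. This is a sketch based on the documented `py7zr` API (`getnames()`, `read(targets=...)` and `reset()`); note that for a solid archive each `read()` may re-decompress the stream from the beginning, trading speed for memory:

```python
import multivolumefile
import py7zr

zip_path = f"{ARCHIVE_PATH}/test.7z"  # same placeholders as above

with multivolumefile.open(zip_path, mode='rb') as multizip_handler:
    with py7zr.SevenZipFile(multizip_handler, 'r', password=PASSWORD) as zip_handler:
        for name in zip_handler.getnames():
            # decompress only the requested member
            data = zip_handler.read(targets=[name])
            # ... process data[name] here ...
            del data
            # rewind the archive so read() can be called again on the same handle
            zip_handler.reset()
```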
After many tests, this is probably a bug; a bug report has been submitted: https://github.com/miurahr/py7zr/issues/575
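In the meantime, a possible workaround is to extract to disk instead of reading into memory, so that the member contents never live in Python objects (a sketch, assuming py7zr's extraction path buffers less than `read()` does):

```python
import pathlib
import tempfile

import multivolumefile
import py7zr

zip_path = f"{ARCHIVE_PATH}/test.7z"  # same placeholders as above

with multivolumefile.open(zip_path, mode='rb') as multizip_handler:
    with py7zr.SevenZipFile(multizip_handler, 'r', password=PASSWORD) as zip_handler:
        with tempfile.TemporaryDirectory() as tmpdir:
            # write the members to disk instead of into BytesIO objects
            zip_handler.extractall(path=tmpdir)
            for path in pathlib.Path(tmpdir).rglob('*'):
                if path.is_file():
                    pass  # process each extracted file from disk
```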