I have the following case:
Fetch a certain number N of objects from Minio, pack them into a zip archive, and upload that zip back to Minio as a single object.
Problem: the objects can be around 40 GB each, but the machine only has about 4 GB of RAM (and roughly 240 GB of disk), so I can't just load them into memory.
I use miniopy-async to work with Minio.
Maybe you have some ideas?
You’re right to be concerned about memory here. If you try to get_object and just read() it all into memory, you’ll blow past your 4 GB limit very quickly with 40 GB objects. The trick is to stream both the download from Minio and the writing into the zip file, instead of buffering everything.
A couple of ideas that should fit your case:
Use the streaming API from minio-py / miniopy-async
With miniopy-async, get_object returns a streaming response. Instead of calling .read(), you can iterate over it chunk by chunk (async for chunk in resp.stream() in the sketch below; the exact streaming call can vary between versions of the library) and feed those chunks directly into your zip writer.
Zip files without holding them fully in memory
Python’s built-in zipfile module accepts file-like objects, but it works best with a seekable target. A common workaround is to write to a temporary file on disk instead of a BytesIO buffer. Since you mentioned you have ~240 GB of disk, you can safely buffer the zip archive there and then upload it back to Minio.
Something like:
import asyncio
import os
import tempfile
import zipfile

from miniopy_async import Minio

client = Minio("localhost:9000", access_key="xxx", secret_key="xxx", secure=False)

async def stream_to_zip(bucket, keys, output_bucket, output_key):
    # create a temp file for the zip; keep it around until the upload is done
    with tempfile.NamedTemporaryFile(delete=False, suffix=".zip") as tmp:
        zip_path = tmp.name
    try:
        # write objects into the zip one by one, chunk by chunk
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
            for key in keys:
                resp = await client.get_object(bucket, key)
                try:
                    with zf.open(key, "w") as dest:
                        # only one chunk is held in memory at a time; the exact
                        # streaming call depends on your miniopy-async version
                        # (it may be resp.content.iter_chunked(...) instead)
                        async for chunk in resp.stream():
                            dest.write(chunk)
                finally:
                    resp.close()  # release the connection
        # now upload the finished zip back to Minio as a single object
        await client.fput_object(output_bucket, output_key, zip_path)
    finally:
        os.remove(zip_path)  # clean up the temp file
This way you never hold the full 40 GB object in RAM — only the chunk you’re currently processing. The zip is streamed to a temp file on disk, and when finished you upload that single file back to Minio.
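For completeness, a call site would look like this (the bucket and object names are just placeholders):

keys = ["reports/part-1.bin", "reports/part-2.bin"]  # whatever N objects you need

asyncio.run(
    stream_to_zip("source-bucket", keys, "archive-bucket", "batch-001.zip")
)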
Alternative: multipart upload
If you can’t afford to store the full zip on disk either (e.g. if N objects * 40 GB is bigger than 240 GB), then you’d need to do a multipart upload directly to Minio while streaming into the zip writer. That’s more complex because zipfile expects a seekable target, so you’d need a streaming zip implementation (there are third-party libraries like zipstream that can do this).
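Here is a rough sketch of that direction, deliberately using the synchronous minio client plus the third-party zipstream package so the moving parts stay visible. The zipstream calls (write_iter, iterating the archive for chunks) and the length=-1 / part_size multipart trick are from memory, so verify them against the versions you install; the same shape should carry over to the async client's put_object.

import io

import zipstream                  # third-party streaming zip writer (python-zipstream)
from minio import Minio           # synchronous client, used here only to keep the sketch short

sync_client = Minio("localhost:9000", access_key="xxx", secret_key="xxx", secure=False)

class IterStream(io.RawIOBase):
    """Expose an iterator of bytes chunks as a readable file-like object."""
    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._buf = b""

    def readable(self):
        return True

    def readinto(self, b):
        while not self._buf:
            try:
                self._buf = next(self._iter)
            except StopIteration:
                return 0          # EOF
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n

def object_chunks(bucket, key):
    """Yield one Minio object chunk by chunk."""
    resp = sync_client.get_object(bucket, key)
    try:
        for chunk in resp.stream(64 * 1024):
            yield chunk
    finally:
        resp.close()
        resp.release_conn()

def zip_and_upload(bucket, keys, output_bucket, output_key):
    # zipstream builds the archive lazily: nothing is read from Minio until the
    # archive itself is consumed, so only one chunk lives in memory at a time
    zs = zipstream.ZipFile(mode="w", compression=zipstream.ZIP_STORED, allowZip64=True)
    for key in keys:
        zs.write_iter(key, object_chunks(bucket, key))
    # length=-1 plus part_size makes minio-py do a multipart upload of unknown size
    sync_client.put_object(
        output_bucket, output_key,
        io.BufferedReader(IterStream(zs), buffer_size=64 * 1024),
        length=-1, part_size=16 * 1024 * 1024,
    )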
So the simplest practical approach: use a temp file on disk, write the zip to it chunk by chunk, then push it back to Minio. With your 240 GB of disk and 40 GB objects you should be fine, as long as you’re not zipping too many at once.
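If you want to guard against that last point programmatically, you can sum the object sizes up front before committing to the disk-buffered approach. A small sketch, assuming stat_object behaves like it does in the sync client (async here, returning an object with a size attribute):

import shutil

async def enough_disk_for_zip(bucket, keys, tmp_dir="/tmp", headroom=1.1):
    # sum the sizes of the objects to be zipped (ZIP_STORED adds almost no overhead)
    total = 0
    for key in keys:
        stat = await client.stat_object(bucket, key)
        total += stat.size
    # compare against free space on the filesystem that will hold the temp zip
    free = shutil.disk_usage(tmp_dir).free
    return total * headroom < free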