python, amazon-web-services, amazon-s3, boto3, minio

Very slow list_objects_v2 in minio


I'm running the latest MinIO in Docker on two servers: one on my local network, the other remote. The configuration is the default, and the volume is on an HDD, not an SSD.

Bucket contains around 400k objects, ~70 GB.

I have the following code (with the latest boto3):

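    # s3_client is a boto3 S3 client pointed at the MinIO endpoint
    paginator = s3_client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket_name)

    # Each page is one ListObjectsV2 response (up to 1,000 keys by default)
    for page in page_iterator:
        keys = [obj['Key'] for obj in page.get('Contents', [])]
        ...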

In most (but not all) runs, every iteration is very slow on both servers, and iterating over the 400k objects can take hours. Sometimes it works fine.

On the other hand, PUT and HEAD requests are very fast, so I don't think it's a disk problem (especially since it happens on two servers with different hardware). CPU and RAM load is low. There are no other parallel requests to MinIO (these are internal servers that are not in use yet).

How can I speed it up? Am I doing something wrong, or is there a better way to get all the keys in a bucket?


Solution

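  • MinIO translates list_objects_v2 into a ListObjects* call, which it then serves by calling readdir and stat on the filesystem. With hundreds of thousands of objects this can be expensive, so you should avoid listing objects on MinIO.

    One option is to keep the list of object keys in a database instead.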

    https://github.com/minio/minio/issues/17472#issuecomment-1598408259
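
A minimal sketch of the "keep a key list in a DB" idea, assuming SQLite as the database and assuming that every write and delete goes through your own code. The function names (`open_index`, `put_and_record`, `delete_and_forget`, `iter_keys`) and the table layout are hypothetical, chosen only for illustration; the only boto3 calls used are `put_object` and `delete_object`.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a local index of object keys (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "bucket TEXT NOT NULL, key TEXT NOT NULL, size INTEGER, "
        "PRIMARY KEY (bucket, key))"
    )
    return conn


def put_and_record(s3_client, conn, bucket, key, body):
    """Upload the object, then record its key in the index.

    Assumes `body` is bytes, so len(body) is the object size.
    """
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO objects (bucket, key, size) VALUES (?, ?, ?)",
            (bucket, key, len(body)),
        )


def delete_and_forget(s3_client, conn, bucket, key):
    """Delete the object, then drop its key from the index."""
    s3_client.delete_object(Bucket=bucket, Key=key)
    with conn:
        conn.execute(
            "DELETE FROM objects WHERE bucket = ? AND key = ?", (bucket, key)
        )


def iter_keys(conn, bucket, prefix=""):
    """Yield keys in sorted order from the index, without any ListObjects call."""
    rows = conn.execute(
        "SELECT key FROM objects "
        "WHERE bucket = ? AND substr(key, 1, ?) = ? ORDER BY key",
        (bucket, len(prefix), prefix),
    )
    for (key,) in rows:
        yield key
```

With this in place, `for key in iter_keys(conn, bucket_name): ...` would replace the paginator loop from the question. The index only stays accurate if nothing writes to the bucket behind its back, so an occasional full listing to reconcile it may still be needed.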