pythonamazon-web-servicesamazon-s3

how to list files from a S3 bucket folder using python


I tried to list all files in a bucket. Here is my code

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_project')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

it works. I get all files' names. However, when I tried to do the same thing on a folder, the code raise an error

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_project/data/') # add the folder name

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)

Here is the error:

botocore.exceptions.ParamValidationError: Parameter validation failed:

Invalid bucket name "carlos-cryptocurrency-research-project/data/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

I'm sure the folder name is correct and I tried replacing it with Amazon Resource Name (ARN) and S3 URI, but still get the error.


Solution

  • You can't indicate a prefix/folder in the Bucket constructor. Instead use the client-level API and call list_objects_v2 something like this:

    import boto3
    
    client = boto3.client('s3')
    
    response = client.list_objects_v2(
        Bucket='my_bucket',
        Prefix='data/')
    
    for content in response.get('Contents', []):
        print(content['Key'])
    

    Note that this will yield at most 1000 S3 objects. You can use a paginator if needed, or consider using the higher-level Bucket resource and its objects collection which handles pagination for you, per another answer to this question.