pythonamazon-web-servicesamazon-s3boto3

Boto3 to download all files from a S3 Bucket


I'm using boto3 to get files from s3 bucket. I need a similar functionality like aws s3 sync

My current code is

#!/usr/bin/python
import boto3
s3=boto3.client('s3')
list=s3.list_objects(Bucket='my_bucket_name')['Contents']
for key in list:
    s3.download_file('my_bucket_name', key['Key'], key['Key'])

This is working fine, as long as the bucket has only files. If a folder is present inside the bucket, its throwing an error

Traceback (most recent call last):
  File "./test", line 6, in <module>
    s3.download_file('my_bucket_name', key['Key'], key['Key'])
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 58, in download_file
    extra_args=ExtraArgs, callback=Callback)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 651, in download_file
    extra_args, callback)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 666, in _download_file
    self._get_object(bucket, key, filename, extra_args, callback)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 690, in _get_object
    extra_args, callback)
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 707, in _do_get_object
    with self._osutil.open(filename, 'wb') as f:
  File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 323, in open
    return open(filename, mode)
IOError: [Errno 2] No such file or directory: 'my_folder/.8Df54234'

Is this a proper way to download a complete s3 bucket using boto3. How to download folders.


Solution

  • When working with buckets that have 1000+ objects its necessary to implement a solution that uses the NextContinuationToken on sequential sets of, at most, 1000 keys. This solution first compiles a list of objects then iteratively creates the specified directories and downloads the existing objects.

    import boto3
    import os
    
    s3_client = boto3.client('s3')
    
    def download_dir(prefix, local, bucket, client=s3_client):
        """
        params:
        - prefix: pattern to match in s3
        - local: local path to folder in which to place files
        - bucket: s3 bucket with target contents
        - client: initialized s3 client object
        """
        keys = []
        dirs = []
        next_token = ''
        base_kwargs = {
            'Bucket':bucket,
            'Prefix':prefix,
        }
        while next_token is not None:
            kwargs = base_kwargs.copy()
            if next_token != '':
                kwargs.update({'ContinuationToken': next_token})
            results = client.list_objects_v2(**kwargs)
            contents = results.get('Contents')
            for i in contents:
                k = i.get('Key')
                if k[-1] != '/':
                    keys.append(k)
                else:
                    dirs.append(k)
            next_token = results.get('NextContinuationToken')
        for d in dirs:
            dest_pathname = os.path.join(local, d)
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
        for k in keys:
            dest_pathname = os.path.join(local, k)
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
            client.download_file(bucket, k, dest_pathname)