python amazon-web-services amazon-s3 boto3 rclone

Cross-Region S3 Copy with SourceClient Fails in Boto3 and AWS CLI on Scaleway


TL;DR

I am having issues using boto3/the AWS CLI to copy files between buckets that are in different regions (note: I am using Scaleway as my cloud provider, not AWS). I could not get it to work with boto3, but managed to find a solution using rclone. I would like to know whether boto3 is still an option, to limit the number of dependencies in my stack.

Description

When performing a cross-region S3 copy operation using Boto3 (or the AWS CLI), the SourceClient parameter in Boto3 and the --endpoint-url parameter in the AWS CLI are not applied consistently. This results in errors when attempting to copy objects from a source bucket in one region to a destination bucket in another region without downloading the objects locally.

Expected Behavior: The object should copy successfully from the source bucket to the destination bucket across regions, using the SourceClient to correctly resolve the source bucket's region.

Actual Behavior: An error is raised:

botocore.exceptions.ClientError: An error occurred (NoSuchBucket) when calling the CopyObject operation: The specified bucket does not exist

The copy command does not use any information from the SourceClient argument; it only uses the credentials, endpoint, and region of the client on which the copy method was called.
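
As far as I can tell, boto3's managed copy ultimately issues a single server-side CopyObject request from the destination client, so the source endpoint is never contacted for the data itself (SourceClient only seems to be consulted for metadata). A minimal sketch with the low-level call, using the same clients and placeholder names as below, reproduces the same failure:

# CopyObject is signed and sent by dest_s3 (nl-ams endpoint),
# which cannot see a bucket that lives in fr-par
dest_s3.copy_object(
    CopySource={"Bucket": "source_bucket_name", "Key": "source_object_name"},
    Bucket="destination_bucket_name",
    Key="source_object_name",
)
# -> botocore.exceptions.ClientError: NoSuchBucket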

I also tried this with the AWS CLI, but got the same result:

aws s3 sync s3://source-bucket s3://dest-bucket \
    --source-region fr-par \
    --region nl-ams \
    --endpoint-url https://s3.fr-par.scw.cloud \
    --profile mys3profile

The AWS CLI seems to fall back to a default amazonaws.com endpoint for the source bucket (presumably --endpoint-url is not applied to the client built from --source-region):

fatal error: Could not connect to the endpoint URL: "https://source-bucket.s3.fr-par.amazonaws.com/?list-type=2&prefix=&encoding-type=url"

Reproduction Steps:

import boto3
from dotenv import dotenv_values
config = dotenv_values(".env")

# Initialize source and destination clients
s3_session = boto3.Session(
    aws_access_key_id=config.get("SCW_ACCESS_KEY"),
    aws_secret_access_key=config.get("SCW_SECRET_KEY"),
    region_name="fr-par",
)
src_s3 = s3_session.client(
    service_name="s3",
    region_name="fr-par",
    endpoint_url="https://s3.fr-par.scw.cloud",
)
# Second session/client for the destination region (same credentials)
s3_session = boto3.Session(
    aws_access_key_id=config.get("SCW_ACCESS_KEY"),
    aws_secret_access_key=config.get("SCW_SECRET_KEY"),
    region_name="nl-ams",
)
dest_s3 = s3_session.client(
    service_name="s3",
    region_name="nl-ams",
    endpoint_url="https://s3.nl-ams.scw.cloud",
)

# Set up source and destination parameters
copy_source = {
    "Bucket": "source_bucket_name",
    "Key": "source_object_name",
}

# Attempt to copy with SourceClient
dest_s3.copy(
    copy_source,
    "destination_bucket_name",
    "source_object_name",
    SourceClient=src_s3
)
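
As a sanity check (assuming both buckets exist under these placeholder names), each client can reach its own bucket, so credentials and endpoints are not the problem; only the cross-region copy fails:

# Both calls succeed when run against the respective regional endpoints
src_s3.head_bucket(Bucket="source_bucket_name")
dest_s3.head_bucket(Bucket="destination_bucket_name")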

Possible Solution

I could not get it to work with boto3, but rclone gave me a solution that was acceptable for my needs.

Example config to be placed in ~/.config/rclone/rclone.conf:

[scw_s3_fr]
type = s3
provider = Scaleway
access_key_id = ...
secret_access_key = ...
region = fr-par
endpoint = s3.fr-par.scw.cloud
acl = private

[scw_s3_nl]
type = s3
provider = Scaleway
access_key_id = ...
secret_access_key = ...
region = nl-ams
endpoint = s3.nl-ams.scw.cloud
acl = private

Then sync the source to the destination one-way:

rclone sync scw_s3_fr:source-bucket scw_s3_nl:destination-bucket -P --metadata --checksum --check-first
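
(-P shows progress, --metadata preserves object metadata, --checksum compares files by checksum rather than size and modification time, and --check-first runs all checks before starting any transfers.)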

The actual question

Does anybody know what I did wrong here, or could you point me in the right direction to get the configuration set up correctly? My short-term needs are currently all met, but I wonder whether a pure-boto3 solution is still possible.

Environment details

Python 3.11.2 (main, Mar 7 2023, 16:53:12) [GCC 12.2.1 20230201] on linux
boto3 == '1.35.66'


Solution

  • The issue you’re facing with Boto3 when copying files between buckets in different regions on Scaleway arises because Boto3’s copy method isn’t fully compatible with non-AWS S3 implementations. Specifically, the SourceClient parameter doesn’t properly resolve the endpoint for the source bucket when working with Scaleway, which is why you end up with errors like NoSuchBucket.

    To make Boto3 work for cross-region copies in this context, you have to download and re-upload the objects yourself, rather than relying on copy:

    import boto3
    from dotenv import dotenv_values
    
    config = dotenv_values(".env")
    
    src_s3 = boto3.client(
        service_name="s3",
        region_name="fr-par",
        endpoint_url="https://s3.fr-par.scw.cloud",
        aws_access_key_id=config.get("SCW_ACCESS_KEY"),
        aws_secret_access_key=config.get("SCW_SECRET_KEY"),
    )
    
    dest_s3 = boto3.client(
        service_name="s3",
        region_name="nl-ams",
        endpoint_url="https://s3.nl-ams.scw.cloud",
        aws_access_key_id=config.get("SCW_ACCESS_KEY"),
        aws_secret_access_key=config.get("SCW_SECRET_KEY"),
    )
    
    source_bucket = "source_bucket_name"
    destination_bucket = "destination_bucket_name"
    object_key = "source_object_name"
    
    # Download the object into memory...
    response = src_s3.get_object(Bucket=source_bucket, Key=object_key)
    object_data = response["Body"].read()

    # ...then upload it through the destination client
    dest_s3.put_object(Bucket=destination_bucket, Key=object_key, Body=object_data)
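
    Note that get_object(...).read() buffers the entire object in memory. For larger objects, a streaming variant (a sketch, using the same clients and names as above) can hand the response body straight to a managed upload:

    # Stream the source object into a managed upload instead of
    # holding the whole payload in memory
    response = src_s3.get_object(Bucket=source_bucket, Key=object_key)
    dest_s3.upload_fileobj(response["Body"], destination_bucket, object_key)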
    

    The AWS copy failed because the SourceClient parameter is designed for AWS-specific cross-region copying; it doesn’t work for non-AWS S3 providers like Scaleway due to strict endpoint URL resolution and reliance on AWS-specific behaviors.
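
    If you need to replicate a whole bucket (roughly what rclone sync does), a pure-boto3 sketch is still possible. The following assumes the same clients and placeholder names as above; it paginates over the source bucket and streams each object across, with no deletion or checksum comparison:

    # Rough one-way bucket sync sketch (placeholder names, no error handling)
    paginator = src_s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source_bucket):
        for obj in page.get("Contents", []):
            body = src_s3.get_object(Bucket=source_bucket, Key=obj["Key"])["Body"]
            dest_s3.upload_fileobj(body, destination_bucket, obj["Key"])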