amazon-web-services, amazon-s3, backblaze

Is it possible to use the command line to transfer data from Backblaze B2 to Amazon S3 without downloading to a local directory?


I would like to move files from Backblaze B2 to Amazon S3. The instructions here say that I should download them to a local directory. However, I am trying to transfer about 180 TB of data, so I would prefer not to download it all locally first.

I found this post with a similar question, but I was wondering if there was a way to do this using the command line instead of ForkLift.

Thank you


Solution

  • Yes, you can do this using the AWS CLI. The aws s3 cp command can read from stdin or write to stdout when you pass - instead of a filename, so you can pipe two aws s3 cp commands together to read a file from Backblaze B2 and write it to Amazon S3 without the data ever touching the local disk.

    First, configure two AWS profiles from the command line - one for B2 and the other for AWS. aws configure will prompt you for the credentials for each account:

    % aws configure --profile b2
    % aws configure --profile aws
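
    For the b2 profile, enter your Backblaze application key ID and application key at the access key prompts. The session looks roughly like this (the values shown are placeholders, not real credentials):

    % aws configure --profile b2
    AWS Access Key ID [None]: <your B2 application key ID>
    AWS Secret Access Key [None]: <your B2 application key>
    Default region name [None]: us-west-004
    Default output format [None]: json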
    

    After you run aws configure, edit the AWS config file (~/.aws/config on macOS and Linux, C:\Users\USERNAME\.aws\config on Windows) and add a value for endpoint_url to the b2 profile. This saves you from having to specify the --endpoint-url option every time you run aws s3 with the b2 profile.

    For example, if your B2 region is us-west-004 and your AWS region is us-west-1, your config file would look like this:

    [profile b2]
    region = us-west-004
    endpoint_url = https://s3.us-west-004.backblazeb2.com
    
    [profile aws]
    region = us-west-1
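
    If you prefer not to edit the file by hand, recent versions of the AWS CLI v2 (which support endpoint_url in the config file) should let you write and check the same setting with aws configure set and aws configure get. This is a sketch using the example endpoint above:

    % aws configure set endpoint_url https://s3.us-west-004.backblazeb2.com --profile b2
    % aws configure get endpoint_url --profile b2
    https://s3.us-west-004.backblazeb2.com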
    

    Now you can specify the profiles in the two aws s3 cp commands.

    aws --profile b2 s3 cp s3://<Your Backblaze bucket name>/filename.ext - \
    | aws --profile aws s3 cp - s3://<Your AWS bucket name>/filename.ext
    

    It's easy to run a quick test on a single file:

    # Write a file to Backblaze B2
    % echo 'Hello world!' | \
    aws --profile b2 s3 cp - s3://metadaddy-b2/hello.txt
    
    # Copy file from Backblaze B2 to Amazon S3
    % aws --profile b2 s3 cp s3://metadaddy-b2/hello.txt - \
    | aws --profile aws s3 cp - s3://metadaddy-s3/hello.txt
    
    # Read the file from Amazon S3
    % aws --profile aws s3 cp s3://metadaddy-s3/hello.txt -
    Hello world!
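
    You can also confirm that the object landed in the destination bucket; aws s3 ls will show its timestamp and size (13 bytes for "Hello world!" plus the newline):

    % aws --profile aws s3 ls s3://metadaddy-s3/hello.txt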
    

    One wrinkle is that if the file is larger than 50 GB, you will need to pass the --expected-size argument with the file size so that the cp command can split the stream into parts for a multipart upload. From the AWS CLI docs:

    --expected-size (string) This argument specifies the expected size of a stream in terms of bytes. Note that this argument is needed only when a stream is being uploaded to s3 and the size is larger than 50GB. Failure to include this argument under these conditions may result in a failed upload due to too many parts in upload.
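
    For example, to copy a single large object you could look up its size with aws s3api head-object and pass it along. The bucket and object names below are placeholders:

    % SIZE=$(aws --profile b2 s3api head-object --bucket <Your Backblaze bucket name> \
        --key bigfile.bin --query ContentLength --output text)
    % aws --profile b2 s3 cp "s3://<Your Backblaze bucket name>/bigfile.bin" - \
    | aws --profile aws s3 cp - "s3://<Your AWS bucket name>/bigfile.bin" --expected-size "$SIZE"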

    Here's a one-liner that copies the contents of a bucket on B2 to a bucket on S3, outputting the filename (object key) and size of each file. It assumes you've set up the profiles as above.

    aws --profile b2 s3api list-objects-v2 --bucket metadaddy-b2 \
    | jq '.Contents[] | .Key, .Size' \
    | xargs -n2 sh -c 'echo "Copying \"$1\" ($2 bytes)"; \
        aws --profile b2 s3 cp "s3://metadaddy-b2/$1" - \
        | aws --profile aws s3 cp - "s3://metadaddy-s3/$1" --expected-size $2' sh
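
    If you'd rather not depend on jq, a roughly equivalent loop (a sketch that assumes object keys contain no tabs or newlines) can use the CLI's --query option to emit tab-separated key/size pairs:

    aws --profile b2 s3api list-objects-v2 --bucket metadaddy-b2 \
        --query 'Contents[].[Key,Size]' --output text \
    | while IFS=$'\t' read -r key size; do
        echo "Copying \"$key\" ($size bytes)"
        aws --profile b2 s3 cp "s3://metadaddy-b2/$key" - \
        | aws --profile aws s3 cp - "s3://metadaddy-s3/$key" --expected-size "$size"
    done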
    

    Although this technique does not hit the local disk, the data still has to flow from B2 to wherever this script is running, then to S3. As @Mark B mentioned in his answer, run the script on an EC2 instance for best performance.