I would like to move files from Backblaze B2 to Amazon S3. The instructions here say that I should download them to a local directory. However, I am trying to transfer about 180 TB of data, so I would prefer not to download them locally.
I found this post with a similar question, but I was wondering if there was a way to do this using the command line instead of ForkLift.
Thank you
Yes, you can do this using the AWS CLI. The aws s3 cp command can read from stdin or write to stdout when you use a dash (-) in place of a filename, so you can pipe two aws s3 cp commands together to read a file from Backblaze B2 and write it to Amazon S3 without it hitting the local disk.
First, configure two AWS CLI profiles from the command line: one for B2 and the other for AWS. aws configure will prompt you for the credentials for each account:
% aws configure --profile b2
% aws configure --profile aws
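For reference, each aws configure run prompts for four values. For the b2 profile, enter your Backblaze application key ID and application key where the access key ID and secret access key are requested (the values below are placeholders, not real credentials):

% aws configure --profile b2
AWS Access Key ID [None]: <your B2 application key ID>
AWS Secret Access Key [None]: <your B2 application key>
Default region name [None]: us-west-004
Default output format [None]: json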
After you run aws configure, edit the AWS config file (~/.aws/config on Mac and Linux, C:\Users\USERNAME\.aws\config on Windows) and add a value for endpoint_url to the b2 profile. This saves you from having to specify the --endpoint-url option every time you run aws s3 with the b2 profile.
For example, if your B2 region was us-west-004 and your AWS region was us-west-1, you would edit your config file to look like this:
[profile b2]
region = us-west-004
endpoint_url = https://s3.us-west-004.backblazeb2.com
[profile aws]
region = us-west-1
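Setting endpoint_url in the config file is equivalent to passing the endpoint explicitly on every B2 command, which would otherwise look something like this (listing your buckets as an example):

% aws --profile b2 --endpoint-url https://s3.us-west-004.backblazeb2.com s3 ls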
Now you can specify the profiles in the two aws s3 cp commands:
aws --profile b2 s3 cp s3://<Your Backblaze bucket name>/filename.ext - \
| aws --profile aws s3 cp - s3://<Your AWS bucket name>/filename.ext
It's easy to run a quick test on a single file:
# Write a file to Backblaze B2
% echo 'Hello world!' | \
aws --profile b2 s3 cp - s3://metadaddy-b2/hello.txt
# Copy file from Backblaze B2 to Amazon S3
% aws --profile b2 s3 cp s3://metadaddy-b2/hello.txt - \
| aws --profile aws s3 cp - s3://metadaddy-s3/hello.txt
# Read the file from Amazon S3
% aws --profile aws s3 cp s3://metadaddy-s3/hello.txt -
Hello world!
One wrinkle is that, if the file is more than 50 GB, you will need to use the --expected-size argument to specify the file size so that the cp command can split the stream into parts for a large file upload. From the AWS CLI docs:
--expected-size
(string) This argument specifies the expected size of a stream in terms of bytes. Note that this argument is needed only when a stream is being uploaded to s3 and the size is larger than 50GB. Failure to include this argument under these conditions may result in a failed upload due to too many parts in upload.
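If you don't know the object's size in advance, you can look it up with aws s3api head-object and pass it to --expected-size. Here's a sketch using a hypothetical large object named bigfile.bin in the same buckets as above:

# Get the size of the source object in bytes
% SIZE=$(aws --profile b2 s3api head-object \
    --bucket metadaddy-b2 --key bigfile.bin \
    --query ContentLength --output text)

# Stream the object across, telling cp how large the upload will be
% aws --profile b2 s3 cp s3://metadaddy-b2/bigfile.bin - \
  | aws --profile aws s3 cp - s3://metadaddy-s3/bigfile.bin --expected-size $SIZE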
Here's a one-liner that copies the contents of a bucket on B2 to a bucket on S3, outputting the filename (object key) and size of each file. It assumes you've set up the profiles as above.
aws --profile b2 s3api list-objects-v2 --bucket metadaddy-b2 \
| jq '.Contents[] | .Key, .Size' \
| xargs -n2 sh -c 'echo "Copying \"$1\" ($2 bytes)"; \
aws --profile b2 s3 cp "s3://metadaddy-b2/$1" - \
| aws --profile aws s3 cp - "s3://metadaddy-s3/$1" --expected-size $2' sh
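If you'd rather not depend on jq, or you have object keys that confuse xargs (embedded quotes, for example), a roughly equivalent sketch using the CLI's built-in --query option and a shell loop would be:

aws --profile b2 s3api list-objects-v2 --bucket metadaddy-b2 \
  --query 'Contents[].[Key, Size]' --output text |
while IFS=$'\t' read -r KEY SIZE; do
  echo "Copying \"$KEY\" ($SIZE bytes)"
  aws --profile b2 s3 cp "s3://metadaddy-b2/$KEY" - \
    | aws --profile aws s3 cp - "s3://metadaddy-s3/$KEY" --expected-size "$SIZE"
done

With --output text, each key and size is printed on one tab-separated line, so splitting on tabs keeps keys containing spaces intact.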
Although this technique does not hit the local disk, the data still has to flow from B2 to wherever this script is running, then to S3. As @Mark B mentioned in his answer, run the script on an EC2 instance for best performance.