Tags: amazon-web-services, amazon-s3, aws-batch, aws-datasync

Efficient way to Copy/Replicate S3 Objects?


I need to replicate millions of S3 objects (a one-time operation), modifying their metadata while keeping the same bucket and object path.

To perform this, we have the options listed below and need to choose the most cost-effective method:

  1. AWS COPY requests
  2. AWS Batch Operations
  3. AWS DataSync

References: https://repost.aws/knowledge-center/s3-large-transfer-between-buckets

I've read the AWS docs but could not determine which one is better in terms of cost.


Solution

  • To update metadata on an Amazon S3 object, it is necessary to COPY the object to itself while specifying the new metadata.

    From Copying objects - Amazon Simple Storage Service:

    Each Amazon S3 object has metadata. It is a set of name-value pairs. You can set object metadata at the time you upload it. After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata. In the copy operation, set the same object as the source and target.

    However, you have a choice as to how to trigger the COPY operation: you can issue the CopyObject requests yourself (for example, via the AWS CLI or an SDK script), or have S3 Batch Operations run the copies for you. A single-object sketch of the copy-in-place call is shown below.
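    As a minimal sketch (assuming the Python boto3 SDK; the bucket, key, and metadata values are placeholders), the copy-in-place call for a single object looks like this:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder values for illustration only
bucket = "my-example-bucket"
key = "path/to/object.txt"

# Copy the object onto itself, replacing its metadata.
# MetadataDirective="REPLACE" tells S3 to use the metadata supplied in this
# request instead of carrying over the object's existing metadata.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    Metadata={"reviewed": "true"},   # new user-defined metadata
    MetadataDirective="REPLACE",
)
```

    Note that REPLACE discards the existing metadata, so any system metadata you want to keep (such as Content-Type) should be supplied again in the same request.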

    Given that you have millions of objects, I would recommend using S3 Batch Operations, since it can perform the process at massive scale.

    I would recommend this process:

      1. Generate a manifest of the objects to update, either from an Amazon S3 Inventory report or as a CSV file listing each bucket and key.
      2. Create an S3 Batch Operations "Copy" job that reads the manifest, targets the same bucket, and specifies the replacement metadata (a job-creation sketch follows this list).
      3. When the job completes, review the completion report for any objects that failed to copy.
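    A minimal sketch of creating such a job with boto3 (the account ID, role ARN, bucket, and manifest location are placeholder values; the ETag must be the actual ETag of your manifest object, and the role needs permission to read the manifest and copy the objects):

```python
import boto3

s3control = boto3.client("s3control")

response = s3control.create_job(
    AccountId="111122223333",                  # placeholder account ID
    ConfirmationRequired=True,                 # require manual confirmation before the job runs
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/my-batch-operations-role",
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": "arn:aws:s3:::my-example-bucket",  # copy back into the same bucket
            "MetadataDirective": "REPLACE",                      # replace, rather than copy, metadata
            "NewObjectMetadata": {
                "UserMetadata": {"reviewed": "true"},
            },
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-example-bucket/manifests/objects.csv",
            "ETag": "replace-with-manifest-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::my-example-bucket",
        "Prefix": "batch-reports",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "FailedTasksOnly",
    },
)
print("Created job:", response["JobId"])
```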

    I suggest that you try the S3 Batch Operations step on a subset of objects (e.g. 10 objects) first to confirm that it behaves the way you expect, as in the sample manifest below. This will be relatively fast and will let you catch configuration errors before running the full job.
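    For the trial run, the manifest can be a short CSV with one bucket,key pair per line and no header row (object keys must be URL-encoded); the names below are placeholders:

```
my-example-bucket,docs/report-001.pdf
my-example-bucket,docs/report-002.pdf
my-example-bucket,images/photo-003.jpg
```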

    Note that S3 Batch Operations charges $1.00 per million object operations performed.
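    For example, at that rate, updating 5 million objects would incur about $5.00 in Batch Operations management charges, plus a small per-job fee and the standard per-request charges for the underlying COPY operations.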