Tags: amazon-web-services, amazon-s3, amazon-glacier, disaster-recovery

Disaster Recovery for an S3 Bucket with Many Parquet Files


I have an S3 bucket with a lot of Parquet split files within each partition. All files in the bucket are highly important to the business, and it would be a disaster if anyone deleted them. If I use Glacier or Glacier Deep Archive, I'm concerned that my retrieval costs in case of a failure will be too high because of the number of individual Parquet files. How can I best set up disaster recovery for such a bucket at the least cost? (Assuming, of course, that users are not deleting necessary data every month.)

Example case: Say I have 100 GB of data made up of 150 KB files. The annual additional cost of one accidental delete is about 53 USD with Glacier and 82.4 USD with Glacier Deep Archive. Now simply change the per-file size from 150 KB to 1024 KB: those costs drop to 21 USD for Glacier and 16 USD for Glacier Deep Archive. My main problem is the number of Parquet files, which pushes the retrieval cost beyond what is affordable.
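To illustrate why the file count dominates, here is a rough sketch of the retrieval-cost arithmetic. The per-request and per-GB prices below are placeholders, not actual AWS rates; check current Glacier pricing for your region before relying on the numbers.

```python
# Rough sketch of why per-object request fees dominate retrieval cost for
# small files. Prices below are HYPOTHETICAL placeholders, not AWS list prices.

def retrieval_cost(total_gb, avg_file_kb, price_per_1000_requests, price_per_gb):
    """Estimated cost to restore every object in the bucket once."""
    num_files = (total_gb * 1024 * 1024) / avg_file_kb   # GB -> KB
    request_cost = num_files / 1000 * price_per_1000_requests
    transfer_cost = total_gb * price_per_gb
    return num_files, request_cost + transfer_cost

for file_kb in (150, 1024):
    n, cost = retrieval_cost(
        total_gb=100,
        avg_file_kb=file_kb,
        price_per_1000_requests=0.05,  # placeholder request fee
        price_per_gb=0.01,             # placeholder per-GB retrieval fee
    )
    print(f"{file_kb:>5} KB files: {n:,.0f} objects -> ~${cost:,.2f} per restore")
```

With 150 KB files the bucket holds roughly 700,000 objects, so nearly all of the cost comes from the per-request term; with 1024 KB files the object count, and therefore the request fee, shrinks by almost 7x, which is exactly the pattern in the example above.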


Solution

  • If you just want to prevent someone from deleting objects accidentally, I don't think S3 Glacier or Glacier Deep Archive is the right way to go. Instead, you can achieve this by enabling object versioning and MFA Delete (a minimal sketch follows after this list).

    Also keep in mind that Amazon S3 Glacier and S3 Glacier Deep Archive add roughly 32 KB of overhead per object. Given that your objects average 150 KB, that alone would increase costs by about 21.3%.
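As a rough illustration, here is a minimal boto3 sketch of enabling versioning with MFA Delete. The bucket name, MFA device ARN, and token are placeholders; note that AWS only lets the bucket owner's root account enable MFA Delete, and only via the API or CLI, not the console.

```python
# Minimal sketch: enable versioning + MFA Delete on a bucket with boto3.
# "my-important-bucket", the MFA device ARN, and the 6-digit code are
# placeholders -- substitute your own values. This call must be made with
# the bucket owner's (root account) credentials.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="my-important-bucket",
    # The MFA argument is "<device-arn> <current-token>" in a single string.
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={
        "Status": "Enabled",      # keep every version of every object
        "MFADelete": "Enabled",   # require MFA to permanently delete versions
    },
)
```

With versioning on, a DELETE only adds a delete marker, so the underlying Parquet data stays recoverable until a specific version is purged, and purging now additionally requires the MFA token.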