[SOLVED] Compress file on S3

Compress file on S3

I have a 17.7GB file on S3. It was generated as the output of a Hive query, and it isn't compressed.

I know that by compressing it, it'll be about 2.2GB (gzip). How can I download this file locally as quickly as possible when transfer is the bottleneck (250kB/s).

I've not found any straightforward way to compress the file on S3, or enable compression on transfer in s3cmd, boto, or related tools.

Solution

S3 does not support stream compression nor is it possible to compress the uploaded file remotely.

If this is a one-time process I suggest downloading it to a EC2 machine in the same region, compress it there, then upload to your destination.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html

If you need this more frequently

Serving gzipped CSS and JavaScript from Amazon CloudFront via S3