amazon-s3, compression, gzip, content-type, content-encoding

How do I download compressed content from an S3 bucket as plain text?


I have HTML for a static site stored in an S3 bucket in gzipped form. The site works fine in a browser, which sees the encoding and decompresses the response automatically. But when I download the object with the AWS CLI, the raw gzipped bytes are written to disk as-is (nothing decompresses them after the download), so the file comes out garbled when opened in a text editor or browser.

I've tried passing the content encoding explicitly, hoping the CLI would convert the gzipped content on S3 to plain text, but the downloaded file still contains the gzipped bytes rather than the raw UTF-8. Here is the command I tried:

aws s3 cp s3://mys3bucket.com/index.html ./test.html --content-encoding "gzip" --content-type "text/html"

Solution

  • You can decompress the data yourself, either while downloading or afterwards. Note that `aws s3 cp` with a local file destination writes straight to that file and produces no output to pipe; to stream the object, give `-` as the destination, which sends it to stdout. (The `--content-encoding` and `--content-type` options only set object metadata when uploading; they don't transform the bytes on a download, which is why they had no effect.) On a Unix variant, pipe the stream into zcat like this:

    aws s3 cp s3://mys3bucket.com/index.html - | zcat > ./test.html
    

    You can also store the compressed data in a file first and decompress it later, e.g. with gunzip.

    It would make no sense to decompress on the S3 side, because then you would have to transmit far more data (the uncompressed version) over the network.
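Since the piped command above needs an S3 bucket to try, here is a purely local sketch of the same decompression round-trip: gzip stands in for what S3 stores, and zcat recovers the original text exactly as it does for the streamed object. All file names here are made up for the demonstration.

```shell
# Create some HTML and compress it, mimicking the gzipped object on S3.
printf '<html>hello</html>\n' > page.html
gzip -c page.html > page.html.gz

# page.html.gz now holds the raw gzipped bytes -- opening it in a text
# editor would show the same garbage as the downloaded S3 object.

# Decompress back to plain text, as the zcat pipe does for the S3 stream.
zcat page.html.gz > recovered.html
cat recovered.html
```

The same two steps work on a file you have already downloaded with `aws s3 cp`: save it with a `.gz` suffix and run `gunzip` on it.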