Tags: python, amazon-s3, aws-lambda, gzip, python-s3fs

Use boto for gzipping files instead of s3fs


import contextlib
import gzip

import s3fs

AWS_S3 = s3fs.S3FileSystem(anon=False) # AWS env must be set up correctly

source_file_path = "/tmp/your_file.txt"
s3_file_path = "my-bucket/your_file.txt.gz"

with contextlib.ExitStack() as stack:
    source_file = stack.enter_context(open(source_file_path, mode="rb"))
    destination_file = stack.enter_context(AWS_S3.open(s3_file_path, mode="wb"))
    # mode="wb" made explicit so GzipFile writes instead of inferring a mode
    destination_file_gz = stack.enter_context(gzip.GzipFile(fileobj=destination_file, mode="wb"))
    while True:
        chunk = source_file.read(1024)
        if not chunk:
            break
        destination_file_gz.write(chunk)

I was trying to run something like this in an AWS Lambda function, but it throws an error because it is unable to install the s3fs module. Plus, I am using boto for the remaining parts of my code, so I would like to stick to boto. How can I use boto for this too?

Basically, I am opening/reading a file from a /tmp path, gzipping it, and then saving it to an S3 bucket.

Edit:

import gzip
import io

import boto3

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket('testunzipping')
s3_filename = 'samplefile.csv.'

for i in testList:
    # zip_ref.open(i, 'r')
    with open(i, mode="rb") as source_file:
        destination_file = io.BytesIO()
        # The GzipFile must be closed before uploading, otherwise the gzip
        # trailer is never flushed and the uploaded object is truncated.
        with gzip.GzipFile(fileobj=destination_file, mode='wb') as destination_file_gz:
            while True:
                chunk = source_file.read(1024)
                if not chunk:
                    break
                destination_file_gz.write(chunk)
        destination_file.seek(0)

        fileName = i.replace("/tmp/DataPump_10000838/", "")
        bucket.upload_fileobj(destination_file, fileName)

Each item in testList looks like this: "/tmp/your_file.txt".


Solution

  • AWS Lambda function but it throws an error because it is unable to install the s3fs module

    Additional packages and your own library (reusable) code should be put in Lambda layers.
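
    For illustration, here is a minimal sketch of a handler confirming a layer is visible, assuming s3fs was packaged under the layer's python/ directory (layers are extracted to /opt at runtime, and /opt/python is on sys.path):

    import sys

    def lambda_handler(event, context):
        # Layers are extracted under /opt; /opt/python is on sys.path,
        # so s3fs packaged in a layer imports like any installed module.
        import s3fs
        return {"layer_paths": [p for p in sys.path if p.startswith("/opt")]}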

    How can I use boto for this too?

    s3 = boto3.resource("s3")
    bucket = s3.Bucket(bucket_name)
    

    Then either:

    If you have your file in memory (a file-like object opened in bytes mode, e.g. io.BytesIO or open(..., 'rb')):

    bucket.upload_fileobj(fileobj, s3_filename)
    

    Or, if you have a file on the local filesystem:

    bucket.upload_file(filepath, s3_filename)
    

    https://boto3.amazonaws.com/v1/documentation/api/1.18.53/reference/services/s3.html#S3.Bucket.upload_file
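
    Putting it together, a minimal sketch of the whole flow with boto3 only (the bucket name and file paths are placeholders):

    import gzip
    import io
    import shutil

    import boto3

    bucket = boto3.resource("s3").Bucket("my-bucket")  # placeholder bucket name

    source_file_path = "/tmp/your_file.txt"  # placeholder local path
    s3_filename = "your_file.txt.gz"         # placeholder object key

    buffer = io.BytesIO()
    with open(source_file_path, mode="rb") as source_file:
        # Close the GzipFile before uploading so the gzip trailer is written
        with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
            shutil.copyfileobj(source_file, gz)

    buffer.seek(0)
    bucket.upload_fileobj(buffer, s3_filename)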