import contextlib
import gzip

import s3fs

AWS_S3 = s3fs.S3FileSystem(anon=False)  # AWS env must be set up correctly

source_file_path = "/tmp/your_file.txt"
s3_file_path = "my-bucket/your_file.txt.gz"

with contextlib.ExitStack() as stack:
    source_file = stack.enter_context(open(source_file_path, mode="rb"))
    destination_file = stack.enter_context(AWS_S3.open(s3_file_path, mode="wb"))
    destination_file_gz = stack.enter_context(gzip.GzipFile(fileobj=destination_file, mode="wb"))
    while True:
        chunk = source_file.read(1024)
        if not chunk:
            break
        destination_file_gz.write(chunk)
I was trying to run something like this on an AWS Lambda function, but it throws an error because it is unable to install the s3fs module. Also, I am using boto3 for the rest of my code, so I would like to stick with boto3. How can I use boto3 for this too?
Basically, I am opening/reading a file from a /tmp/ path, gzipping it, and then saving it to an S3 bucket.
Edit:
import contextlib
import gzip
import io

import boto3

s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket('testunzipping')
s3_filename = 'samplefile.csv.'

for i in testList:
    # zip_ref.open(i, 'r')
    with contextlib.ExitStack() as stack:
        source_file = stack.enter_context(open(i, mode="rb"))
        destination_file = io.BytesIO()
        destination_file_gz = stack.enter_context(gzip.GzipFile(fileobj=destination_file, mode='wb'))
        while True:
            chunk = source_file.read(1024)
            if not chunk:
                break
            destination_file_gz.write(chunk)
    # Leaving the ExitStack closes the GzipFile, which writes the gzip trailer;
    # only then is the buffer complete and safe to upload.
    destination_file.seek(0)
    fileName = i.replace("/tmp/DataPump_10000838/", "")
    bucket.upload_fileobj(destination_file, fileName)
Each item in testList looks like this: "/tmp/your_file.txt"
AWS Lambda function but it throws an error because it is unable to install the s3fs module
Additional packages and your own library code (reusable code) should be put in Lambda layers. For Python, the layer archive needs to place the packages under a top-level python/ directory so that Lambda adds them to the import path.
How can I use boto3 for this too?
import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket(bucket_name)
Then either:

If you have your file in memory (a file-like object opened in bytes mode, e.g. io.BytesIO or just open(..., "rb")):

bucket.upload_fileobj(fileobj, s3_filename)

Or if you have a file on your local filesystem:

bucket.upload_file(filepath, s3_filename)
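
Putting the two together for your case, here is a minimal sketch, assuming a placeholder bucket name "my-bucket" and the /tmp source path from your question. It gzips the local file into an in-memory buffer and uploads it with boto3; the GzipFile is closed before uploading so that the gzip trailer gets written:

import gzip
import io

import boto3

bucket = boto3.resource("s3").Bucket("my-bucket")  # placeholder bucket name

source_file_path = "/tmp/your_file.txt"
buffer = io.BytesIO()

# Compress the local file into the in-memory buffer, 1 KiB at a time.
with open(source_file_path, mode="rb") as source_file, \
        gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    while True:
        chunk = source_file.read(1024)
        if not chunk:
            break
        gz.write(chunk)

# The GzipFile is closed here, so the stream is complete; rewind and upload.
buffer.seek(0)
bucket.upload_fileobj(buffer, "your_file.txt.gz")

As a side note, the manual read loop could be replaced with shutil.copyfileobj(source_file, gz) from the standard library.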