My real S3 helper does the following:
def read_gzipped_csv_from_s3(self, key):
    return self.bucket.Object(key).get()
obj = S3Helper().read_gzipped_csv_from_s3(key)
df = pd.read_csv(obj['Body'], compression='gzip')
I need to mock the read_gzipped_csv_from_s3() method for unit tests. The problem is that the mocked response has to be a gzipped CSV, which I must construct from a string because I cannot store any files: the tests run in a GitLab pipeline.
So I have some CSV as a string:
CSV_DATA = """
name,value,control
ABC,1.0,1
DEF,2.0,0
GHI,3.0,-1
"""
Then I have some example code for using a regular CSV file to mock botocore.response.StreamingBody:
import io
from botocore.response import StreamingBody

body_encoded = open('accounts.csv').read().encode()
mock_stream = StreamingBody(io.BytesIO(body_encoded), len(body_encoded))
but I can't figure out how to create a gzipped CSV in memory. Here is the beginning of something I found somewhere:
import gzip
from io import BytesIO, TextIOWrapper

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        <can't figure out what's here>
I've tried tons of other snippets from SO and modified them, but no luck. What I expect: a gzipped CSV file-like object that I can pass to StreamingBody. Help would be much appreciated.
You could use wrapper.write() to write the data through the TextIOWrapper and the gzip stream into the BytesIO object. You also need buffer.seek(0) to reset the file position to the beginning before you can read it; do that after both with blocks have exited, so the gzip stream is closed and its trailer has been written.
import gzip
from io import BytesIO, TextIOWrapper

import pandas as pd

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        # Write the CSV text through the gzip stream into the in-memory buffer
        wrapper.write(CSV_DATA)

# The gzip stream is now closed, so the buffer holds a complete .gz payload;
# rewind before reading
buffer.seek(0)
df = pd.read_csv(buffer, compression='gzip')
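From there, wiring it into your test mirrors your plain-CSV example: wrap the gzipped bytes in a StreamingBody and patch the helper to return them. A minimal sketch, assuming S3Helper lives in a module called my_module and run_code_under_test stands in for whatever code calls the helper (both names are placeholders):

import gzip
import io
from unittest.mock import patch

from botocore.response import StreamingBody

# gzip.compress() is an in-memory shortcut equivalent to the
# GzipFile/TextIOWrapper approach above
gz_bytes = gzip.compress(CSV_DATA.encode('utf-8'))
mock_stream = StreamingBody(io.BytesIO(gz_bytes), len(gz_bytes))

# Replace the real method so no S3 call happens; the return value has the
# same shape as bucket.Object(key).get(), i.e. a dict with a 'Body' stream
with patch('my_module.S3Helper.read_gzipped_csv_from_s3',
           return_value={'Body': mock_stream}):
    df = run_code_under_test()  # hypothetical: exercises the pd.read_csv path

Because the body is read with compression='gzip', pandas decompresses the mocked stream exactly as it would the real S3 payload.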