Tags: python, boto3, python-unittest, botocore

Mocking a file-like gzipped CSV for boto3's StreamingBody


My real S3 helper does the following:

def read_gzipped_csv_from_s3(self, key):
    return self.bucket.Object(key).get()

obj = S3Helper().read_gzipped_csv_from_s3(key)
df = pd.read_csv(obj['Body'], compression='gzip')

I need to mock the read_gzipped_csv_from_s3() method for unit tests. The problem is that the response has to be a gzipped CSV, which I must construct from a string, because I cannot store any files: the tests run in a GitLab pipeline.

So I have some csv as a string:

CSV_DATA = """
name,value,control
ABC,1.0,1
DEF,2.0,0
GHI,3.0,-1
"""

Then I have some example code for using a regular CSV file to mock botocore.response.StreamingBody:

import io
from botocore.response import StreamingBody

body_encoded = open('accounts.csv').read().encode()
mock_stream = StreamingBody(io.BytesIO(body_encoded), len(body_encoded))

but I can't figure out how to create a gzipped CSV in memory. Here's the beginning of a snippet I found somewhere:

import gzip
from io import BytesIO, TextIOWrapper

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
        <can't figure out what's here>

Help would be much appreciated.

I've tried tons of other snippets from SO and modified them, but no luck. What I expect: a gzipped CSV file-like object to pass to StreamingBody.


Solution

  • You can use .write() on the TextIOWrapper to write the string into the BytesIO object through the gzip compressor. You also need .seek(0) to reset the file position to the beginning before the buffer can be read back.

    import gzip
    from io import BytesIO, TextIOWrapper

    import pandas as pd

    buffer = BytesIO()
    # Exiting the with-blocks closes the wrappers, which flushes the
    # remaining compressed data into the buffer.
    with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
        with TextIOWrapper(compressed, encoding='utf-8') as wrapper:
            wrapper.write(CSV_DATA)
    buffer.seek(0)  # rewind so the gzipped bytes can be read back
    df = pd.read_csv(buffer, compression='gzip')
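
    From there, the buffer can be wired into botocore's StreamingBody and the helper stubbed out, which is what the question ultimately asks for. Below is a minimal sketch using unittest.mock; the key name and the MagicMock-based stub are illustrative, not part of the original code:

    import gzip
    import io
    from unittest.mock import MagicMock

    import pandas as pd
    from botocore.response import StreamingBody

    # Same CSV string as in the question.
    CSV_DATA = "name,value,control\nABC,1.0,1\nDEF,2.0,0\nGHI,3.0,-1\n"

    # Compress the CSV string into an in-memory gzip stream.
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
        compressed.write(CSV_DATA.encode('utf-8'))
    gzipped = buffer.getvalue()

    # Wrap the gzipped bytes in a StreamingBody, mirroring what
    # bucket.Object(key).get() returns under the 'Body' key.
    mock_stream = StreamingBody(io.BytesIO(gzipped), len(gzipped))

    # Stub the helper so it returns a response dict shaped like the
    # real one; the stub and the key below are illustrative.
    helper = MagicMock()
    helper.read_gzipped_csv_from_s3.return_value = {'Body': mock_stream}

    obj = helper.read_gzipped_csv_from_s3('some/key.csv.gz')
    df = pd.read_csv(obj['Body'], compression='gzip')

    This mirrors the production call path exactly: pd.read_csv receives a StreamingBody and decompresses it itself, so the test exercises the same code as the real S3 response would.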