python compression

Python ungzipping stream of bytes?


Here is the situation:

Question

How can I ungzip the streams directly and read the contents?

I do not want to create temp files; they don't look good.


Solution

  • Yes, you can use the zlib module to decompress byte streams:

    import zlib
    
    def stream_gzip_decompress(stream):
        # 32 + MAX_WBITS: auto-detect a gzip or zlib header, maximum window size
        dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
        for chunk in stream:
            rv = dec.decompress(chunk)
            if rv:
                yield rv
        # yield whatever is still buffered once the input is exhausted
        rv = dec.flush()
        if rv:
            yield rv
    

    Adding 32 to the window size tells zlib to detect the header automatically, so the decompressor accepts either a gzip or a zlib header.

    The S3 key object is an iterator, so you can do:

    for data in stream_gzip_decompress(k):
        # do something with the decompressed data
        pass  # placeholder so the loop body is valid Python
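
To try the approach without an S3 connection, here is a minimal self-contained sketch. It repeats the generator so the block runs on its own, and it replaces the S3 key with `iter_chunks`, a hypothetical helper (not part of boto or zlib) that hands out an in-memory gzip payload 64 bytes at a time. The sample data and the chunk size are arbitrary assumptions.

```python
import gzip
import zlib


def stream_gzip_decompress(stream):
    """Yield decompressed chunks from an iterable of compressed byte chunks."""
    # 32 + MAX_WBITS: auto-detect a gzip or zlib header
    dec = zlib.decompressobj(32 + zlib.MAX_WBITS)
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv
    # emit anything still buffered after the last chunk
    rv = dec.flush()
    if rv:
        yield rv


def iter_chunks(data, size):
    """Hypothetical stand-in for the S3 key: yield `data` in `size`-byte pieces."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


original = b"hello gzip stream\n" * 1000
compressed = gzip.compress(original)

# Feed the compressed bytes in small pieces, as a network stream might.
result = b"".join(stream_gzip_decompress(iter_chunks(compressed, 64)))
assert result == original
```

One caveat: a single decompression object stops at the end of the first compressed stream. If the input is several gzip members concatenated together, the bytes after the first member end up in `dec.unused_data` and would need a fresh `zlib.decompressobj` to decode.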