python amazon-s3 gzip boto

How to gzip while uploading to S3 using boto


I have a large local file. I want to upload a gzipped version of that file to S3 using the boto library. The file is too large to gzip efficiently on disk before uploading, so it should be gzipped as a stream during the upload.

The boto library provides the method set_contents_from_file(), which expects a file-like object to read from.

The gzip library provides the class GzipFile, which accepts an object via the fileobj parameter and writes the compressed data to that object.
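To make the "push" behaviour of GzipFile concrete, here is a minimal sketch using only the standard library. The names sink, compressed and restored are just illustrative, and an in-memory io.BytesIO stands in for whatever object would receive the compressed bytes.

```python
import gzip
import io

# Any object with a write() method can serve as fileobj; BytesIO is used
# here purely for illustration.
sink = io.BytesIO()

# GzipFile is the active side: writing plain bytes to it makes it call
# sink.write() with compressed bytes whenever it has output ready.
with gzip.GzipFile(fileobj=sink, mode="wb") as gz:
    gz.write(b"hello " * 1000)

# Closing the GzipFile writes the gzip trailer but leaves sink open.
compressed = sink.getvalue()
restored = gzip.decompress(compressed)
```

Note that the sink never gets asked for data; it is only ever written to, which is the opposite of what set_contents_from_file() expects.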

I'd like to combine the two, but one API wants to do the reading itself and the other wants to do the writing itself; neither offers a passive mode (being written to or being read from).

Does anybody have an idea on how to combine these in a working fashion?

EDIT: I accepted one answer (see below) because it pointed me in the right direction, but if you have the same problem, you might find my own answer (also below) more helpful, because it implements a solution using multipart uploads.


Solution

  • There really isn't a way to do this with a single upload request, because S3 doesn't support true streaming input (i.e. chunked transfer encoding). You must know the Content-Length before the upload starts, and the only way to know it is to have performed the gzip operation first.
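The Content-Length requirement applies to each request, so the multipart upload mentioned in the question's edit can sidestep it: compress into a memory buffer and send the buffer as one part whenever it grows large enough. The following is only a rough sketch of that idea, not the asker's actual implementation. The names upload_gzipped and MIN_PART_SIZE are made up for illustration, and bucket is assumed to be a boto (version 2) Bucket object offering initiate_multipart_upload(), with the returned upload offering upload_part_from_file(), complete_upload() and cancel_upload().

```python
import gzip
import io

# S3 requires every part except the last to be at least 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def upload_gzipped(bucket, key_name, src, part_size=MIN_PART_SIZE,
                   read_size=64 * 1024):
    """Gzip the binary file object `src` on the fly and upload it in parts.

    `bucket` is assumed to be a boto (v2) Bucket. Returns the part count.
    """
    mp = bucket.initiate_multipart_upload(key_name)
    buf = io.BytesIO()  # holds compressed bytes for the current part only
    part_num = 0

    def flush_part():
        nonlocal part_num
        part_num += 1
        buf.seek(0)
        # The buffer's length is known, so each part has a Content-Length.
        mp.upload_part_from_file(buf, part_num=part_num)
        buf.seek(0)
        buf.truncate()  # empty the buffer for the next part

    try:
        with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
            while True:
                chunk = src.read(read_size)
                if not chunk:
                    break
                gz.write(chunk)
                if buf.tell() >= part_size:
                    flush_part()
        # Leaving the with-block wrote the gzip trailer into buf.
        flush_part()
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()  # avoid leaving an orphaned partial upload
        raise
    return part_num
```

With a real connection this might be called as upload_gzipped(bucket, "big.log.gz", open("big.log", "rb")). Only about one part's worth of compressed data is held in memory at a time, and parts can come out somewhat larger than part_size because the compressor emits output in bursts.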