Tags: http, rest, soap, stream, amazon-s3

Can I stream a file upload to S3 without a content-length header?


I'm working on a machine with limited memory, and I'd like to upload a dynamically generated (not-from-disk) file in a streaming manner to S3. In other words, I don't know the file size when I start the upload, but I'll know it by the end. Normally a PUT request has a Content-Length header, but perhaps there is a way around this, such as using multipart uploads or chunked transfer encoding.

S3 can support streaming uploads. For example, see here:

http://blog.odonnell.nu/posts/streaming-uploads-s3-python-and-poster/

My question is, can I accomplish the same thing without having to specify the file length at the start of the upload?


Solution

  • You have to upload your file in 5MiB+ chunks via S3's multipart API. Each of those chunks requires a Content-Length, but you can avoid loading huge amounts of data (100MiB+) into memory.

    S3 allows up to 10,000 parts, so with a part size of 5MiB you can upload dynamically generated files of up to roughly 50GiB. That should be enough for most use cases.

    If you need more than that, you have to increase the part size, either by using a larger fixed size (10MiB, for example) or by increasing it as the upload progresses:

    First 25 parts:   5MiB (total:  125MiB)
    Next 25 parts:   10MiB (total:  375MiB)
    Next 25 parts:   25MiB (total:    1GiB)
    Next 25 parts:   50MiB (total: 2.25GiB)
    After that:     100MiB
    

    This allows you to upload files of up to about 1TB (S3's limit for a single object is currently 5TB) without wasting memory unnecessarily.
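    To make the buffering concrete, here is a minimal Python sketch of the part-splitting logic (the function names are my own, not part of any S3 SDK). Each part is buffered fully before it is sent, so its Content-Length is known even though the total stream length is not; in a real upload each buffered part would then be sent with something like boto3's `upload_part`.

```python
MiB = 1024 * 1024

def part_size_schedule():
    """Yield the escalating part sizes from the table above:
    25 parts each at 5, 10, 25 and 50 MiB, then 100 MiB forever."""
    for size_mib in (5, 10, 25, 50):
        for _ in range(25):
            yield size_mib * MiB
    while True:
        yield 100 * MiB

def iter_parts(stream):
    """Read an unbounded binary stream and yield (part_number, chunk)
    tuples. Only one part is held in memory at a time. Note that on
    non-seekable streams read() may return fewer bytes than requested;
    a robust version would loop until the buffer is full or EOF."""
    part_number = 1
    for size in part_size_schedule():
        chunk = stream.read(size)
        if not chunk:
            break
        yield part_number, chunk
        part_number += 1
```

    Every part except possibly the last comes out at 5MiB or more, which matches S3's minimum part size for multipart uploads.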


    A note on your link to Sean O'Donnell's blog:

    His problem is different from yours: he knows and uses the Content-Length before the upload. He wants to improve on the common situation where a library loads all of a file's data into memory before uploading it. In pseudo-code that would be something like this:

    data = File.read(file_name)
    request = new S3::PutFileRequest()
    request.setHeader('Content-Length', data.size)
    request.setBody(data)
    request.send()
    

    His solution gets the Content-Length via the filesystem API, then streams the data from disk into the request stream. In pseudo-code:

    upload = new S3::PutFileRequestStream()
    upload.writeHeader('Content-Length', File.getSize(file_name))
    upload.flushHeader()
    
    input = File.open(file_name, File::READONLY_FLAG)
    
    while (data = input.read())
      upload.write(data)
    end
    
    upload.flush()
    upload.close()
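    The same idea in runnable Python (a sketch, not Sean O'Donnell's actual code; `sink` is my stand-in for the already-opened request body stream): the size comes from the filesystem up front, and the body is copied in small fixed-size chunks so memory stays flat.

```python
import os

CHUNK_SIZE = 64 * 1024  # copy in small chunks to keep memory usage flat

def stream_file(file_name, sink):
    """Stream a file from disk into `sink` (anything with a write()
    method, e.g. an open HTTP request body stream) without loading the
    whole file into memory. Returns the Content-Length, which is known
    from the filesystem before any data is sent."""
    content_length = os.path.getsize(file_name)
    with open(file_name, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            sink.write(chunk)
    return content_length
```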