Tags: python, download, stream, python-requests

Download large file in Python with requests


Requests is a really nice library. I'd like to use it for downloading big files (>1GB). The problem is that it's not possible to keep the whole file in memory; I need to read it in chunks. And that is the problem with the following code:

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024): 
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return 

For some reason it doesn't work this way; it still loads the response into memory before it is saved to a file.


Solution

  • With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:

    def download_file(url):
        local_filename = url.split('/')[-1]
        # NOTE the stream=True parameter below
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192): 
                    # If the response uses chunked transfer encoding, uncomment
                    # the if below and set chunk_size to None.
                    #if chunk: 
                    f.write(chunk)
        return local_filename
    

    Note that the number of bytes returned by each iteration of iter_content is not necessarily equal to chunk_size; chunk_size is only how many bytes are read into memory at a time, and the length of each returned chunk can differ from it (for example when the content is being decoded) and can vary between iterations.
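
    For example, here is a minimal sketch that tracks how many bytes have actually been written (the download_with_progress name and the progress printing are illustrative additions, not part of the answer above; it assumes the server sends a Content-Length header for the total):

        import requests

        def download_with_progress(url, local_filename):
            # Stream the body so it is written to disk in pieces instead of
            # being held in memory all at once.
            with requests.get(url, stream=True) as r:
                r.raise_for_status()
                # Content-Length may be absent (e.g. chunked transfer encoding),
                # in which case total stays 0 and no progress is printed.
                total = int(r.headers.get('Content-Length', 0))
                downloaded = 0
                with open(local_filename, 'wb') as f:
                    for chunk in r.iter_content(chunk_size=8192):
                        f.write(chunk)
                        # len(chunk) is the actual size, which can differ from chunk_size.
                        downloaded += len(chunk)
                        if total:
                            print(f'\rdownloaded {downloaded} of {total} bytes', end='')
            return local_filename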

    See body-content-workflow and Response.iter_content for further reference.
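
    As a side note, a shorter streaming variant (a sketch, not code from the answer above) copies the raw urllib3 response object straight to disk with shutil.copyfileobj; setting r.raw.decode_content asks urllib3 to undo any gzip/deflate content encoding while copying:

        import shutil
        import requests

        def download_file_raw(url, local_filename):
            with requests.get(url, stream=True) as r:
                r.raise_for_status()
                # Decode gzip/deflate content encoding on the fly while copying.
                r.raw.decode_content = True
                with open(local_filename, 'wb') as f:
                    shutil.copyfileobj(r.raw, f)
            return local_filename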