Tags: python, ram, with-statement

Opening a file and writing to it: Why is there not a RAM overflow with large files?


I am using python and let's suppose I have the following code:

with open(r'C:\path\filetobewritten.txt', 'w') as fileout:
    for fname in all_files:
        with open(fname, 'r') as filein:
            for line in filein:
                fileout.write(line)
        

(where all_files is just a result from glob storing all filenames in a list)

From my understanding, this code opens a file called filetobewritten.txt. It then opens each input file in turn and writes its lines into filetobewritten.txt, closing each "filein" when it is done with it.

However, from my understanding, the output file filetobewritten.txt stays "open" in RAM the whole time and is only closed (and, so to speak, saved) after the last input file has been written. I can see this in Windows Explorer: the reported size is 0 and does not increase (even when refreshing with F5 while the code is running). Only after the last file has been written does it show its complete file size. So nothing seems to be saved in between.

From that observation (and from the code) I conclude that all the information is kept in RAM: the file is kept open the whole time, and the maximum file size is reached while reading the last input file, so everything must be held in RAM until then. Now, suppose I have many large files, so that filetobewritten.txt ends up at 38 GB, while my computer has 32 GB of RAM. I expected the program to fail, but it did not, so I must be wrong somewhere. Hence my question: is this data stored in RAM, and why did I not run into any problems? A rule of thumb I have heard is that opening a large file needs double the file size in RAM.


Solution

  • The output file is buffered by default to make writes faster, but the buffer is small: whenever it fills, its contents are flushed to the operating system and written out to disk, freeing that RAM. At no point is the whole 38 GB held in memory; Windows Explorer simply shows stale directory metadata until the file is flushed or closed.

    Open the file unbuffered with open(..., "wb", buffering=0) (unbuffered I/O requires binary mode; text mode with buffering=0 raises a ValueError), or alternatively call fileout.flush() by hand, to see the file grow in real time.
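    A minimal sketch of the flush() approach, using a temporary file and made-up chunk data for illustration: after every flush, the on-disk size reported by os.path.getsize grows, showing that the data is not accumulating in RAM.

    ```python
    import os
    import tempfile

    # Hypothetical path and data, just for demonstration.
    path = os.path.join(tempfile.mkdtemp(), "filetobewritten.txt")

    with open(path, "w") as fileout:
        for _ in range(3):
            fileout.write("x" * 100_000)   # write a 100 kB chunk
            fileout.flush()                # push Python's buffer to the OS
            os.fsync(fileout.fileno())     # ask the OS to commit it to disk
            print(os.path.getsize(path))   # size grows after every flush
    ```

    Running this prints 100000, 200000, 300000: the file grows on disk while the program is still running, which is exactly what Explorer would show if it were refreshed after each flush.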

    "For opening a large file, a rule of thumb is that double the RAM size is needed."

    Nope. You need practically no memory to open a file: opening only creates a file handle, and the contents are then read or written in small buffered chunks as you iterate.
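    A small sketch of that point, using a throwaway file written to a temporary directory: opening the file reads nothing into RAM, and iterating with next() fetches only one buffered line at a time, regardless of how large the file is.

    ```python
    import os
    import tempfile

    # Create a file with many lines (path and contents are made up).
    path = os.path.join(tempfile.mkdtemp(), "big.txt")
    with open(path, "w") as f:
        for i in range(100_000):
            f.write(f"line {i}\n")

    with open(path) as f:      # opening loads nothing into RAM yet
        first = next(f)        # iteration reads one line (from a small buffer)
    print(first, end="")       # prints: line 0
    ```

    This is why the question's loop works for input files of any size: `for line in filein` streams the file line by line instead of loading it whole.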