Tags: python, mp4, moviepy, natsort

Combining hundreds of MP4s without running out of RAM


I have some code that works great for a small number of MP4s, but around the 100th one I start to run out of RAM. I know you can write CSV files sequentially; I am just not sure how to do that for MP4s. Here is the code I have, which works on small batches:

from moviepy.editor import *
import os
from natsort import natsorted

L = []

for root, dirs, files in os.walk("/path/to/the/files"):
    # natural sort so clip2.mp4 comes before clip10.mp4
    files = natsorted(files)
    for file in files:
        if os.path.splitext(file)[1] == '.mp4':
            filePath = os.path.join(root, file)
            video = VideoFileClip(filePath)
            L.append(video)

final_clip = concatenate_videoclips(L)
final_clip.write_videofile("output.mp4", fps=24, remove_temp=False)

The code above is what I tried. It worked perfectly on a small test batch, so I expected it to scale, but it could not handle the main batch.


Solution

  • You appear to be appending the contents of a large number of video files to a list, yet you report that available RAM is much less than the total size of those files. So don't accumulate the result in memory.

    Follow one of these approaches:

    keep an open file descriptor

            with open("combined_video.mp4", "wb") as fout:
                for file in files:
                    ...
                    video = ...
                    fout.write(video)
    

    Or perhaps it is fout.write(video.data) or video.write_segment(fout) -- I don't know about the video I/O library you're using.

    The point is that the somewhat large video object is re-assigned each time, so it does not grow without bound, unlike your list L.
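
    For plain byte copying, the same constant-memory idea can be made concrete. A minimal sketch, assuming root and files come from the os.walk loop in the question; each source file is streamed through a fixed-size buffer, so only one chunk is ever in memory:

            import os
            import shutil

            with open("combined_video.mp4", "wb") as fout:
                for file in files:
                    src = os.path.join(root, file)
                    with open(src, "rb") as fin:
                        # copyfileobj streams fixed-size chunks (1 MiB here)
                        # instead of loading the whole file at once
                        shutil.copyfileobj(fin, fout, 1024 * 1024)

    (Caveat: MP4 is a structured container, so naive byte concatenation may not yield a playable file; the sketch only illustrates the bounded-memory pattern.)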

    append to existing file

    We can nest in the other order, if that's more convenient.

            for file in files:
                with open("combined_video.mp4", "ab") as fout:
                    ...
                    video = ...
                    fout.write(video)
    

    Here we're doing a binary append. Repeated open / close is slightly less efficient, but it has the advantage of letting you do a run with four input files, let Python exit, and later do a run with a pair of new files -- you'll still find all six files' worth of data in the combined output.
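
    If raw appending doesn't produce a playable MP4 (the container stores index metadata, not just raw frames), a widely used alternative that also keeps memory flat is ffmpeg's concat demuxer, driven from Python. A sketch, assuming ffmpeg is on the PATH and all inputs share the same codec settings; the inputs.txt name is arbitrary:

            import os
            import subprocess
            from natsort import natsorted

            src_dir = "/path/to/the/files"
            mp4s = natsorted(f for f in os.listdir(src_dir)
                             if f.endswith(".mp4"))

            # The concat demuxer reads a text file with one
            # "file '<path>'" line per input clip.
            with open("inputs.txt", "w") as listing:
                for name in mp4s:
                    listing.write("file '%s'\n" % os.path.join(src_dir, name))

            # -c copy remuxes without re-encoding, so ffmpeg streams the
            # clips through instead of holding them in memory.
            subprocess.run(
                ["ffmpeg", "-f", "concat", "-safe", "0",
                 "-i", "inputs.txt", "-c", "copy", "output.mp4"],
                check=True,
            )

    Unlike the binary-append loop, this rewrites output.mp4 from scratch on each run, but it produces a valid container and still never holds more than a small buffer of video data at a time.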