python opencv image-processing ffmpeg video-processing

Fastest way to extract moving dynamic crop from video using ffmpeg


I'm working on an AI project that involves object detection and action recognition on videos that are roughly 30 minutes long.

My pipeline is the following:

The models are fast, but actually writing the crops to disk is slow. An SSD would help, but I suspect ffmpeg would speed it up far more.

Some of the challenges with the crops:

My process for extracting crops is simple: I call cv2.imwrite(output_crop_path, crop) in a for loop.

I've experimented with sendcmd and filter_complex. I tried this https://stackoverflow.com/a/67508233/4447761, but it outputs an image with black below the crop, and the image wraps around on the x-axis.



Solution

  • Don't store the pictures.

    Store just the sequence of bounding boxes.

    Then, whatever you wanted to do with that mountain of individual images, do it by decoding the video, reading the sequence of boxes, and taking your crops out of the frames on the fly.

    I'd recommend using PyAV for video reading. It gives you the presentation timestamps reliably. You shouldn't even need them, but having them along in the file can be very helpful for debugging.

    Maybe you want to use pandas to write and read such a file. CSV is a popular format.

    Another advantage of keeping the boxes like that: you can work on this sequence, maybe smooth it, or mix in the results of an optical tracker (sticks like glue but also creeps over time).
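The idea of storing boxes and cropping on the fly could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the `Box` fields, the CSV column layout, and the helper names `write_boxes`, `read_boxes`, and `iter_crops` are all assumptions made for illustration. It uses the standard `csv` module to stay dependency-free; `pandas.read_csv` should be able to read the same file. PyAV is imported only inside `iter_crops`, so the CSV helpers work without it.

```python
import csv
from typing import Iterable, Iterator, List, NamedTuple


class Box(NamedTuple):
    """One detection (hypothetical layout, adapt to your detector)."""
    frame_index: int  # position of the frame in decode order
    pts: int          # presentation timestamp, kept only for debugging
    x: int            # left edge in pixels
    y: int            # top edge in pixels
    w: int            # width in pixels
    h: int            # height in pixels


def write_boxes(path: str, boxes: Iterable[Box]) -> None:
    """Store one row per box instead of one image file per crop."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(Box._fields)  # header row
        writer.writerows(boxes)


def read_boxes(path: str) -> List[Box]:
    """Read the boxes back, converting every column to int."""
    with open(path, newline="") as f:
        return [
            Box(*(int(row[name]) for name in Box._fields))
            for row in csv.DictReader(f)
        ]


def iter_crops(video_path: str, boxes: Iterable[Box]) -> Iterator[tuple]:
    """Decode the video once and yield (box, crop) pairs on the fly."""
    import av  # PyAV, imported lazily so the CSV helpers need no extras

    # Group boxes by frame so each decoded frame is looked up once.
    by_frame = {}
    for box in boxes:
        by_frame.setdefault(box.frame_index, []).append(box)

    with av.open(video_path) as container:
        for index, frame in enumerate(container.decode(video=0)):
            if index not in by_frame:
                continue
            image = frame.to_ndarray(format="bgr24")  # same layout OpenCV uses
            for box in by_frame[index]:
                # NumPy slicing returns a view, so no pixels are copied here.
                yield box, image[box.y:box.y + box.h, box.x:box.x + box.w]
```

A downstream step would then loop over `iter_crops("video.mp4", read_boxes("boxes.csv"))` (both file names are placeholders) and feed each crop straight to the next model, without any image ever touching the disk.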


    If you really do need to write a video, then write a video. Individual frames as files are a serious burden on most file systems, let alone file managers. Again I would recommend PyAV to write a video. It gives you all kinds of control over how to do it, and the basic code isn't all that much. PyAV comes with examples.
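If a video of the moving crop is what's needed, a sketch along the lines of PyAV's own encoding example might look like this. Several things here are assumptions rather than part of the original answer: the helper names `fixed_window` and `write_crop_video`, the `centers` mapping from frame index to box centre, the `mpeg4` codec, and the fixed frame rate. A video stream needs a constant frame size, so the sketch cuts a fixed-size window centred on the box and clamps it to the frame, which also keeps the crop from running past the image border.

```python
def fixed_window(cx, cy, out_w, out_h, frame_w, frame_h):
    """Top-left corner of an out_w x out_h window centred on (cx, cy).

    The window is clamped so it stays inside the frame.
    Assumes out_w <= frame_w and out_h <= frame_h.
    """
    x = min(max(cx - out_w // 2, 0), frame_w - out_w)
    y = min(max(cy - out_h // 2, 0), frame_h - out_h)
    return x, y


def write_crop_video(src_path, dst_path, centers, out_w=128, out_h=128, fps=30):
    """Write one fixed-size crop per listed frame to a new video.

    centers: dict mapping frame index -> (cx, cy) box centre (hypothetical
    format; derive it from whatever box file you keep).
    """
    import av          # PyAV, imported lazily
    import numpy as np

    with av.open(src_path) as src, av.open(dst_path, mode="w") as dst:
        stream = dst.add_stream("mpeg4", rate=fps)  # codec is an assumption
        stream.width = out_w
        stream.height = out_h
        stream.pix_fmt = "yuv420p"

        for index, frame in enumerate(src.decode(video=0)):
            if index not in centers:
                continue
            image = frame.to_ndarray(format="bgr24")
            frame_h, frame_w = image.shape[:2]
            x, y = fixed_window(*centers[index], out_w, out_h, frame_w, frame_h)
            # Copy the slice so the encoder receives a contiguous buffer.
            crop = np.ascontiguousarray(image[y:y + out_h, x:x + out_w])
            out_frame = av.VideoFrame.from_ndarray(crop, format="bgr24")
            for packet in stream.encode(out_frame):
                dst.mux(packet)

        # Flush any frames still buffered inside the encoder.
        for packet in stream.encode():
            dst.mux(packet)
```

Clamping the window rather than padding it is a design choice: near the border the subject drifts off-centre in the crop, but every output frame contains only real pixels.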