ffmpeg: Is it possible to replace frames in a variable frame-rate video?

Machine learning algorithms for video processing typically work on frames (images) rather than video.

In my work, I use ffmpeg to dump a specific scene as a sequence of .png files, process them in some way (denoise, deblur, colorize, annotate, inpaint, etc.), write the results to an equal number of .png files, and then update the original video with the new frames.

This works well with constant frame-rate (CFR) video. I dump the images like so (e.g., a 50-frame sequence starting at 1:47):

ffmpeg -i input.mp4 -vf "select='gte(t,107)*lt(selected_n,50)'" -vsync passthrough '107+%06d.png'

Then, after editing the images, I replace the originals like so (for a 12.5fps CFR video):

ffmpeg -i input.mp4 -itsoffset 107 -framerate 25/2 -i '107+%06d.png' -filter_complex "[0]overlay=eof_action=pass" -vsync passthrough -c:a copy output.mp4

However, many of the videos I work with are variable frame-rate (VFR), and this has created some challenges.

A simple solution is to convert the VFR video to CFR, which ffmpeg wants to do anyway, but I'm wondering whether it's possible to avoid this. CFR conversion requires either dropping or duplicating frames. Dropping frames is undesirable because the purpose of ML video processing is usually to improve the output. Duplicating frames is a problem because an upscaling algorithm I'm working with right now uses the previous and next frames for data; if either one is a duplicate, it contributes no new data for upscaling.
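To make that trade-off concrete, here is a minimal Python sketch that snaps a list of VFR frame timestamps onto a CFR grid and counts the dropped and duplicated frames. It is only an illustration: the function name `snap_to_cfr` and the sample timestamps are made up, and the "latest frame at or before each slot" rule is a simplifying assumption, not ffmpeg's actual vsync logic.

```python
from fractions import Fraction

def snap_to_cfr(timestamps, fps):
    """Fill each CFR output slot with the latest source frame at or before it.

    timestamps: sorted presentation times in seconds (hypothetical sample data).
    Returns (slots, dropped, duplicated), where slots[i] is the index of the
    source frame shown in output slot i.
    """
    fps = Fraction(fps)
    start = Fraction(timestamps[0])
    end = Fraction(timestamps[-1])
    n_slots = int((end - start) * fps) + 1
    slots = []
    src = 0
    for i in range(n_slots):
        slot_time = start + i / fps
        # Advance to the newest source frame that is not later than this slot.
        while src + 1 < len(timestamps) and Fraction(timestamps[src + 1]) <= slot_time:
            src += 1
        slots.append(src)
    used = set(slots)
    dropped = len(timestamps) - len(used)   # source frames never shown
    duplicated = len(slots) - len(used)     # output slots repeating a frame
    return slots, dropped, duplicated

# Five unevenly spaced frames resampled to 10 fps.
ts = [Fraction(0), Fraction(1, 10), Fraction(3, 25), Fraction(1, 5), Fraction(1, 2)]
slots, dropped, duplicated = snap_to_cfr(ts, 10)
print(slots, dropped, duplicated)
```

In this sample the frame at 0.12 s falls between two slots and is lost, while the gap between 0.2 s and 0.5 s is padded with repeats of the same frame.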

With -vsync passthrough, I had hoped that I could simply remove the -framerate option, and preserve the original frames as-is, but the resulting command:

ffmpeg -i input.mp4 -itsoffset 107 -i '107+%06d.png' -filter_complex "[0]overlay=eof_action=pass" -vsync passthrough -c:a copy output.mp4

uses ffmpeg's default of 25fps, and drops a lot of frames. Is there a reliable way to replace frames in VFR video?

Solution

Yes, it can be done, but it's complicated. For this process to work reliably, the overlay video must have exactly the same frame timestamps as the underlay video. Generating such a VFR overlay segment requires capturing the frame timestamps from the source video and using them to build a precisely timed replacement segment.
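As a rough illustration of what "capturing the frame timestamps" can look like, the Python sketch below parses the `n:` and `pts:` fields that the showinfo filter logs and turns them into a concat-demuxer list with one `duration` entry per image. This is not the script referenced in this answer, whose output format may well differ. The function names, the `prefix` and `timescale` parameters, the assumption that image numbering starts at 1, and the choice to reuse the previous interval for the last frame are all mine.

```python
import re

# showinfo logs one line per frame containing "n:<index> pts:<ticks> ...".
FRAME_RE = re.compile(r"\bn:\s*(\d+)\s+pts:\s*(\d+)")

def parse_showinfo(log_text):
    """Return a list of (frame_index, pts_ticks) tuples from ffmpeg's stderr."""
    frames = []
    # ffmpeg separates progress updates with \r, so split on either line ending.
    for line in re.split(r"[\r\n]+", log_text):
        if "showinfo" not in line:
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), int(match.group(2))))
    return frames

def build_concat(frames, prefix, timescale):
    """Build concat-demuxer text with per-image durations in seconds."""
    lines = ["ffconcat version 1.0"]
    ticks = timescale // 25  # fallback if only one frame was captured
    for i, (_, pts) in enumerate(frames):
        if i + 1 < len(frames):
            ticks = frames[i + 1][1] - pts
        # Otherwise reuse the previous interval for the final frame (assumption).
        lines.append(f"file '{prefix}{i + 1:06d}.png'")  # assumes numbering starts at 1
        lines.append(f"duration {ticks / timescale:.6f}")
    return "\n".join(lines) + "\n"
```

Durations are computed from the integer `pts` ticks rather than from `pts_time`, because the logged `pts_time` value is rounded for display.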

The short version of the process is to replace the above commands with the following to extract the images:

ffmpeg -i input.mp4 -vf "select='gte(t,107)*lt(selected_n,50)',showinfo" -vsync passthrough '107+%06d.png' 2>&1 | sed 's/\r/\n/g' | showinfo2concat.py --prefix="107+" >concat.txt

This requires a script that can be downloaded here. After editing the images, update the source video with:

ffmpeg -i input.mp4 -f concat -safe 0 -i concat.txt -filter_complex "[1]settb=1/90000,setpts=9644455+PTS*25/90000[o];[0:v:0][o]overlay=eof_action=pass" -vsync passthrough -r 90000 output.mp4

Where 90000 is the timescale (inverse of timebase), and 9644455 is the PTS of the first frame to replace.
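As a sanity check on those two numbers, a PTS divided by the timescale gives the frame time in seconds. The small Python sketch below does that arithmetic; the helper names are hypothetical. 9644455 / 90000 comes to roughly 107.16 s, which is consistent with the first frame at or after the 107-second mark selected by `gte(t,107)`.

```python
from fractions import Fraction

def pts_to_seconds(pts, timescale):
    """Convert integer PTS ticks to exact seconds (timescale = ticks per second)."""
    return Fraction(pts, timescale)

def seconds_to_pts(seconds, timescale):
    """Convert seconds back to the nearest integer PTS tick."""
    return round(Fraction(seconds) * timescale)

start = pts_to_seconds(9644455, 90000)
print(float(start))                   # a little over 107.16 seconds
print(seconds_to_pts(start, 90000))   # round-trips to the original tick count
```

Using `Fraction` keeps the conversion exact, which matters here because the overlay only lines up when timestamps match to the tick.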

See the source for more details about what these commands actually do.