
Concatenate a video, image and audio using ffmpeg

I am trying to concatenate a group of images, each with its associated audio, and add a video clip at the start and end of the final video. Whenever I concatenate an image with its audio, the result doesn't play back correctly in VLC media player: the image is displayed for a single frame, then the picture cuts to black while the audio keeps playing. I came across a GitHub issue where the accepted solution was the one I implemented, but one of the comments mentioned this same incorrect playback, along with an error on YouTube.

import os
import ffmpeg

def generate_clip(img):
    """Generates a clip from an image and a wav file, helper function for export_video
    """
    transition_cond = os.path.exists("static/transitions/" + img + ".mp4")
    chart_path = os.path.exists("charts/" + img + ".png")
    if transition_cond:
        clip = ffmpeg.input("static/transitions/" + img + ".mp4")
    elif chart_path:
        clip = ffmpeg.input("charts/" + img + ".png")
    else:
        clip = ffmpeg.input("static/transitions/Transition.jpg")
    audio_clip = ffmpeg.input("audio/" + img + ".wav")
    clip = ffmpeg.concat(clip, audio_clip, v=1, a=1)
    clip = ffmpeg.filter(clip, "setdar","16/9")
    return clip

def export_video(CHARTS):
    """Combines the charts from charts/ and the audio from audio/ to generate one final video that will be uploaded to Youtube
    """
    clips = []
    intro = generate_clip("Intro")
    clips.append(intro)

    for key in CHARTS.keys():
        value = CHARTS.get(key)
        value.insert(0, key)
        subclip = []
        for img in value:
            subclip.append(generate_clip(img))
        concat_clip = ffmpeg.concat(*subclip)
        clips.append(concat_clip)
    outro = generate_clip("Outro")
    clips.append(outro)

    concat_clip = ffmpeg.concat(*clips)
    concat_clip.output("export/export.mp4").run()


  • Unfortunately, the concat filter does not offer a shortest option like overlay does. The issue here is that the image2 demuxer uses 25 fps by default, so a video stream containing a single image lasts only 1/25 of a second. There are several ways to address this, but you first need to get the duration of each paired audio file. To incorporate the duration into the ffmpeg command, you can:

    1. Use the tpad filter on each video (in series with setdar) to make the video duration match the audio. The padded amount should be 1/25 of a second less than the audio duration, because the image already supplies one frame.
    2. Specify the -loop 1 input option so the image loops (indefinitely), and add a -t {duration} input option to limit the number of loops. Note that the resulting video duration may not be exact.
    3. Specify -r {1/duration} so the single image frame lasts as long as the audio, and apply the fps filter to each input to convert it to the output frame rate.
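
The three approaches above can be sketched as small argument builders. This is only an illustration, not a tested pipeline: the function names are hypothetical, the audio is assumed to be a PCM WAV file that Python's standard `wave` module can read, and 25 fps is the image2 default mentioned above. Nothing here runs ffmpeg; the functions only produce the option strings.

```python
import wave
from fractions import Fraction


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds as an exact Fraction."""
    with wave.open(path, "rb") as wav:
        return Fraction(wav.getnframes(), wav.getframerate())


def tpad_filter(duration, fps=25):
    """Approach 1: clone the last frame until the video matches the audio.

    The image already supplies one frame (1/fps seconds), so pad one frame less.
    """
    pad = max(duration - Fraction(1, fps), 0)
    return f"setdar=16/9,tpad=stop_mode=clone:stop_duration={float(pad):.3f}"


def loop_input_args(image, duration):
    """Approach 2: loop the image and stop reading it after `duration` seconds."""
    return ["-loop", "1", "-t", f"{float(duration):.3f}", "-i", image]


def slow_rate_input_args(image, duration):
    """Approach 3: read the image at 1/duration frames per second."""
    rate = 1 / Fraction(duration)
    return ["-r", f"{rate.numerator}/{rate.denominator}", "-i", image]
```

Each builder returns the options for the image input only; the matching audio file would be added as a second `-i` input, and with approach 3 an `fps` filter is still needed on the video stream to reach the output frame rate.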

    I'm not familiar with ffmpeg-python, so I cannot provide a solution for it, but if you're interested, I'd be happy to post equivalent code using my ffmpegio package.

    [edit] ffmpegio Solution

    Here is how I'd code the 3rd solution with ffmpegio:

    from os import path
    import ffmpegio

    def generate_clip(img):
        """
        Generates a clip from an image and a wav file, 
        helper function for export_video
        """
        transition_cond = path.exists("static/transitions/" + img + ".mp4")
        chart_path = path.exists("charts/" + img + ".png")
        if transition_cond:
            video_file = "static/transitions/" + img + ".mp4"
        elif chart_path:
            video_file = "charts/" + img + ".png"
        else:
            video_file = "static/transitions/Transition.jpg"
        audio_file = "audio/" + img + ".wav"
        video_opts = {}
        if not transition_cond:
            # audio_streams_basic() returns audio duration in seconds as Fraction
            # set the "framerate" of the video to be the reciprocal
            info = ffmpegio.probe.audio_streams_basic(audio_file)
            video_opts["r"] = 1 / info[0]["duration"]
        return [(video_file, video_opts), (audio_file, None)]
    def export_video(CHARTS):
        """
        Combines the charts from charts/ and the audio from audio/ 
        to generate one final video that will be uploaded to Youtube
        """
        # get all input files (video/audio pairs), with the intro and outro at either end
        clips = [
            generate_clip("Intro"),
            *(generate_clip(img) for key, value in CHARTS.items() for img in value),
            generate_clip("Outro"),
        ]
        # number of clips
        nclips = len(clips)
        # filter chains to set DAR and fps of all video streams
        vfilters = (f"[{2*n}:v]setdar=16/9,fps=30[v{n}]" for n in range(nclips))
        # concatenation filter input: [v0][1:a][v1][3:a][v2][5:a]...
        concatfilter = "".join((f"[v{n}][{2*n+1}:a]" for n in range(nclips))) + f"concat=n={nclips}:v=1:a=1[vout][aout]"
        # form the full filtergraph
        fg = ";".join((*vfilters, concatfilter))
        # set output file and options
        output = ("export/export.mp4", {"map": ["[vout]", "[aout]"]})
        # run ffmpeg (the call is assumed to be ffmpegio.ffmpegprocess.run;
        # check the ffmpegio documentation for the exact signature)
        ffmpegio.ffmpegprocess.run(
            {
                "inputs": [input for pair in clips for input in pair],
                "outputs": [output],
                "global_options": {"filter_complex": fg},
            },
            overwrite=True,
        )

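To see what the filtergraph string looks like before handing it to ffmpeg, the string-building part can be isolated. This is a minimal sketch with a hypothetical helper name, `build_filtergraph`; it assumes the inputs are ordered video0, audio0, video1, audio1, and so on, as in the code above, and it uses only the standard library.

```python
def build_filtergraph(nclips, dar="16/9", fps=30):
    """Build the filter_complex string for `nclips` video/audio input pairs.

    Inputs are assumed to be ordered video0, audio0, video1, audio1, ...
    """
    # one chain per video input: fix the display aspect ratio and frame rate
    vfilters = [f"[{2*n}:v]setdar={dar},fps={fps}[v{n}]" for n in range(nclips)]
    # interleave each processed video with its paired audio input
    pairs = "".join(f"[v{n}][{2*n+1}:a]" for n in range(nclips))
    concat = f"{pairs}concat=n={nclips}:v=1:a=1[vout][aout]"
    return ";".join([*vfilters, concat])


print(build_filtergraph(2))
# [0:v]setdar=16/9,fps=30[v0];[2:v]setdar=16/9,fps=30[v1];[v0][1:a][v1][3:a]concat=n=2:v=1:a=1[vout][aout]
```

Printing the string for two or three clips is a quick way to check the stream labels before running the full export.
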
    Since this code does not use the read/write features, the ffmpegio-core package suffices:

    pip install ffmpegio-core

    Make sure that the FFmpeg binary can be found by ffmpegio. See the installation doc.
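
A quick way to check whether the binaries are at least on the system PATH is the standard library's `shutil.which`. This is only a rough check under that assumption: ffmpegio may also look in other locations described in its installation doc, and `find_ffmpeg_tools` is a hypothetical helper name.

```python
import shutil


def find_ffmpeg_tools(names=("ffmpeg", "ffprobe")):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_ffmpeg_tools().items():
        print(f"{name}: {location or 'not found on PATH'}")
```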

    The code has not been fully validated. If you encounter a problem, it is probably easiest to post it on the project's GitHub Discussions so we can work through it there.