
Using FFmpeg to merge video frames back into a video with subtitles messes up the audio when seeking


I broke an H.265 video down into frames and am trying to merge them back together with the original audio, subtitles, and fonts. The resulting video has a problem: when I seek to certain points, the audio stops playing. Even when I don't seek, at some point the video hangs while the audio keeps playing. Seconds later the video resumes, but video and audio are now out of sync. None of this happens with the original video.

I'm splitting the video into frames and merging them again because I want to upscale each frame. I've left that step out here because the issue also occurs with the original, unscaled frames.

Here are the details of the original video. Note that it has a video stream, an audio stream, a subtitle stream, and two font attachments.

.\ffmpeg.exe -i "input.mkv"

Input #0, matroska,webm, from 'input.mkv':
  Metadata:
    encoder         : libebml v1.3.10 + libmatroska v1.5.2
    creation_time   : 2021-01-07T00:20:19.000000Z
  Duration: 00:23:02.05, start: 0.000000, bitrate: 320 kb/s
    Stream #0:0: Video: hevc (Main), yuv420p(tv), 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
    Metadata:
      BPS-eng         : 278671
      DURATION-eng    : 00:23:02.006000000
      NUMBER_OF_FRAMES-eng: 33135
      NUMBER_OF_BYTES-eng: 48140731
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:1(jpn): Audio: aac (HE-AAC), 48000 Hz, stereo, fltp
    Metadata:
      BPS-eng         : 36166
      DURATION-eng    : 00:23:02.016000000
      NUMBER_OF_FRAMES-eng: 32391
      NUMBER_OF_BYTES-eng: 6247833
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:2(eng): Subtitle: ass (default)
    Metadata:
      BPS-eng         : 76
      DURATION-eng    : 00:21:20.790000000
      NUMBER_OF_FRAMES-eng: 246
      NUMBER_OF_BYTES-eng: 12264
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:3: Attachment: ttf
    Metadata:
      filename        : Roboto-Medium.ttf
      mimetype        : application/x-truetype-font
    Stream #0:4: Attachment: ttf
    Metadata:
      filename        : Roboto-MediumItalic.ttf
      mimetype        : application/x-truetype-font

Here's how I break it into frames:

.\ffmpeg.exe -i "input.mkv" -qscale:v 1 -qmin 1 -qmax 1 -vsync 0 "InputFolder/frame%08d.png"

Here's how I merge the frames back into a video, together with all of the original streams except the video:

.\ffmpeg.exe -r 23.98 -i "InputFolder\frame%08d.png" -i "input.mkv" -map 0:v:0 -map 1 -map -1:v -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p "output.mkv"

Here are the details of the resulting video:

.\ffmpeg.exe -i "output.mkv"

Input #0, matroska,webm, from 'output.mkv':
  Metadata:
    ENCODER         : Lavf58.45.100
  Duration: 00:23:02.05, start: 0.000000, bitrate: 245 kb/s
    Stream #0:0: Video: hevc (Main), yuv420p(tv), 1280x720 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
    Metadata:
      ENCODER         : Lavc58.91.100 libx265
      DURATION        : 00:23:01.777000000
    Stream #0:1(jpn): Audio: aac (HE-AAC), 48000 Hz, stereo, fltp (default)
    Metadata:
      BPS-eng         : 36166
      DURATION-eng    : 00:23:02.016000000
      NUMBER_OF_FRAMES-eng: 32391
      NUMBER_OF_BYTES-eng: 6247833
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      DURATION        : 00:23:02.046000000
    Stream #0:2(eng): Subtitle: ass (default)
    Metadata:
      BPS-eng         : 76
      DURATION-eng    : 00:21:20.790000000
      NUMBER_OF_FRAMES-eng: 246
      NUMBER_OF_BYTES-eng: 12264
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      ENCODER         : Lavc58.91.100 ssa
      DURATION        : 00:21:21.580000000
    Stream #0:3: Attachment: ttf
    Metadata:
      filename        : Roboto-Medium.ttf
      mimetype        : application/x-truetype-font
    Stream #0:4: Attachment: ttf
    Metadata:
      filename        : Roboto-MediumItalic.ttf
      mimetype        : application/x-truetype-font
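
A side observation on the two listings: the video duration is 00:23:02.006 in the source and 00:23:01.777 in the output. One possible explanation, assuming the source really runs at 24000/1001 fps (which ffmpeg displays as 23.98), is that the `-r 23.98` in the merge command is a rounded value. The arithmetic below is only a sketch using the frame count from NUMBER_OF_FRAMES-eng, and this small drift is separate from the seeking problem.

```shell
# Frame count taken from NUMBER_OF_FRAMES-eng in the source listing.
frames=33135

# Duration in seconds if the frames play at 24000/1001 fps
# (assumed to be the true source rate).
exact=$(awk -v n="$frames" 'BEGIN { printf "%.3f", n * 1001 / 24000 }')

# Duration in seconds if the frames play at the rounded 23.98 fps
# used in the merge command.
rounded=$(awk -v n="$frames" 'BEGIN { printf "%.3f", n / 23.98 }')

echo "24000/1001 fps: ${exact}s"
echo "23.98 fps:      ${rounded}s"
```

The first value comes out at about 1382.006 s (23:02.006) and the second at about 1381.776 s, close to the two DURATION values in the listings. If the assumption about the source rate holds, passing `-r 24000/1001` instead of `-r 23.98` would avoid the roughly 0.23 s difference by the end of the video.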

One thing to note is that I've done this successfully many times with H.264 videos, with no audio issues. Another thing, which might be more relevant: when I merge the frames with only the original audio stream (as opposed to all original streams except video), the audio issue does not occur.

.\ffmpeg.exe -r 23.98 -i "InputFolder\frame%08d.png" -i "input.mkv" -map 0:v:0 -map 1:a:0 -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p "output.mkv"

This produces no audio issues, but it isn't good enough for me because I want the subtitles and fonts from the original video.

If anyone needs me to upload the original video somewhere so they can reproduce it, let me know.

Edit: Also note that when I merge the frames with the original audio AND subtitle streams (i.e. without the fonts), the issue remains.
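
Since the problem follows the subtitle stream, one thing that may be worth checking is how the packets of the different streams are interleaved inside output.mkv. The sketch below is an assumption about how this could be diagnosed, not part of the original investigation: it uses ffprobe to print each packet's stream index and timestamp in the order the packets are stored. `list_packets` is a hypothetical helper name.

```shell
# Hypothetical helper: print "stream_index,pts_time" for every packet of a
# file, in the order the packets are stored in the container.
list_packets() {
  ffprobe -v error \
    -show_entries packet=stream_index,pts_time \
    -of csv=p=0 \
    "$1"
}

# Example (requires ffprobe and the file from the question):
# list_packets output.mkv | head -n 500
```

In a well-interleaved file the video (stream 0) and audio (stream 1) timestamps should advance roughly together. Long runs of a single stream index, with timestamps far ahead of the other stream, would point to an interleaving problem in the muxed file.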


Solution

  • I was able to fix this by adding -max_interleave_delta 0 to the merge command.
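
For reference, here is a sketch of where the option would go in the merge command from the question. `-max_interleave_delta` is a muxer (output) option, so it is placed among the output options, before the output file name. Its default is 10 seconds; as far as the FFmpeg formats documentation describes it, a value of 0 makes the muxer keep buffering until it has a packet for every stream, which plausibly matters here because subtitle packets are sparse. The `merge_frames` wrapper and its argument names are hypothetical, added only to make the command reusable, and the paths use forward slashes.

```shell
# Hypothetical wrapper around the merge command from the question,
# with -max_interleave_delta 0 added as an output option.
merge_frames() {
  frames_dir=$1    # folder holding frame%08d.png
  source_file=$2   # original file providing audio, subtitles and fonts
  output_file=$3   # file to write

  # -map 0:v:0               video from the PNG sequence
  # -map 1                   every stream from the original file...
  # -map -1:v                ...except its video stream
  # -max_interleave_delta 0  remove the time limit on the muxer's
  #                          interleaving buffer
  ffmpeg -r 23.98 -i "$frames_dir/frame%08d.png" -i "$source_file" \
    -map 0:v:0 -map 1 -map -1:v \
    -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p \
    -max_interleave_delta 0 \
    "$output_file"
}

# Example, using the file names from the question:
# merge_frames InputFolder input.mkv output.mkv
```

Everything apart from the added option matches the original merge command, so any difference in the result can be attributed to the interleaving setting.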