I broke an H.265 video down into frames and am trying to merge them back together with the original audio, subtitles and fonts. The resulting video has a problem: when I seek to certain points, the audio stops playing. Even when I don't seek, at some point the video hangs while the audio keeps playing. A few seconds later the video resumes, but video and audio are now out of sync. This doesn't happen with the original video.
I'm splitting the video into frames and merging them again because I want to upscale each frame. I've left the upscaling step out here because the issue also occurs with the original, unscaled frames.
Here are the details of the original video. Note that it has one video stream, one audio stream, one subtitle stream and two font attachments.
.\ffmpeg.exe -i "input.mkv"
Input #0, matroska,webm, from 'input.mkv':
Metadata:
encoder : libebml v1.3.10 + libmatroska v1.5.2
creation_time : 2021-01-07T00:20:19.000000Z
Duration: 00:23:02.05, start: 0.000000, bitrate: 320 kb/s
Stream #0:0: Video: hevc (Main), yuv420p(tv), 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
Metadata:
BPS-eng : 278671
DURATION-eng : 00:23:02.006000000
NUMBER_OF_FRAMES-eng: 33135
NUMBER_OF_BYTES-eng: 48140731
_STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
_STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
_STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:1(jpn): Audio: aac (HE-AAC), 48000 Hz, stereo, fltp
Metadata:
BPS-eng : 36166
DURATION-eng : 00:23:02.016000000
NUMBER_OF_FRAMES-eng: 32391
NUMBER_OF_BYTES-eng: 6247833
_STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
_STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
_STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:2(eng): Subtitle: ass (default)
Metadata:
BPS-eng : 76
DURATION-eng : 00:21:20.790000000
NUMBER_OF_FRAMES-eng: 246
NUMBER_OF_BYTES-eng: 12264
_STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
_STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
_STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
Stream #0:3: Attachment: ttf
Metadata:
filename : Roboto-Medium.ttf
mimetype : application/x-truetype-font
Stream #0:4: Attachment: ttf
Metadata:
filename : Roboto-MediumItalic.ttf
mimetype : application/x-truetype-font
Here's how I split it into frames:
.\ffmpeg.exe -i "input.mkv" -qscale:v 1 -qmin 1 -qmax 1 -vsync 0 "InputFolder/frame%08d.png"
Here's how I merge the frames back into a video, together with all the original streams except the video stream:
.\ffmpeg.exe -r 23.98 -i "InputFolder\frame%08d.png" -i "input.mkv" -map 0:v:0 -map 1 -map -1:v -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p "output.mkv"
Here are the details of the resulting video:
.\ffmpeg.exe -i "output.mkv"
Input #0, matroska,webm, from 'output.mkv':
Metadata:
ENCODER : Lavf58.45.100
Duration: 00:23:02.05, start: 0.000000, bitrate: 245 kb/s
Stream #0:0: Video: hevc (Main), yuv420p(tv), 1280x720 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
Metadata:
ENCODER : Lavc58.91.100 libx265
DURATION : 00:23:01.777000000
Stream #0:1(jpn): Audio: aac (HE-AAC), 48000 Hz, stereo, fltp (default)
Metadata:
BPS-eng : 36166
DURATION-eng : 00:23:02.016000000
NUMBER_OF_FRAMES-eng: 32391
NUMBER_OF_BYTES-eng: 6247833
_STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
_STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
_STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
DURATION : 00:23:02.046000000
Stream #0:2(eng): Subtitle: ass (default)
Metadata:
BPS-eng : 76
DURATION-eng : 00:21:20.790000000
NUMBER_OF_FRAMES-eng: 246
NUMBER_OF_BYTES-eng: 12264
_STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
_STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
_STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
ENCODER : Lavc58.91.100 ssa
DURATION : 00:21:21.580000000
Stream #0:3: Attachment: ttf
Metadata:
filename : Roboto-Medium.ttf
mimetype : application/x-truetype-font
Stream #0:4: Attachment: ttf
Metadata:
filename : Roboto-MediumItalic.ttf
mimetype : application/x-truetype-font
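A side note on the durations reported above: the original video stream lasts 00:23:02.006, while the re-encoded one lasts 00:23:01.777. A quick calculation suggests where the roughly 0.23 s gap comes from, assuming the source's true frame rate is 24000/1001 (which ffmpeg displays as 23.98) rather than exactly 23.98. The sketch below assumes a POSIX shell with awk (for example Git Bash or WSL on Windows); the duration helper is a hypothetical name used only for illustration, not part of ffmpeg.

```shell
# Duration in seconds for a given frame count and frame rate.
# Usage: duration FRAMES RATE_NUMERATOR [RATE_DENOMINATOR]
# (hypothetical helper, for illustration only)
duration() {
  awk -v n="$1" -v num="$2" -v den="${3:-1}" \
    'BEGIN { printf "%.3f\n", n * den / num }'
}

# 33135 is NUMBER_OF_FRAMES-eng from the original video stream.
duration 33135 23.98        # 1381.776 (about 00:23:01.777, the re-encoded video)
duration 33135 24000 1001   # 1382.006 (00:23:02.006, the original video)
```

If that assumption holds, passing -r 24000/1001 instead of -r 23.98 in the merge command should keep the video duration aligned with the audio. This is a small drift over the whole file and may well be unrelated to the seeking problem described here.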
I've done this successfully many times with H.264 videos, with no audio issues. Possibly more relevant: when I merge the frames with only the original audio stream (as opposed to all original streams except video), the audio issue does not occur:
.\ffmpeg.exe -r 23.98 -i "InputFolder\frame%08d.png" -i "input.mkv" -map 0:v:0 -map 1:a:0 -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p "output.mkv"
That command produces no audio issues, but it's no good to me because I want the subtitles and fonts from the original video.
If anyone needs me to upload the original video somewhere so they can reproduce it, let me know.
Edit: Also note that when I merge the frames with the original audio AND subtitle streams, i.e. without the fonts, the issue remains.
I was able to fix this by adding -max_interleave_delta 0 to the merge command.
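For reference, the full merge command with that option added would look something like the following. This is a sketch based on the merge command from the question; the only change is the new option, placed with the other output options before the output file name.

```shell
.\ffmpeg.exe -r 23.98 -i "InputFolder\frame%08d.png" -i "input.mkv" -map 0:v:0 -map 1 -map -1:v -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p -max_interleave_delta 0 "output.mkv"
```

-max_interleave_delta is a muxer option that sets the maximum buffering duration for interleaving. It defaults to 10 seconds, after which ffmpeg writes out queued packets even if some stream has not delivered one yet. Setting it to 0 makes the muxer keep buffering until it has a packet for every stream. That presumably explains why the problem only showed up when the sparse subtitle stream was included.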