I'm having regularly issue with hvc1 videos getting an inconsistent number of frames between ffprobe info and FFmpeg info, and I would like to know what could be the reason for this issue and how if it's possible to solve it without re-encoding the video.
I wrote the following sample script with a test video I have
I split the video into 5-sec segments and I get ffprobe giving the expected video length but FFmpeg gave 3 frames less than expected on every segment but the first one.
The issue is exactly the same if I split by 10 seconds or any split, I always lose 3 frames.
I noted that the first segment is always 3 frames smaller (on ffprobe) than the other ones and it's the only consistent one.
Here is an example script I wrote to test this issue :
# get total video frame number using ffprobe or ffmpeg
total_num_frames=$(ffprobe -v quiet -show_entries stream=nb_read_packets -count_packets -select_streams v:0 -print_format json test_video.mp4 | jq '.streams[0].nb_read_packets' | tr -d '"')
echo $total_num_frames
ffmpeg -hwaccel cuda -i test_video.mp4 -vsync 2 -f null -
# Check ffprobe of each segment is consistent
rm -rf clips && mkdir clips && \
ffmpeg -i test_video.mp4 -acodec copy -f segment -vcodec copy -reset_timestamps 1 -segment_time 5 -map 0 clips/part_%d.mp4
count_frames=0
for i in {0..5}
do
num_packets=$(ffprobe -v quiet -show_entries stream=nb_read_packets -count_packets -select_streams v:0 -print_format json clips/part_$i.mp4 | jq '.streams[0].nb_read_packets' | tr -d '"')
count_frames=$(($count_frames+$num_packets))
echo $num_packets $count_frames $total_num_frames
done
Output is the following
3597
ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test_video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf58.29.100
Duration: 00:00:59.95, start: 0.035000, bitrate: 11797 kb/s
Stream #0:0(und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709), 1920x1080, 11692 kb/s, 60.01 fps, 60 tbr, 19200 tbn, 19200 tbc (default)
Metadata:
handler_name : Core Media Video
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 91 kb/s (default)
Metadata:
handler_name : Core Media Audio
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2mp41
encoder : Lavf58.29.100
Stream #0:0(und): Video: wrapped_avframe, nv12, 1920x1080, q=2-31, 200 kb/s, 60 fps, 60 tbn, 60 tbc (default)
Metadata:
handler_name : Core Media Video
encoder : Lavc58.54.100 wrapped_avframe
Stream #0:1(und): Audio: pcm_s16le, 44100 Hz, mono, s16, 705 kb/s (default)
Metadata:
handler_name : Core Media Audio
encoder : Lavc58.54.100 pcm_s16le
frame= 3597 fps=788 q=-0.0 Lsize=N/A time=00:00:59.95 bitrate=N/A speed=13.1x
video:1883kB audio:5162kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
then
297 297 3597
300 597 3597
300 897 3597
300 1197 3597
300 1497 3597
300 1797 3597 <--- output are consistent based on ffprobe
But then if i check segment size with ffmpeg with the following command
ffmpeg -hwaccel cuda -i clips/part_$i.mp4 -vsync 2 -f null -
for part 0 its ok
frame= 297 fps=0.0 q=-0.0 Lsize=N/A time=00:00:04.95 bitrate=N/A speed=12.5x
video:155kB audio:424kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
for all other parts it's inconsistent and should be 300
frame= 297 fps=0.0 q=-0.0 Lsize=N/A time=00:00:04.95 bitrate=N/A speed=12.3x
video:155kB audio:423kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
The issue is exactly the same with any other interval size, e.g with 10 seconds I would get the following video size:
ffprobe 597 - 600 ...
ffmpeg 597 597 ...
I thought it could be related to source vfr or cfr but I tried to convert the input to cfr and nothing changed.
Moreover, I tried to force the keyframe every second to check if it was a keyframe issue with the following arg: -force_key_frames "expr:gte(t,n_forced*1)", but the problem is exactly the same.
What am I doing wrong? it happens to me a lot with files in hvc1 and I really don't know how to deal with that.
The source of the differences is that FFprobe counts the discarded packets, and FFmpeg doesn't count the discarded packets as frames.
Your results are consistent with video stream that is created with 3 B-Frames (3 consecutive B-Frames for every P-Frame or I-Frame).
According to Wikipedia:
I‑frames are the least compressible but don't require other video frames to decode.
P‑frames can use data from previous frames to decompress and are more compressible than I‑frames.
B‑frames can use both previous and forward frames for data reference to get the highest amount of data compression.
When splitting a video with P-Frame and B-Frame into segments without re-encoding, the dependency chain breaks.
AV_PKT_FLAG_DISCARD
flag).For the purpose of working on the same dataset, we my build synthetic video (to be used as input).
Building synthetic video with the following command:
ffmpeg -y -r 60 -f lavfi -i testsrc=size=384x256:rate=1 -vf "setpts=N/60/TB" -g 60 -vcodec libx265 -x265-params crf=28:bframes=3:b-adapt=0 -tag:v hvc1 -pix_fmt yuv420p -t 20 test_video.mp4
-g 60
set GOP size to 60 frames (insert a key frame every 60 frames).bframes=3:b-adapt=0
force 3 consecutive B-Frames.For verifying the number of I/P/B frames, we may use FFprobe:
ffprobe -i test_video.mp4 -show_frames -show_entries frame=pict_type
The output is like:
pict_type=I
pict_type=B
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B
pict_type=B
...
Segment the video by time (5 seconds per segment):
ffmpeg -i test_video.mp4 -f segment -vcodec copy -reset_timestamps 1 -segment_time 5 clips/part_%d.mp4
FFprobe counting:
297 1497 1200
300 1797 1200
300 2097 1200
303 2400 1200
FFmpeg counting:
frame= 297
frame= 297
frame= 297
frame= 300
As you can see, the result is consistent with your output.
We may identify the "discarded" packets using FFprobe:
ffprobe -i part_1.mp4 -show_packets
Look for flags=_D
.
Packet with flags=_D
is marked as "discarded"
Note: In a video stream every packet matches a frame.
FFprobe output begins with:
flags=K_
flags=_D
flags=_D
flags=_D
flags=__
flags=__
flags=__
...
For every middle segment, 3 packets are marked as "discarded", and that is the reason for the 3 missing frames in FFmpeg compared to FFprobe.