I am trying to parse this SPS payload to understand the H264 codec better. I am successful for the most part, but I am having a hard time parsing and understanding num_units_in_tick and time_scale for the following payload:
67 64 00 28 AC 2B 40 2C 01 0B CB 26 02 20 00 00 03 00 20 00 00 04 01 B4 11 08 D4
There is no code yet; I am first practicing by manually parsing the bytes shown above.
I am following the latest H264 spec detailed in the MPEG documentation: https://www.mpeg.org/standards/MPEG-4/10/
When I double-check with some established 3rd party SPS parsers like FFmpeg, they give:
num_units_in_tick = 1
time_scale = 32
fixed_frame_rate_flag = 0
But when I parse manually as per the spec I get:
num_units_in_tick = 24 = (0 0000 0000 0000 0000 0000 0000 0011 000 - 108th bit to 139th bit)
time_scale = 16777216 = (0 0000 0010 0000 0000 0000 0000 0000 000 - 140th bit to 171st bit)
fixed_frame_rate_flag = 0 (0 - 172nd bit)
Can someone correct or confirm my understanding of this parsing?
The problem is that an "Emulation Prevention" byte (0x03) was added to your SPS during the encoding stage.
This extra 0x03 byte needs to be deleted / skipped during the SPS parsing stage (or when decoding any other NALU).
Deleting or skipping it restores the original H264 bytes, and your numbers will then match the output of other software like VLC or FFmpeg.
Explained:
During encoding, after the starting NALU header (0x67 here), whenever two consecutive 00 bytes would be followed by a byte of value 00, 01, 02 or 03, the encoder inserts an extra 03 byte right after the two zeros. This keeps the data from looking like the start code of the next NALU (or causing other issues). It is also why the 00 00 04 later in your payload has no 03 added: 04 is outside that range.
Consider a mid-stream 00 00 00 01 inside the SPS data: such bytes will be temporarily encoded as 00 00 03 00 01. During the decoding process you must remove the temporary 03 byte, which restores the originally intended H264 bytes.
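The insert-then-remove round trip described above can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation of the spec: the function names add_emulation_prevention and remove_emulation_prevention are made up for this example, and special cases such as trailing zero bytes at the end of a NALU are not handled.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Encoder side: insert 0x03 after two zero bytes if the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Decoder side: drop a 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just seen
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


original = bytes.fromhex("00000001")
escaped = add_emulation_prevention(original)
print(" ".join(f"{b:02x}" for b in escaped))            # 00 00 03 00 01
print(remove_emulation_prevention(escaped) == original)  # True
```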
Solution:
Looking at a section of your bytes: 26 02 20 00 00 03 00 20 00 00 04 01 B4
That 20 00 00 03 00 20 needs to drop/skip the 03 to become 20 00 00 00 20.
You have now removed the "Emulation Prevention" byte, and your SPS parsing should work as expected, giving a time_scale of 32 and a num_units_in_tick of 1.
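To check this without parsing by hand, here is a minimal Python sketch that walks the SPS fields up to the VUI timing info. The names BitReader, strip_epb and parse_timing are hypothetical helpers written for this answer, not part of any library. It is not a complete SPS parser: it assumes no scaling lists and does not handle pic_order_cnt_type 1, which is enough for your payload.

```python
class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


def strip_epb(payload: bytes) -> bytes:
    """Drop every 0x03 that directly follows two 0x00 bytes."""
    out, zeros = bytearray(), 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def parse_timing(nalu: bytes, strip: bool = True):
    """Return (num_units_in_tick, time_scale, fixed_frame_rate_flag) or None."""
    r = BitReader(strip_epb(nalu) if strip else nalu)
    r.u(8)                  # NAL header (0x67)
    profile_idc = r.u(8)
    r.u(16)                 # constraint flags + reserved bits, level_idc
    r.ue()                  # seq_parameter_set_id
    if profile_idc in (100, 110, 122, 244, 44, 83, 86, 118, 128, 138, 139, 134, 135):
        if r.ue() == 3:     # chroma_format_idc
            r.u(1)          # separate_colour_plane_flag
        r.ue()              # bit_depth_luma_minus8
        r.ue()              # bit_depth_chroma_minus8
        r.u(1)              # qpprime_y_zero_transform_bypass_flag
        if r.u(1):          # seq_scaling_matrix_present_flag
            raise NotImplementedError("scaling lists are not handled in this sketch")
    r.ue()                  # log2_max_frame_num_minus4
    poc_type = r.ue()       # pic_order_cnt_type
    if poc_type == 0:
        r.ue()              # log2_max_pic_order_cnt_lsb_minus4
    elif poc_type == 1:
        raise NotImplementedError("pic_order_cnt_type 1 is not handled in this sketch")
    r.ue()                  # max_num_ref_frames
    r.u(1)                  # gaps_in_frame_num_value_allowed_flag
    r.ue()                  # pic_width_in_mbs_minus1
    r.ue()                  # pic_height_in_map_units_minus1
    if not r.u(1):          # frame_mbs_only_flag
        r.u(1)              # mb_adaptive_frame_field_flag
    r.u(1)                  # direct_8x8_inference_flag
    if r.u(1):              # frame_cropping_flag
        for _ in range(4):  # left, right, top, bottom crop offsets
            r.ue()
    if not r.u(1):          # vui_parameters_present_flag
        return None
    if r.u(1):              # aspect_ratio_info_present_flag
        if r.u(8) == 255:   # aspect_ratio_idc == Extended_SAR
            r.u(32)         # sar_width + sar_height
    if r.u(1):              # overscan_info_present_flag
        r.u(1)              # overscan_appropriate_flag
    if r.u(1):              # video_signal_type_present_flag
        r.u(4)              # video_format + video_full_range_flag
        if r.u(1):          # colour_description_present_flag
            r.u(24)         # primaries, transfer characteristics, matrix coefficients
    if r.u(1):              # chroma_loc_info_present_flag
        r.ue()
        r.ue()
    if not r.u(1):          # timing_info_present_flag
        return None
    return r.u(32), r.u(32), r.u(1)


sps = bytes.fromhex(
    "67640028AC2B402C010BCB26022000000300200000" "0401B41108D4"
)
print(parse_timing(sps))               # (1, 32, 0)
print(parse_timing(sps, strip=False))  # (24, 16777216, 0)
```

With the 03 byte stripped, the result matches FFmpeg. With strip=False, the same code reproduces the 24 and 16777216 you got by hand, which shows that the leftover 03 byte is the only difference.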