Tags: vb.net, mp4, video-processing, h.264, file-format

Offset from the start of the “mdat” box to the first frame


As a personal programming challenge, I have decided to write an MP4 decoder without using external libraries. I am writing it in VB.NET as a WinForms application targeting .NET Framework 4.8.1, and I have purchased the ISO/IEC 14496-12 specification.

I have a function that reads the properties of an MP4 file (width, height, etc.) as well as the boxes needed to extract frames: the chunk offsets (stco), the sample-to-chunk table (stsc: first chunk, samples per chunk, sample description index), the sample sizes (stsz), and the time-to-sample table (stts).

Then, in a second function, I loop over these lists and build byte arrays containing the data for each sample. However, I noticed that the first chunk offset points to just after the start of “mdat”. On closer inspection, I found readable text (informational metadata) at that position inside the “mdat” box, so it can't be image or frame data.
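A minimal sketch of how such an iteration could combine the three tables is shown here for orientation. It is not the code from the question: the names `SampleTableSketch`, `StscEntry` and `ComputeSampleOffsets` are hypothetical, and it assumes the tables have already been parsed into arrays. It also relies on the stco values being offsets from the start of the file rather than from the start of “mdat”, which is how ISO/IEC 14496-12 defines them.

```vbnet
Imports System
Imports System.Collections.Generic

Module SampleTableSketch
    ' One row of the stsc box: from chunk FirstChunk (1-based) onward,
    ' every chunk holds SamplesPerChunk samples, until the next row takes over.
    Public Structure StscEntry
        Public FirstChunk As Integer
        Public SamplesPerChunk As Integer
    End Structure

    ' Returns the absolute file offset of every sample, in decoding order.
    Public Function ComputeSampleOffsets(chunkOffsets As Long(),
                                         stsc As StscEntry(),
                                         sampleSizes As Integer()) As List(Of Long)
        Dim offsets As New List(Of Long)
        Dim sampleIndex As Integer = 0
        Dim row As Integer = 0

        For chunk As Integer = 1 To chunkOffsets.Length
            ' Advance to the next stsc row once its FirstChunk has been reached.
            While row + 1 < stsc.Length AndAlso stsc(row + 1).FirstChunk <= chunk
                row += 1
            End While

            Dim pos As Long = chunkOffsets(chunk - 1)
            For i As Integer = 1 To stsc(row).SamplesPerChunk
                If sampleIndex >= sampleSizes.Length Then Return offsets
                offsets.Add(pos)
                pos += sampleSizes(sampleIndex) ' samples inside a chunk are contiguous
                sampleIndex += 1
            Next
        Next
        Return offsets
    End Function
End Module
```

For example, with chunk offsets {48, 1000}, a single stsc row (first chunk 1, 2 samples per chunk) and sample sizes {10, 20, 30, 40}, the sketch yields the sample offsets 48, 58, 1000 and 1030.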

This suggests that I need some kind of offset from the start of the “mdat” box to the first frame before I can extract the frames.

To solve this, I tried to find the bytes that mark the start of a frame (the 0, 0, 1 start code of a NAL unit, which I had read about on the internet), but I could not locate them. The start code is absent in multiple files. As mentioned, I have purchased the documentation and searched it for various keywords, and I have also googled for possible answers, but I have not yet found a solution.

These are the boxes that I'm parsing:
• ftyp
• mdat
• moov
and in moov:
• mvhd
• trak
• tkhd
• mdhd
• hdlr
• smhd
• stsd
• stts
• stsc
• stsz
• stco
According to the documentation, none of the other boxes is mandatory. I couldn't find a ‘saio’ box for auxiliary offsets or a ‘meta’ box.

What drives me nuts is that there are questions on Stack Overflow about decoding an MP4, but no one has this problem.

Any guidance or suggestions would be appreciated.


Edit 04.06.2023

Private Function Skip_text_metadata_and_find_first_frame() As Integer ' Thanks to VC.One, Stack Overflow; May 30, 2023.
    ' Mdat_Start_pos is the index of the "m" of "mdat"; the first NALU size field starts 4 bytes later.
    Dim tempPos As Integer = Me.Mdat_Start_pos + 4
    ' Stop once fewer than 4 bytes remain before Mdat_End.
    While tempPos <= (Me.Mdat_End - 4)
        Dim tempNum As Integer = Get_lower_bits_of_a_byte(Me.Data(tempPos + 4), 5) ' NALU type
        If tempNum = 5 Then ' 5 (101b) is an IDR slice, i.e. a key frame
            Return tempPos - Me.Mdat_Start_pos
        End If
        ' Not a key frame: read the 4-byte big-endian NALU size and skip over this NALU.
        Dim size_NALU As UInteger = (CUInt(Me.Data(tempPos + 0)) << 24) Or
                                    (CUInt(Me.Data(tempPos + 1)) << 16) Or
                                    (CUInt(Me.Data(tempPos + 2)) << 8) Or
                                    CUInt(Me.Data(tempPos + 3))
        tempPos += CInt(size_NALU) + 4 ' +4 for the size field itself
    End While

    Return 0 ' no key frame found inside "mdat"
End Function

where

Private Shared Function Get_lower_bits_of_a_byte(value As Byte, bitNumber As Integer) As Integer
    ' Mask with the lowest bitNumber bits set, e.g. bitNumber = 5 gives 31 (&H1F).
    Dim two_to_the_bitnumber_minus1 As Integer = (1 << bitNumber) - 1
    Return value And two_to_the_bitnumber_minus1
End Function
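To make the masking concrete, here is a small sketch that splits an H.264 NAL unit header byte into its three fields (`forbidden_zero_bit`, `nal_ref_idc` and `nal_unit_type`, as they are named in the H.264 specification). The module, structure and function names are hypothetical and only serve the illustration.

```vbnet
Imports System

Module NalHeaderSketch
    ' The three fields packed into the first byte of an H.264 NAL unit.
    Public Structure NalHeader
        Public ForbiddenZeroBit As Integer ' 1 bit, expected to be 0
        Public RefIdc As Integer           ' 2 bits (nal_ref_idc)
        Public UnitType As Integer         ' 5 bits (nal_unit_type)
    End Structure

    Public Function ParseNalHeader(headerByte As Byte) As NalHeader
        Dim h As NalHeader
        h.ForbiddenZeroBit = (headerByte >> 7) And 1
        h.RefIdc = (headerByte >> 5) And 3
        h.UnitType = headerByte And &H1F ' same result as Get_lower_bits_of_a_byte(headerByte, 5)
        Return h
    End Function
End Module
```

With this sketch, a header byte of &H65 gives a unit type of 5 (key frame) and a header byte of &H06 gives a unit type of 6 (SEI).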

Solution

  • The solution is simply to use the NALU's size field to skip over that text metadata (it is actually SEI data).
    You will land on the next NALU, which might be a video frame (if not, keep skipping by size).
    PS: There are no 0, 0, 0, 1 start codes when a NALU is stored inside an MP4; each start code is replaced by a size integer.

    Since you are learning MP4 bytes, I will add a more detailed summary...

    In summary, you have SEI as your first NAL unit, but if you skip ahead by its size you should land on the next NALU, which should be a video frame (and, being the first frame, it is expected to be a "keyframe").
    Test on an MP4 file with no audio track, so that the “mdat” box holds only video data while you practice.

    Solution: (with pseudo-code as example)

    Your text of mdat.....ÿÿ}ÜEé makes it hard to know the actual byte values.

    Assuming your data has a layout like...

    in text:  m  d  a  t  .  .  .  .  .  ÿ  ÿ  }  Ü  E  é
    in  hex:  6D 64 61 74 AA BB CC DD XX FF FF 7D DC 45 E9
    

    The structure of those bytes means...

    AA BB CC DD is the 4-byte (big-endian) size of the NAL unit, and XX is the NALU header byte, whose lowest 5 bits give the NALU type (6 for SEI, 5 for a keyframe). The bytes from FF onward are the NALU's payload.

    The layout looks like you have a NAL unit of SEI metadata.
    SEI (Supplemental Enhancement Information) is a form of side information that is useful to the decoder but not always needed. If the H.264 stream is inside an MP4 (which itself has an "AVC Config" section), the decoder/player does not need the SEI. It can be safely removed from most MP4 files (some encoders simply choose to add it in preparation for future use cases such as fragmenting)...

    Example pseudo-code:

    //# Vars setup
    int myPos = 0; //# offset/position within MP4 bytes
    int myNum = 0; //# holds temporary numeric values
    int size_NALU = 0; //# size of NAL unit in bytes length.
    int startPos_of_mdat = some_Num; //# use actual position for start of "mdat"
    
    //# Vars temp numbers to create Integer from (bytes) Array values
    int tempA = 0; int tempB = 0; int tempC = 0; int tempD = 0;
    
    //# Main code
    myPos = startPos_of_mdat; //# Is pos of the starting "m" letter/byte of "mdat" (myPos is already declared above)
    myPos += 4; //# Move forward +4 bytes to reach the first NALU (ie: its first size byte)
    
    while( true ) //# search by skipping according to Size, then check NALU type...
    {
        myNum = ( MP4_Bytes[ myPos+4 ] & 0x1F ); //# extract the "NALU type" value
        
        if( myNum != 5 ) //# if not keyframe, then skip to next NALU...
        {
            tempA = MP4_Bytes[ myPos+0 ];
            tempB = MP4_Bytes[ myPos+1 ];
            tempC = MP4_Bytes[ myPos+2 ];
            tempD = MP4_Bytes[ myPos+3 ];
            
            //# concat into one 32-bit integer
            size_NALU = ( tempA << 24 | tempB << 16 | tempC << 8 | tempD );
            
            //# update to new position (is the new "skip to" point)
            myPos += (size_NALU + 4); //# must add +4 to account for the extra four bytes of SIZE's integer
            
            //# While loop will repeat until an ELSE is triggered
            //# Safety: stop once the next size + NALU header byte can no longer be read
            if( myPos + 4 >= MP4_Bytes.Length ) { break; }
            
        }
        else
        {
            //# stop if keyframe is found
            Console.WriteLine( "## Found a Keyframe at offset: " + myPos );
            break;
        }
    }
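Since the question uses VB.NET, the same walk can be sketched in that language with bounds checks added. This is only an illustration under stated assumptions: the names `NalWalkSketch`, `NalTypeName` and `ListNalUnits` are hypothetical, the size prefix is assumed to be 4 bytes long (the "AVC Config" box can declare a shorter prefix in its `lengthSizeMinusOne` field), and the range passed in is assumed to contain only video NAL units.

```vbnet
Imports System
Imports System.Collections.Generic

Module NalWalkSketch
    ' Names for a few common H.264 nal_unit_type values.
    Public Function NalTypeName(nalType As Integer) As String
        Select Case nalType
            Case 1
                Return "non-IDR slice"
            Case 5
                Return "IDR slice (keyframe)"
            Case 6
                Return "SEI"
            Case 7
                Return "SPS"
            Case 8
                Return "PPS"
            Case 9
                Return "access unit delimiter"
            Case Else
                Return "other (" & nalType.ToString() & ")"
        End Select
    End Function

    ' Lists "offset: type" for each NAL unit in data(start) .. data(endExclusive - 1).
    Public Function ListNalUnits(data As Byte(), start As Integer, endExclusive As Integer) As List(Of String)
        Dim found As New List(Of String)
        Dim pos As Long = start
        ' Need 4 size bytes plus the 1-byte NALU header.
        While pos + 5 <= endExclusive
            Dim p As Integer = CInt(pos)
            Dim size As Long = (CLng(data(p)) << 24) Or (CLng(data(p + 1)) << 16) Or
                               (CLng(data(p + 2)) << 8) Or CLng(data(p + 3))
            ' A zero or overlong size means the data is corrupt or not a NAL unit.
            If size = 0 OrElse pos + 4 + size > endExclusive Then Exit While
            Dim nalType As Integer = data(p + 4) And &H1F
            found.Add(p.ToString() & ": " & NalTypeName(nalType))
            pos += 4 + size ' skip the size field plus the NALU itself
        End While
        Return found
    End Function
End Module
```

Called on the 13 bytes 00 00 00 02 06 FF 00 00 00 03 65 01 02 with start 0 and endExclusive 13, the sketch returns "0: SEI" followed by "6: IDR slice (keyframe)". Returning every NAL unit rather than stopping at the first keyframe makes it easier to see what an encoder placed in front of the first frame.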