video, video-streaming, webrtc, h.264, openh264

WebRTC: What is RTPFragmentationHeader in encoder implementation?


I have modified h264_encoder_impl to use an NVIDIA GRID based hardware encoder, replacing the OpenH264-specific calls with NVIDIA API calls. The encoded stream can be written to a file successfully, but writing _buffer and _size of encoded_image_ is not enough; RTPFragmentationHeader also needs to be filled.

// RtpFragmentize(EncodedImage* encoded_image,
//                       std::unique_ptr<uint8_t[]>* encoded_image_buffer,
//                       const VideoFrameBuffer& frame_buffer,
//                       SFrameBSInfo* info,
//                       RTPFragmentationHeader* frag_header)

// encode
openh264_->Encode(input, &info /*out*/);

// fragmentize ?
RtpFragmentize(&encoded_image_ /*out*/, &encoded_image_buffer_, *frame_buffer, 
               &info, &frag_header /*out*/); 

// ...

// send 
encoded_image_callback_->OnEncodedImage(encoded_image_, &codec_specific, &frag_header);

The current OpenH264-based implementation fills frag_header in RTPFragmentize(), and VP8 fills it differently. I can see something with NAL units and layers, which also calculates encoded_image->_length, but I have no idea how.

I cannot find any documentation on it anywhere. The VP8 and OpenH264 implementations are all I have.

So what is RTPFragmentationHeader? What does it do? What is encoded_image->_length? How do I fill them correctly when using a custom H264 encoder? I can find the start code, but what next? How do I fill all of its members?


Solution

  • After going through RTPFragmentize() in h264_encoder_impl, I figured it out.

    An encoded frame contains multiple NALUs of different types, including AUD, SPS (67), PPS (68) and IDR. NALUs are separated by the 4-byte start code 00 00 00 01.
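For reference (standard H.264 Annex B, not something specific to WebRTC): the NALU type is the low five bits of the byte that follows each start code, which is why 0x67 parses as type 7 (SPS) and 0x68 as type 8 (PPS). A minimal sketch:

```cpp
#include <cstdint>

// H.264 NALU header byte layout:
// forbidden_zero_bit (1) | nal_ref_idc (2) | nal_unit_type (5).
// The type is the low 5 bits: 0x67 -> 7 (SPS), 0x68 -> 8 (PPS),
// 0x65 -> 5 (IDR slice).
inline int NalUnitType(uint8_t nalu_header) {
  return nalu_header & 0x1F;
}
```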

    For OpenH264, header looked like this for the first frame

    [00 00 00 01 67 42 c0 20 8c 8d 40 20 03 09 00 f0  
     88 46 a0 00 00 00 01 68 ce 3c 80]00 00 00 01 .. 
    

    You can see the start codes. Only the bytes between the square brackets belong to the header; the last start code is for the frame data.

    RTPFragmentationHeader for above:

    frag_header->fragmentationVectorSize = 3     // 2 fragments for header
                                                 // 3rd fragment for frame buffer
    
    frag_header->fragmentationOffset[0]  = 4     
    frag_header->fragmentationLength[0]  = 15
    
    frag_header->fragmentationOffset[1]  = 23    // 4 + 15 + sizeof(startcode)
    frag_header->fragmentationLength[1]  = 4    
    
    frag_header->fragmentationOffset[2]  = 31   
    frag_header->fragmentationLength[2]  = 43218 // last fragment is frame buffer
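
The offsets and lengths above can be computed generically by scanning the Annex B buffer for 4-byte start codes. A sketch, where the Fragmentation struct is a hypothetical stand-in for RTPFragmentationHeader's parallel offset/length arrays (one entry per NALU payload, start codes excluded):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for RTPFragmentationHeader: parallel arrays of
// fragment offsets and lengths, one entry per NALU payload.
struct Fragmentation {
  std::vector<size_t> offset;
  std::vector<size_t> length;
};

// Scan an Annex B buffer for 4-byte start codes (00 00 00 01) and record
// each NALU payload as one fragment.
Fragmentation FragmentAnnexB(const uint8_t* buf, size_t size) {
  Fragmentation frag;
  size_t i = 0;
  while (i + 4 <= size) {
    if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 0 &&
        buf[i + 3] == 1) {
      if (!frag.offset.empty()) {
        // Close the previous fragment at this start code.
        frag.length.push_back(i - frag.offset.back());
      }
      frag.offset.push_back(i + 4);  // payload starts after the start code
      i += 4;
    } else {
      ++i;
    }
  }
  if (!frag.offset.empty()) {
    // Last fragment runs to the end of the buffer.
    frag.length.push_back(size - frag.offset.back());
  }
  return frag;
}
```

Feeding it a buffer shaped like the header dump above (start code + 15-byte SPS + start code + 4-byte PPS + start code + frame data) reproduces the offsets 4, 23, 31.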
    

    Subsequent frames always had only one fragment, which looked like the following:

    00 00 00 01 67 b8 .. .. ..
    

    encoded_image->_length is the size of actual encoded frame buffer and
    encoded_image->_size is maximum size of an encoded frame buffer.

    The OpenH264 API gives the number of NALUs in an encoded frame, which is used to calculate the fragments, while the API I was using only provided the header and its size, regardless of whether the header was actually prepended to the frame. Searching only the first header-size bytes of the frame allowed correct calculation of the fragmentation.
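That search can be as simple as a prefix comparison. A sketch, where header/header_size are assumed to come from the hardware encoder's API (hypothetical names, not part of any real NVIDIA interface):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if the encoded frame begins with the SPS/PPS header bytes
// (including start codes) that the encoder API reported separately. If it
// does, the header NALUs get their own fragments; otherwise the frame is
// a single fragment.
bool FrameStartsWithHeader(const uint8_t* frame, size_t frame_size,
                           const uint8_t* header, size_t header_size) {
  return frame_size >= header_size &&
         std::memcmp(frame, header, header_size) == 0;
}
```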

    Doing this finally sent the encoded data and it was decoded correctly on client browser.

    Update: In essence, I had to skip RTPFragmentize() entirely, because it is made specifically for OpenH264, and calculate frag_header myself based on above observations.