I have modified h264_encoder_impl
to use an Nvidia GRID based hardware encoder, replacing the OpenH264-specific calls with Nvidia API calls. The encoded stream can be written to a file successfully, but filling _buffer
and _size
of encoded_image_
is not enough: the RTPFragmentationHeader
also needs to be filled.
// RtpFragmentize(EncodedImage* encoded_image,
// std::unique_ptr<uint8_t[]>* encoded_image_buffer,
// const VideoFrameBuffer& frame_buffer,
// SFrameBSInfo* info,
// RTPFragmentationHeader* frag_header)
// encode
openh264_->Encode(input, &info /*out*/);
// fragmentize ?
RtpFragmentize(&encoded_image_ /*out*/, &encoded_image_buffer_, *frame_buffer,
&info, &frag_header /*out*/);
// ...
// send
encoded_image_callback_->OnEncodedImage(encoded_image_, &codec_specific, &frag_header);
The current OpenH264-based implementation fills frag_header
in RTPFragmentize()
, and VP8 fills it differently. I can see something involving NAL units and layers, which also calculates encoded_image->_length
, but I have no idea how.
I cannot find any documentation on it anywhere; the VP8 and OpenH264 implementations are all I have.
So what is RTPFragmentationHeader
? What does it do? What is encoded_image->_length
? How do I fill them correctly when using a custom H264 encoder? I can find the start codes, but what next? How do I fill all its members?
After going through RTPFragmentize()
in h264_encoder_impl
, I have figured it out.
An encoded frame contains multiple NALUs, of different types including AUD, SPS (0x67), PPS (0x68) and IDR. Each NALU is preceded by the 4-byte start code 00 00 00 01
.
For OpenH264, the header looked like this for the first frame:
[00 00 00 01 67 42 c0 20 8c 8d 40 20 03 09 00 f0 88 46 a0 00 00 00 01 68 ce 3c 80] 00 00 00 01 ..
The start codes are the 00 00 00 01 sequences. Only the bytes between the square brackets belong to the header; the last start code begins the frame data.
RTPFragmentationHeader
for above:
frag_header->fragmentationVectorSize = 3 // 2 fragments for header
// 3rd fragment for frame buffer
frag_header->fragmentationOffset[0] = 4
frag_header->fragmentationLength[0] = 15
frag_header->fragmentationOffset[1] = 23 // 4 + 15 + sizeof(startcode)
frag_header->fragmentationLength[1] = 4
frag_header->fragmentationOffset[2] = 31
frag_header->fragmentationLength[2] = 43218 // last fragment is frame buffer
Subsequent frames always had only one fragment, which looked like the following:
00 00 00 01 67 b8 .. .. ..
encoded_image->_length
is the size of the actual encoded frame buffer, and
encoded_image->_size
is the maximum size of an encoded frame buffer.
The OpenH264 API reports the number of NALUs in the encoded frame, which is used to calculate the fragments, while the API I was using only provided the header and its size, regardless of whether the header was actually prepended to the frame. Searching the frame bytes for just the size of the header allowed the fragmentation to be calculated correctly.
With this done, the encoded data was finally sent and decoded correctly in the client browser.
Update: In essence, I had to skip RTPFragmentize()
entirely, because it is written specifically for OpenH264, and fill frag_header
myself based on the observations above.