I have a process that generates video frames in real time. I'm muxing the generated frame stream into a video file (x264 codec in an MP4 container).
I'm using ffmpeg-libav and basing my code on the muxing.c example. The problem is that the example isn't a real-world scenario: frames are generated in a while loop for a given stream duration, never missing a frame.
In my program, frames are supposed to be generated at FPS frames per second; however, depending on hardware capacity, it may produce fewer than FPS. When I initialize the video stream context, I declare that the frame rate is FPS:
AVRational r = { 1, FPS };
ost->st->time_base = r;
This declares that the video has a frame rate of FPS, but if fewer frames are produced, playback will be too fast, because the file is still played back as if it had all the declared frames per second.
After googling a lot about this topic, I understand that the key to fixing this is to manipulate pts and dts, but I still haven't found a solution that works.
There are two key routines for writing video frames in the muxing.c example, which I'm also using in my program:
AVFrame* get_video_frame(int timestamp, OutputStream *ost, const QImage &image)
{
    /* when we pass a frame to the encoder, it may keep a reference to it
     * internally; make sure we do not overwrite it here */
    if (av_frame_make_writable(ost->frame) < 0)
        exit(1);

    av_image_fill_arrays(ost->tmp_frame->data, ost->tmp_frame->linesize,
                         image.bits(), AV_PIX_FMT_RGBA,
                         ost->frame->width, ost->frame->height, 8);
    libyuv::ABGRToI420(ost->tmp_frame->data[0], ost->tmp_frame->linesize[0],
                       ost->frame->data[0], ost->frame->linesize[0],
                       ost->frame->data[1], ost->frame->linesize[1],
                       ost->frame->data[2], ost->frame->linesize[2],
                       ost->tmp_frame->width, -ost->tmp_frame->height);

#if 1 // this is my attempt to rescale pts, but it crashes with pts < dts
    ost->frame->pts = av_rescale_q(timestamp, AVRational{1, 1000}, ost->st->time_base);
#else
    ost->frame->pts = ost->next_pts++;
#endif

    return ost->frame;
}
In the original code, the pts is simply an incrementing integer for each frame. What I'm trying to do is pass a timestamp in milliseconds since the beginning of the recording so that I can rescale it into a pts. When I rescale the pts, the program crashes complaining that pts is lower than dts.
From what I've been reading, the pts/dts manipulation is supposed to be done at the packet level, so I have also tried manipulating things in the write_frame routine, without success.
int write_frame(AVFormatContext *fmt_ctx, AVCodecContext *c, AVStream *st, AVFrame *frame)
{
    int ret;

    // send the frame to the encoder
    ret = avcodec_send_frame(c, frame);
    if (ret < 0)
    {
        fprintf(stderr, "Error sending a frame to the encoder\n");
        exit(1);
    }

    while (ret >= 0)
    {
        AVPacket pkt = { 0 };

        ret = avcodec_receive_packet(c, &pkt);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
        {
            break;
        }
        else if (ret < 0)
        {
            //fprintf(stderr, "Error encoding a frame: %s\n", av_err2str(ret));
            exit(1);
        }

        /* rescale output packet timestamp values from codec to stream timebase */
        av_packet_rescale_ts(&pkt, c->time_base, st->time_base);
        pkt.stream_index = st->index;

        /* Write the compressed frame to the media file. */
        //log_packet(fmt_ctx, &pkt);
        ret = av_interleaved_write_frame(fmt_ctx, &pkt);
        av_packet_unref(&pkt);
        if (ret < 0)
        {
            //fprintf(stderr, "Error while writing output packet: %s\n", av_err2str(ret));
            exit(1);
        }
    }

    return ret == AVERROR_EOF ? 1 : 0;
}
How should I manipulate dts and pts so that I get a correct video even when fewer frames are produced than the frame rate declared at stream initialization? Where should I do that manipulation: in get_video_frame, in write_frame, or in both?
Am I heading in the right direction? What am I missing?
It looks like you are doing mostly the right things, but your time_base is too coarse for your purpose.
You are telling the muxer that your frames are produced in increments of 1/FPS, e.g. 1/25 of a second, and never closer together than that. If the interval between frames can sometimes be smaller or larger (variable frame rate), use a finer time_base, i.e. a larger denominator.
I'm not sure why, but a lot of video software (including the ffmpeg command-line tool) seems to choose 1/12800 as the time_base for MP4. That's also what I used in my own VFR application (receiving video over UDP), and it worked well.
Don't forget to use the version of your code that calls av_rescale_q() when setting the frame's pts, and after initialising the AVCodecContext you have to set time_base on that too. Never hard-code the scaling AVRational afterwards; always re-read it from the AVCodecContext, because the libav internals may clamp the value.
As to why you have to set these fields, I'd recommend reading the Doxygen comments in the headers avcodec.h and avformat.h. Above all the useful functions there are descriptions of which fields may be set, and which must be set, for things to work. I found this extremely useful for learning what the library expects from you as the user.
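Concretely, the setup order I mean looks roughly like this (a fragment, not complete code; `ost`, `codec` and `timestamp_ms` stand in for the names in your own program):

```c
/* Sketch only: set a fine time base on the codec context before opening it. */
c->time_base = (AVRational){ 1, 12800 };
ost->st->time_base = c->time_base;

if (avcodec_open2(c, codec, NULL) < 0)
    exit(1);

/* Re-read it afterwards: avcodec_open2() (and later avformat_write_header())
 * may adjust the value, so never reuse the hard-coded rational. */
ost->st->time_base = c->time_base;

/* When producing a frame, rescale your millisecond timestamp into it. */
ost->frame->pts = av_rescale_q(timestamp_ms,
                               (AVRational){ 1, 1000 },
                               c->time_base);
```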