tl;dr: I am trying to encode acquired camera frames to H.264, send them via RTP, and play them back on another device. The SDP file that ffmpeg generates for a sample video contains information that my own SDP file lacks. My SDP file plays in ffplay but not in VLC, while both play ffmpeg's SDP file. I suspect the missing sprop-parameter-sets in my SDP file is the cause.
Ultimately I want to play this back in VLC.
I am writing code that encodes images to h264 and outputs to an RTP server (or client? anyway the part that is listening). I generate an SDP file for this.
Now when instead I use some random video and have ffmpeg output an SDP file like so
ffmpeg -re -i some.mp4 -an -c:v copy -f rtp -sdp_file video.sdp "rtp://127.0.0.1:5004"
I can see that the generated SDP file, which plays in both ffplay and VLC, includes the base64 encoded sprop-parameter-sets field, and removing this causes the stream to not play.
> cat video.sdp
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
c=IN IP4 127.0.0.1
t=0 0
a=tool:libavformat 58.76.100
m=video 5004 RTP/AVP 96
b=AS:1034
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1; sprop-parameter-sets=Z2QANKzZQDAA7fiMBagICAoAAAMAAgAAAwDwHjBjLA==,aOvjyyLA; profile-level-id=640034
My own SDP file on the other hand, does not contain this information, and VLC hangs for 10s and then stops trying with "no data received".
> cat test.sdp
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
c=IN IP4 127.0.0.1
t=0 0
a=tool:libavformat 58.76.100
m=video 44499 RTP/AVP 96
b=AS:2000
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1
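To compare the two files without eyeballing them, one can check whether the a=fmtp line carries sprop-parameter-sets. The following is only a minimal sketch: parse_fmtp and has_parameter_sets are hypothetical helper names (not part of FFmpeg or VLC), only the first a=fmtp line is inspected, and it assumes that line is not wrapped.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse the parameters of the first "a=fmtp:" line of an SDP document into
// a key/value map. Returns an empty map if there is no such line.
std::map<std::string, std::string> parse_fmtp(const std::string &sdp) {
  std::map<std::string, std::string> params;
  std::istringstream lines(sdp);
  std::string line;
  while (std::getline(lines, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    if (line.rfind("a=fmtp:", 0) != 0) continue;  // not an fmtp line
    const auto space = line.find(' ');  // parameters follow the payload type
    if (space == std::string::npos) break;
    std::istringstream fields(line.substr(space + 1));
    std::string field;
    while (std::getline(fields, field, ';')) {
      const auto first = field.find_first_not_of(' ');
      if (first == std::string::npos) continue;  // blank field
      field = field.substr(first);               // trim leading blanks
      // Split at the FIRST '=' only: base64 values may end in '=' padding.
      const auto eq = field.find('=');
      if (eq == std::string::npos) continue;
      params[field.substr(0, eq)] = field.substr(eq + 1);
    }
    break;
  }
  return params;
}

// True if the SDP advertises out-of-band SPS/PPS for the H.264 stream.
bool has_parameter_sets(const std::string &sdp) {
  return parse_fmtp(sdp).count("sprop-parameter-sets") > 0;
}
```

Feeding it the contents of video.sdp should report the parameter sets as present, while test.sdp only yields packetization-mode.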
So my theory is that my custom code must somehow add this SPS information to the SDP file. But despite hours of searching, I could not find a structured way to set the extradata field on the AVStream's AVCodecParameters. The code I'm using is roughly this (I'm sure there are unrelated errors in there):
// variables
std::vector<std::uint8_t> imgbuf;
AVFormatContext *ofmt_ctx = nullptr;
AVCodec *out_codec = nullptr;
AVStream *out_stream = nullptr;
AVCodecContext *out_codec_ctx = nullptr;
SwsContext *swsctx = nullptr;
cv::Mat canvas_;
unsigned int height_;
unsigned int width_;
unsigned int fps_;
AVFrame *frame_ = nullptr;
AVOutputFormat *format = av_guess_format("rtp", nullptr, nullptr);
const auto url = std::string("rtp://127.0.0.1:5001");
avformat_alloc_output_context2(&ofmt_ctx, format, format->name, url.c_str());
out_codec = avcodec_find_encoder(AV_CODEC_ID_H264);
out_stream = avformat_new_stream(ofmt_ctx, out_codec);
out_codec_ctx = avcodec_alloc_context3(out_codec);
// then, for each incoming image:
while (receive_image) {
  static bool first_time = true;
  if (first_time) {
    // discover necessary params such as image dimensions from the first
    // received image
    first_time = false;
    height_ = image.rows;
    width_ = image.cols;
    out_codec_ctx->codec_tag = 0;
    out_codec_ctx->bit_rate = 2e6;
    // does nothing, unfortunately
    out_codec_ctx->thread_count = 1;
    out_codec_ctx->codec_id = AV_CODEC_ID_H264;
    out_codec_ctx->codec_type = AVMEDIA_TYPE_VIDEO;
    out_codec_ctx->width = width_;
    out_codec_ctx->height = height_;
    out_codec_ctx->gop_size = 6;
    out_codec_ctx->pix_fmt = AV_PIX_FMT_YUV420P;
    out_codec_ctx->framerate = AVRational{static_cast<int>(fps_), 1};
    out_codec_ctx->time_base = av_inv_q(out_codec_ctx->framerate);
    avcodec_parameters_from_context(out_stream->codecpar, out_codec_ctx);
    // this stuff is empty: is that the problem?
    out_stream->codecpar->extradata = out_codec_ctx->extradata;
    out_stream->codecpar->extradata_size = out_codec_ctx->extradata_size;
    AVDictionary *codec_options = nullptr;
    av_dict_set(&codec_options, "profile", "high", 0);
    av_dict_set(&codec_options, "preset", "ultrafast", 0);
    av_dict_set(&codec_options, "tune", "zerolatency", 0);
    // open video encoder
    avcodec_open2(out_codec_ctx, out_codec, &codec_options);
    out_stream->time_base.num = 1;
    out_stream->time_base.den = fps_;
    avio_open(&(ofmt_ctx->pb), ofmt_ctx->filename, AVIO_FLAG_WRITE);
    /* Write a file for VLC */
    char buf[200000];
    AVFormatContext *ac[] = {ofmt_ctx};
    av_sdp_create(ac, 1, buf, sizeof(buf));
    printf("sdp:\n%s\n", buf);
    FILE *fsdp = fopen("test.sdp", "w");
    fprintf(fsdp, "%s", buf);
    fclose(fsdp);
    swsctx = sws_getContext(width_, height_, AV_PIX_FMT_BGR24, width_, height_,
                            out_codec_ctx->pix_fmt, SWS_BICUBIC, nullptr,
                            nullptr, nullptr);
  }
  if (!frame_) {
    frame_ = av_frame_alloc();
    std::uint8_t *framebuf = new uint8_t[av_image_get_buffer_size(
        out_codec_ctx->pix_fmt, width_, height_, 1)];
    av_image_fill_arrays(frame_->data, frame_->linesize, framebuf,
                         out_codec_ctx->pix_fmt, width_, height_, 1);
    frame_->width = width_;
    frame_->height = height_;
    frame_->format = static_cast<int>(out_codec_ctx->pix_fmt);
    int success = avformat_write_header(ofmt_ctx, nullptr);
  }
  if (imgbuf.empty()) {
    // allocate the BGR canvas once, backed by imgbuf
    imgbuf.resize(height_ * width_ * 3 + 16);
    canvas_ = cv::Mat(height_, width_, CV_8UC3, imgbuf.data(), width_ * 3);
  }
  // copy every image (including the first) into the canvas
  image.copyTo(canvas_);
  const int stride[] = {static_cast<int>(canvas_.step[0])};
  sws_scale(swsctx, &canvas_.data, stride, 0, canvas_.rows, frame_->data,
            frame_->linesize);
  frame_->pts += av_rescale_q(1, out_codec_ctx->time_base, out_stream->time_base);
  AVPacket pkt = {0};
  avcodec_send_frame(out_codec_ctx, frame_);
  avcodec_receive_packet(out_codec_ctx, &pkt);
  av_interleaved_write_frame(ofmt_ctx, &pkt);
}
Can anyone offer some advice here?
--
Update
When setting
this->out_codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
extradata is actually present in the codec context, but I had to move avcodec_parameters_from_context() after avcodec_open2(), as the extradata is empty before opening the codec. I now get sprop-parameter-sets in the SDP file, but VLC still does not play it.
The solution in my case was the port number (???). Apparently, VLC cannot receive on 44499, which is the port I was using, but 5004, as in the ffmpeg example, works. I don't know if this is a macOS idiosyncrasy or transfers to Linux as well.
I tried several ports, and it seems that for VLC to receive RTP packets, the port number must be even. Wat?
The explanation seems to be that live555 discards the least significant bit of the port number: https://github.com/rgaufman/live555/blob/master/liveMedia/MediaSession.cpp#L696
So only even ports make it through unchanged. This is recommended (SHOULD), though not strictly mandated, in RFC 3550, section 11:
For UDP and similar protocols, RTP SHOULD use an even destination port number and the corresponding RTCP stream SHOULD use the next higher (odd) destination port number.
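The effect of dropping the least significant bit can be sketched as follows. This is only an illustration of the masking described above, not the actual live555 code, and the function names are made up for this example.

```cpp
#include <cstdint>

// Clearing the least significant bit rounds an odd port down to the even
// port just below it; even ports are left untouched.
constexpr std::uint16_t effective_rtp_port(std::uint16_t requested) {
  return static_cast<std::uint16_t>(requested & ~1u);
}

// By the RFC 3550 convention, RTCP uses the next higher (odd) port.
constexpr std::uint16_t rtcp_port(std::uint16_t requested) {
  return static_cast<std::uint16_t>(effective_rtp_port(requested) + 1);
}

// True if a receiver applying the mask would listen on the port the
// sender is actually sending to.
constexpr bool port_survives(std::uint16_t requested) {
  return effective_rtp_port(requested) == requested;
}
```

With 44499 the receiver would end up listening for RTP on 44498, so packets sent to 44499 never reach the RTP socket (presumably they land on what the receiver treats as the RTCP port instead), which would fit the "no data received" message. 5004 passes through unchanged.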