c++ ffmpeg nvenc

SegFault while freeing nvenc hwdevice_ctx


For a project, I created a class that encodes the output of an OpenGL renderbuffer object using h264_nvenc. Unfortunately, cleanup doesn't work, and the program crashes with a segfault. The cause is an access to an inaccessible memory region, which happens twice in the final lines (see below): when calling av_buffer_unref( &_hwDeviceRefCtx ) and, implicitly, during avcodec_free_context( &_pCodecCtx ). Both calls are required for shutting down.

The (in this case relevant) valgrind-output is

Invalid read of size 8
   at 0x48AD987: UnknownInlinedFun (buffer.c:121)
   by 0x48AD987: UnknownInlinedFun (buffer.c:144)
   by 0x48AD987: av_buffer_unref (buffer.c:139)
   by 0x5D06D7A: avcodec_close (avcodec.c:486)
   by 0x628DD7D: avcodec_free_context (options.c:175)
   by 0x10A863: main (main.cpp:115)
 Address 0x17812700 is 0 bytes inside a block of size 24 free'd
   at 0x484488F: free (vg_replace_malloc.c:985)
   by 0x48AD98F: UnknownInlinedFun (buffer.c:127)
   by 0x48AD98F: UnknownInlinedFun (buffer.c:144)
   by 0x48AD98F: av_buffer_unref (buffer.c:139)
   by 0x48BE098: hwframe_ctx_free (hwcontext.c:240)
   by 0x48AD9A6: UnknownInlinedFun (buffer.c:133)
   by 0x48AD9A6: UnknownInlinedFun (buffer.c:144)
   by 0x48AD9A6: av_buffer_unref (buffer.c:139)
   by 0x5D06D0A: UnknownInlinedFun (decode.c:1261)
   by 0x5D06D0A: avcodec_close (avcodec.c:465)
   by 0x628DD7D: avcodec_free_context (options.c:175)
   by 0x10A863: main (main.cpp:115)
 Block was alloc'd at
   at 0x4849366: posix_memalign (vg_replace_malloc.c:2099)
   by 0x48D9BD5: av_malloc (mem.c:105)
   by 0x48D9DAD: av_mallocz (mem.c:256)
   by 0x48AD8DD: UnknownInlinedFun (buffer.c:44)
   by 0x48AD8DD: av_buffer_create (buffer.c:64)
   by 0x48BDDEB: av_hwdevice_ctx_alloc (hwcontext.c:179)
   by 0x48BDF29: av_hwdevice_ctx_create (hwcontext.c:622)
   by 0x10A482: main (main.cpp:43)

Invalid free() / delete / delete[] / realloc()
   at 0x484488F: free (vg_replace_malloc.c:985)
   by 0x48AD98F: UnknownInlinedFun (buffer.c:127)
   by 0x48AD98F: UnknownInlinedFun (buffer.c:144)
   by 0x48AD98F: av_buffer_unref (buffer.c:139)
   by 0x5D06D7A: avcodec_close (avcodec.c:486)
   by 0x628DD7D: avcodec_free_context (options.c:175)
   by 0x10A863: main (main.cpp:115)
 Address 0x17812700 is 0 bytes inside a block of size 24 free'd
   at 0x484488F: free (vg_replace_malloc.c:985)
   by 0x48AD98F: UnknownInlinedFun (buffer.c:127)
   by 0x48AD98F: UnknownInlinedFun (buffer.c:144)
   by 0x48AD98F: av_buffer_unref (buffer.c:139)
   by 0x48BE098: hwframe_ctx_free (hwcontext.c:240)
   by 0x48AD9A6: UnknownInlinedFun (buffer.c:133)
   by 0x48AD9A6: UnknownInlinedFun (buffer.c:144)
   by 0x48AD9A6: av_buffer_unref (buffer.c:139)
   by 0x5D06D0A: UnknownInlinedFun (decode.c:1261)
   by 0x5D06D0A: avcodec_close (avcodec.c:465)
   by 0x628DD7D: avcodec_free_context (options.c:175)
   by 0x10A863: main (main.cpp:115)
 Block was alloc'd at
   at 0x4849366: posix_memalign (vg_replace_malloc.c:2099)
   by 0x48D9BD5: av_malloc (mem.c:105)
   by 0x48D9DAD: av_mallocz (mem.c:256)
   by 0x48AD8DD: UnknownInlinedFun (buffer.c:44)
   by 0x48AD8DD: av_buffer_create (buffer.c:64)
   by 0x48BDDEB: av_hwdevice_ctx_alloc (hwcontext.c:179)
   by 0x48BDF29: av_hwdevice_ctx_create (hwcontext.c:622)
   by 0x10A482: main (main.cpp:43)

This output appears twice (once for the call to avcodec_free_context() and once for av_buffer_unref()).

The question is: How can I fix this?

The (more or less) minimal (not) working example reads

#include <string>

extern "C" {
  #include <libavutil/opt.h>
  #include <libavcodec/avcodec.h>
  #include <libavformat/avformat.h>
  #include <libavutil/hwcontext.h>
  #include <libavutil/pixdesc.h>
  #include <libavutil/hwcontext_cuda.h>
}

//(former) libx264 encoding based on https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/muxing.c
//update to h264_nvenc with a lot of help from https://stackoverflow.com/questions/49862610/opengl-to-ffmpeg-encode
//and some additional info of https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/vaapi_encode.c

int main() {
    const int _SrcImageWidth=640;
    const int _SrcImageHeight=480;
    
    const AVOutputFormat *_oFmt = nullptr;
    AVFormatContext *_oFmtCtx = nullptr;
    
    const AVCodec *_pCodec = nullptr;
    AVCodecContext *_pCodecCtx = nullptr;
    
    AVFrame* _frame;
    AVPacket* _packet;
    AVStream* _stream;
    
    AVBufferRef *_hwDeviceRefCtx = nullptr;
    const CUcontext* _cudaCtx;
    
    const std::string _OutFileName = "output.mkv";
    
    //constructor part
    int ret;

    //output format context      
    avformat_alloc_output_context2( &_oFmtCtx, nullptr, nullptr, _OutFileName.c_str() );
    _oFmt = _oFmtCtx->oformat;

    //hardware format context
    ret = av_hwdevice_ctx_create( &_hwDeviceRefCtx, AV_HWDEVICE_TYPE_CUDA, "NVIDIA GeForce RTX 4070", nullptr, 0 );

    //hardware frame context for device buffer allocation
    AVBufferRef* hwFrameRefCtx = av_hwframe_ctx_alloc( _hwDeviceRefCtx );
    AVHWFramesContext* hwFrameCtx = (AVHWFramesContext*) (hwFrameRefCtx->data);
    hwFrameCtx->width = _SrcImageWidth;
    hwFrameCtx->height = _SrcImageHeight;
    hwFrameCtx->sw_format = AV_PIX_FMT_0BGR32;
    hwFrameCtx->format = AV_PIX_FMT_CUDA;
    hwFrameCtx->device_ref = _hwDeviceRefCtx;
    hwFrameCtx->device_ctx = (AVHWDeviceContext*) _hwDeviceRefCtx->data;

    ret = av_hwframe_ctx_init( hwFrameRefCtx );

    //get cuda context
    const AVHWDeviceContext* hwDeviceCtx = (AVHWDeviceContext*)(_hwDeviceRefCtx->data);
    const AVCUDADeviceContext* cudaDeviceCtx = (AVCUDADeviceContext*)(hwDeviceCtx->hwctx);
    _cudaCtx = &(cudaDeviceCtx->cuda_ctx);

    //codec context
    _pCodec = avcodec_find_encoder_by_name( "h264_nvenc" );

    _packet = av_packet_alloc();

    _stream = avformat_new_stream( _oFmtCtx, nullptr );
    _stream->id = _oFmtCtx->nb_streams - 1;
    _pCodecCtx = avcodec_alloc_context3( _pCodec );

    _pCodecCtx->qmin = 18;
    _pCodecCtx->qmax = 20;
    _pCodecCtx->width = _SrcImageWidth;
    _pCodecCtx->height = _SrcImageHeight;
    _pCodecCtx->framerate = (AVRational) {25,1};
    _pCodecCtx->time_base = (AVRational) {1,25};
    _stream->time_base = _pCodecCtx->time_base;
    _pCodecCtx->gop_size = 12; //I-Frame every at most 12 frames
    _pCodecCtx->max_b_frames = 2;
    _pCodecCtx->pix_fmt = AV_PIX_FMT_CUDA; //required to use renderbuffer as src
    _pCodecCtx->codec_type = AVMEDIA_TYPE_VIDEO;
    _pCodecCtx->sw_pix_fmt = AV_PIX_FMT_0BGR32; 
    _pCodecCtx->hw_device_ctx = _hwDeviceRefCtx;
    _pCodecCtx->hw_frames_ctx = av_buffer_ref( hwFrameRefCtx );
    av_opt_set(_pCodecCtx->priv_data, "preset", "p7", 0);
    av_opt_set(_pCodecCtx->priv_data, "rc", "vbr", 0);
    if( _oFmtCtx->oformat->flags & AVFMT_GLOBALHEADER ) {
        _pCodecCtx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    }

    ret = avcodec_open2( _pCodecCtx, _pCodec, nullptr );
    avcodec_parameters_from_context( _stream->codecpar, _pCodecCtx );

    if (!(_oFmtCtx->oformat->flags & AVFMT_NOFILE)) {
        ret = avio_open(&_oFmtCtx->pb, _OutFileName.c_str(), AVIO_FLAG_WRITE);
    }
    ret = avformat_write_header( _oFmtCtx, nullptr );

    //use hardware frame from above
    _frame = av_frame_alloc();
    ret = av_hwframe_get_buffer( _pCodecCtx->hw_frames_ctx, _frame, 0 );
    _frame->pts = 1;

    av_buffer_unref( &hwFrameRefCtx );

    //destructor part
    av_frame_free( &_frame );
    av_packet_free( &_packet );

    av_write_trailer( _oFmtCtx );
    avio_closep( &_oFmtCtx->pb );

    avformat_free_context( _oFmtCtx );

    avcodec_free_context( &_pCodecCtx );
    av_buffer_unref( &_hwDeviceRefCtx );

    return 0;
}

and compiles (on Linux) with

g++ -lavutil -lavformat -lavcodec -lz -lavutil -lswscale -lswresample -lm -ggdb3 -I/opt/cuda/include main.cpp

Thanks in advance!


Solution

  • The answer is a kind of RTFM moment: the doxygen documentation of AVCodecContext states the following about the hw_device_ctx member:

    This should be used if either the codec device does not require hardware frames or any that are used are to be allocated internally by libavcodec. If the user wishes to supply any of the frames used as encoder input or decoder output then hw_frames_ctx should be used instead. When hw_frames_ctx is set in get_format() for a decoder, this field will be ignored while decoding the associated stream segment, but may again be used on a following one after another get_format() call.

    For both encoders and decoders this field should be set before avcodec_open2() is called and must not be written to thereafter.

    Note that some decoders may require this field to be set initially in order to support hw_frames_ctx at all - in that case, all frames contexts used must be created on the same device.

    Unfortunately, I paid more attention to the third paragraph than to the first. Since the frames are supplied through hw_frames_ctx here, hw_device_ctx is not needed; removing the line _pCodecCtx->hw_device_ctx = _hwDeviceRefCtx; fixed the issue.
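
    The ownership rule behind this can be pictured with a toy model. The sketch below is not FFmpeg code; ToyBuffer, ToyRef, toy_create, toy_ref, toy_unref and payload_frees are hypothetical names that only mimic the roles of av_buffer_create(), av_buffer_ref() and av_buffer_unref(). It assumes that every owner which will later call unref holds its own handle. In the question, the same handle pointer was stored in several places, and each owner released it, which matches the "Invalid free()" in the valgrind output.

```cpp
#include <cassert>

// Toy stand-in for the shared payload behind a reference (hypothetical type).
struct ToyBuffer {
    int refcount = 1;
};

// Toy stand-in for a reference handle: a small object pointing at the payload.
struct ToyRef {
    ToyBuffer* buffer;
};

// Counts how often the shared payload was destroyed; exactly 1 is correct.
inline int& payload_frees() {
    static int count = 0;
    return count;
}

// Create the payload plus the first handle.
ToyRef* toy_create() {
    return new ToyRef{new ToyBuffer{}};
}

// Create an additional handle to the same payload.
ToyRef* toy_ref(const ToyRef* src) {
    ++src->buffer->refcount;
    return new ToyRef{src->buffer};
}

// Release one handle, null the caller's pointer, and destroy the payload
// once the last handle is gone.
void toy_unref(ToyRef** ref) {
    if (ref == nullptr || *ref == nullptr) return;
    ToyBuffer* buffer = (*ref)->buffer;
    delete *ref;
    *ref = nullptr;
    if (--buffer->refcount == 0) {
        delete buffer;
        ++payload_frees();
    }
}

// Each owner holds its own handle, so every handle is released exactly once.
int release_with_separate_handles() {
    payload_frees() = 0;
    ToyRef* device = toy_create();         // owned by the caller
    ToyRef* codec_side = toy_ref(device);  // owned by the "codec context"
    assert(device->buffer->refcount == 2);
    toy_unref(&codec_side);                // what the context does when freed
    toy_unref(&device);                    // what the caller does afterwards
    return payload_frees();
}
```

    Here release_with_separate_handles() returns 1: the payload is destroyed once, after the last handle is released. Translated back to FFmpeg, a context that really needs the device would be given av_buffer_ref( _hwDeviceRefCtx ) rather than the caller's own pointer, as FFmpeg's hw_decode.c example does. The same reasoning suggests that the manual assignment to hwFrameCtx->device_ref in the question is unnecessary, because av_hwframe_ctx_alloc() is documented to make its own reference to the device; I have not verified that against the code above.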