opengl ffmpeg encode h.264

OpenGL to FFmpeg encode


I have an OpenGL buffer that I need to forward directly to FFmpeg for NVENC-based H.264 encoding.

My current approach is to call glReadPixels to get the pixels out of the framebuffer and then pass that pointer to FFmpeg so it can encode the frame into H.264 packets for RTSP. However, this is wasteful: I copy the bytes out of GPU RAM into CPU RAM, only to copy them back to the GPU for encoding.
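
To put a rough number on that round trip, here is a small sketch of the arithmetic. The 1920x1080 resolution, 4 bytes per pixel and 30 fps are assumed example values, not measurements, and `roundTripBytesPerSecond` is a hypothetical helper name.

```cpp
#include <cstdint>

// Bytes moved across the bus per second when every frame is read back to
// the CPU and then uploaded to the GPU again (two transfers per frame).
constexpr std::uint64_t roundTripBytesPerSecond(std::uint64_t width,
                                                std::uint64_t height,
                                                std::uint64_t bytesPerPixel,
                                                std::uint64_t fps)
{
    return width * height * bytesPerPixel * fps * 2;
}

// Assumed example: 1920x1080, 4 bytes per pixel, 30 fps.
static_assert(roundTripBytesPerSecond(1920, 1080, 4, 30) == 497664000ULL,
              "roughly 498 MB per second of avoidable transfers");
```

Each frame is 8,294,400 bytes, so at these assumed settings the readback and re-upload together move close to half a gigabyte per second that never needed to leave the GPU.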


Solution

  • If you compare the date of the post with the date of this answer, you'll notice I spent a lot of time working on this. (It was my full-time job for the past 4 weeks.)

    Since I had such a difficult time getting this to work, I will write up a short guide to hopefully help whoever finds this.

    Outline

    The basic flow I have is: OpenGL framebuffer object color attachment (texture) → CUDA buffer → NVENC (the NVIDIA encoder)

    Things to note

    1) The NVIDIA encoder can accept YUV or RGB images.
    2) FFmpeg 4.0 and earlier cannot pass RGB images to NVENC.
    3) FFmpeg was later updated to accept RGB as input, following the issues I raised.

    There are a few APIs to know about:
    1) AVHWDeviceContext: think of this as FFmpeg's device abstraction layer.
    2) AVHWFramesContext: think of this as FFmpeg's hardware frame abstraction layer.
    3) cuMemcpy2D: the call required to copy a CUDA-mapped OpenGL texture into a CUDA buffer created by FFmpeg.

    Comprehensiveness

    This guide supplements the standard software encoding flow. It is NOT complete code and should only be used in addition to that flow.

    Code details

    Setup

    First you need the GPU name. I found some code (I cannot remember where) that makes a few CUDA calls to get it:

    // ck() is an error-checking wrapper that is not shown in this post;
    // epLog is my own logger.
    int getDeviceName(std::string& gpuName)
    {
        // Query the CUDA driver for the device that will do the hardware encoding
        int iGpu = 0; // ordinal of the GPU to use
        ck(cuInit(0));
        int nGpu = 0;
        ck(cuDeviceGetCount(&nGpu));
        if (iGpu < 0 || iGpu >= nGpu)
        {
            std::cout << "GPU ordinal out of range. Should be within [" << 0 << ", "
                      << nGpu - 1 << "]" << std::endl;
            return 1;
        }
        CUdevice cuDevice = 0;
        ck(cuDeviceGet(&cuDevice, iGpu));
        char szDeviceName[80];
        ck(cuDeviceGetName(szDeviceName, sizeof(szDeviceName), cuDevice));
        gpuName = szDeviceName;
        epLog::msg(epMSG_STATUS, "epVideoEncode:H264Encoder", "...using device \"%s\"", szDeviceName);

        return 0;
    }
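
    The ck() wrapper used above is not defined in this post. As a minimal sketch of what such a wrapper could look like, assuming you are happy to throw on failure: `checkResult` is a hypothetical helper name, not part of CUDA or FFmpeg, and the commented-out macro shows how it might be wired to CUDA_SUCCESS.

    ```cpp
    #include <sstream>
    #include <stdexcept>
    #include <string>

    // Hypothetical helper: throws std::runtime_error when `result` differs
    // from `success`, naming the failing expression and its source location.
    template <typename Result>
    void checkResult(Result result, Result success,
                     const char* expr, const char* file, int line)
    {
        if (result == success)
        {
            return;
        }
        std::ostringstream msg;
        msg << file << ':' << line << ": " << expr
            << " failed with code " << static_cast<long long>(result);
        throw std::runtime_error(msg.str());
    }

    // With the CUDA driver API headers included, a ck() macro could be
    // defined on top of it like this (assumption, not the original macro):
    // #define ck(call) checkResult((call), CUDA_SUCCESS, #call, __FILE__, __LINE__)
    ```

    Throwing is only one possible design choice; a variant that logs and returns a bool would serve equally well if you prefer to keep error handling local to each call site.
    
    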
    

    Next you will need to set up your hwdevice and hwframes contexts:

        // NOTE: FFmpeg's CUDA hwcontext parses the device string as an ordinal (e.g. "0");
        //  a device name parses as 0, which selects the first GPU.
        getDeviceName(gpuName);
        ret = av_hwdevice_ctx_create(&m_avBufferRefDevice, AV_HWDEVICE_TYPE_CUDA, gpuName.c_str(), NULL, 0);
        if (ret < 0)
        {
            return -1;
        }

        // Example of the casts needed to get down to the CUDA context
        AVHWDeviceContext* hwDevContext = (AVHWDeviceContext*)(m_avBufferRefDevice->data);
        AVCUDADeviceContext* cudaDevCtx = (AVCUDADeviceContext*)(hwDevContext->hwctx);
        m_cuContext = &(cudaDevCtx->cuda_ctx);

        // Create the hwframes context
        //  This is an abstraction of a CUDA buffer for us. It lets us set up the CUDA buffer and ready it for input with one call
        m_avBufferRefFrame = av_hwframe_ctx_alloc(m_avBufferRefDevice);
        if (!m_avBufferRefFrame)
        {
            return -1;
        }

        // Set some values before initialization
        //  (av_hwframe_ctx_alloc() has already filled in device_ref and device_ctx, so they are not assigned here)
        AVHWFramesContext* frameCtxPtr = (AVHWFramesContext*)(m_avBufferRefFrame->data);
        frameCtxPtr->width = width;
        frameCtxPtr->height = height;
        frameCtxPtr->sw_format = AV_PIX_FMT_0BGR32; // Only certain formats are supported here, so we need to conform to one of them
        frameCtxPtr->format = AV_PIX_FMT_CUDA;

        // Initialization - this must be done to actually allocate the CUDA buffer.
        //  NOTE: This call only works for our input format if the FFmpeg library is newer than 4.0.
        ret = av_hwframe_ctx_init(m_avBufferRefFrame);
        if (ret < 0) {
            return -1;
        }

        // Register the OpenGL texture with CUDA so it can be mapped for each frame
        CUresult res;
        CUcontext oldCtx;
        m_inputTexture = texture;
        res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
        res = cuCtxPushCurrent(*m_cuContext);
        res = cuGraphicsGLRegisterImage(&cuInpTexRes, m_inputTexture, GL_TEXTURE_2D, CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY);
        cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
        if (res != CUDA_SUCCESS) // Registration failed, so the texture cannot be mapped later
        {
            return -1;
        }

        // Assign some hardware-accel-specific data to the AVCodecContext (this must be done BEFORE avcodec_open2())
        //  The codec context owns its references, so hand it new ones via av_buffer_ref()
        c->hw_device_ctx = av_buffer_ref(m_avBufferRefDevice);
        c->pix_fmt = AV_PIX_FMT_CUDA; // The frames live in CUDA buffers, filled from the OpenGL texture
        c->hw_frames_ctx = av_buffer_ref(m_avBufferRefFrame);
        c->codec_type = AVMEDIA_TYPE_VIDEO;
        c->sw_pix_fmt = AV_PIX_FMT_0BGR32;

        // Set up the constant parts of the CUDA 2D copy that runs for every frame
        m_memCpyStruct.srcXInBytes = 0;
        m_memCpyStruct.srcY = 0;
        m_memCpyStruct.srcMemoryType = CUmemorytype::CU_MEMORYTYPE_ARRAY;

        m_memCpyStruct.dstXInBytes = 0;
        m_memCpyStruct.dstY = 0;
        m_memCpyStruct.dstMemoryType = CUmemorytype::CU_MEMORYTYPE_DEVICE;
    

    Keep in mind that although a lot is done above, the code shown is IN ADDITION to the standard software encoding code. Make sure to include all of those calls and object initializations as well.
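
    The setup code pushes the CUDA context and pops it again by hand, and the per-frame code does the same. One way to make sure the pop always happens, even on an early return, is a small scope guard. This is only a sketch: `ScopedCall` is a hypothetical helper, not something from CUDA or FFmpeg, and the CUDA usage is shown in a comment only.

    ```cpp
    #include <functional>
    #include <utility>

    // Hypothetical scope guard: runs `enter` immediately and `leave` when the
    // guard goes out of scope, so the two calls cannot get out of balance.
    class ScopedCall
    {
    public:
        ScopedCall(const std::function<void()>& enter, std::function<void()> leave)
            : leave_(std::move(leave))
        {
            enter();
        }

        ~ScopedCall()
        {
            if (leave_)
            {
                leave_();
            }
        }

        // Copying would run `leave` twice, so forbid it.
        ScopedCall(const ScopedCall&) = delete;
        ScopedCall& operator=(const ScopedCall&) = delete;

    private:
        std::function<void()> leave_;
    };

    // Possible use with the members from this guide (untested assumption):
    // ScopedCall ctx([&] { cuCtxPushCurrent(*m_cuContext); },
    //                [&] { CUcontext old; cuCtxPopCurrent(&old); });
    ```

    The guard deliberately knows nothing about CUDA, which keeps it usable for the map/unmap pair in the per-frame code as well.
    
    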

    Unlike the software version, all that is needed for the input AVFrame object is to get the buffer AFTER your av_frame_alloc() call:

        // Allocate the RGB video frame buffer on the GPU
        ret = av_hwframe_get_buffer(m_avBufferRefFrame, rgb_frame, 0);  // 0 is for flags, not used at the moment
    

    Notice that it takes the hwframes context as an argument; this is how it knows which device, size, format, etc. to allocate for on the GPU.
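
    The format in question is the sw_format set earlier, AV_PIX_FMT_0BGR32. In FFmpeg this is a native-endian alias: it describes a 32-bit word laid out as 0x00BBGGRR, which on a little-endian machine is the byte sequence R, G, B, 0 in memory (the RGB0 layout). The sketch below illustrates that with plain integers; `pack0BGR32` and `bytesInMemory` are hypothetical helper names and no FFmpeg headers are involved.

    ```cpp
    #include <array>
    #include <cstdint>
    #include <cstring>

    // Pack 8-bit channels into one native 32-bit word laid out as 0x00BBGGRR.
    inline std::uint32_t pack0BGR32(std::uint8_t r, std::uint8_t g, std::uint8_t b)
    {
        return (std::uint32_t(b) << 16) | (std::uint32_t(g) << 8) | std::uint32_t(r);
    }

    // Return the four bytes of the word in the order they sit in memory.
    // On a little-endian host this yields R, G, B, 0.
    inline std::array<std::uint8_t, 4> bytesInMemory(std::uint32_t word)
    {
        std::array<std::uint8_t, 4> bytes{};
        std::memcpy(bytes.data(), &word, sizeof(word));
        return bytes;
    }
    ```

    This matters because cuMemcpy2D copies bytes verbatim, so the byte order of the texture has to match what the encoder expects; if the colors come out swapped, the texture's channel order is the first thing to check.
    
    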

    Call to encode each frame

    Now we are set up and ready to encode. Before each encode we need to copy the frame from the texture to a CUDA buffer. We do this by mapping the texture to a CUDA array and then copying that array to a CUdeviceptr (which was allocated by the av_hwframe_get_buffer call above):

    // Perform the CUDA mem copy for the input buffer
    CUresult cuRes;
    CUarray mappedArray;
    CUcontext oldCtx;

    // Get context
    cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
    cuRes = cuCtxPushCurrent(*m_cuContext);

    // Map the texture
    cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
    cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);

    // Get the CUDA array that backs the mapped texture
    cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0); // NVIDIA says it's good practice to remap each iteration, as OpenGL can move things around

    // Set up for memcpy
    m_memCpyStruct.srcArray = mappedArray;
    m_memCpyStruct.dstDevice = (CUdeviceptr)rgb_frame->data[0]; // Make sure to copy the devptr each time, as it could change upon resize
    m_memCpyStruct.dstPitch = rgb_frame->linesize[0];   // Linesize is generated by the hwframes context
    m_memCpyStruct.WidthInBytes = rgb_frame->width * 4; // * 4 because each pixel is 4 bytes
    m_memCpyStruct.Height = rgb_frame->height;          // Vanilla height for frame

    // Do memcpy
    cuRes = cuMemcpy2D(&m_memCpyStruct);

    // Release the texture only after the copy: the mapped array is valid only while the resource is mapped
    cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);

    // Release context
    cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
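
    The copy uses rgb_frame->linesize[0] as dstPitch rather than width * 4 because the rows of the destination buffer may be padded. The sketch below imitates a pitched 2D copy on the CPU to show how WidthInBytes, the pitch and Height interact. `Copy2D` and `copy2D` are hypothetical stand-ins, not the CUDA API, and the stand-in needs a source pitch because a host buffer has no CUDA array layout.

    ```cpp
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Hypothetical CPU-side stand-in for the CUDA_MEMCPY2D fields used here.
    struct Copy2D
    {
        const std::uint8_t* src;
        std::size_t srcPitch;      // bytes from the start of one source row to the next
        std::uint8_t* dst;
        std::size_t dstPitch;      // bytes from the start of one destination row to the next
        std::size_t widthInBytes;  // payload bytes per row (width * 4 for 32-bit pixels)
        std::size_t height;        // number of rows
    };

    // Copy row by row: only widthInBytes of each row is written, so any
    // padding between widthInBytes and dstPitch is left untouched.
    inline void copy2D(const Copy2D& c)
    {
        for (std::size_t row = 0; row < c.height; ++row)
        {
            std::memcpy(c.dst + row * c.dstPitch,
                        c.src + row * c.srcPitch,
                        c.widthInBytes);
        }
    }
    ```

    If dstPitch were set to width * 4 while the real buffer had a larger pitch, every row after the first would land at the wrong offset and the encoded image would appear sheared.
    
    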
    

    Now we can simply call avcodec_send_frame and it all works!

        ret = avcodec_send_frame(c, rgb_frame);
    

    Note: I left most of my code out, since it is not for the public. I may have some details incorrect; this is how I was able to make sense of all the data I gathered over the past month, so feel free to correct anything that is wrong. Also, fun fact: during all this my computer crashed and I lost all my initial investigation (everything I didn't check into source control), which includes all the various example code I found around the internet. So if you see something and it's yours, please call it out. This can help others come to the conclusion that I came to.

    Shoutout

    Big shout-out to BtbN at https://webchat.freenode.net/ #ffmpeg; I wouldn't have gotten any of this without their help.