video h.264 nvidia video-encoding nvenc

How to stream H.264 video over UDP using the NVidia NVEnc hardware encoder?


This is going to be a self-answered question, because it has driven me nuts over the course of a full week and I wish to spare fellow programmers the frustration I went through.

The situation is this: you wish to use NVidia's NVEnc hardware encoder (available on Kepler and Maxwell GPUs, e.g. the GeForce GT(X) 7xx and 9xx series) to stream the output of your graphics application via UDP. This is not a trivial path to take, but it can be very efficient: because NVEnc can access video memory directly, frames never need to be "downloaded" from video memory to system memory until after the encoding stage.

I had already gotten far enough to generate a .h264 file by simply writing NVEnc's output buffers to it, frame after frame. VLC had no trouble playing that file, except that the timing was off (I didn't try to fix this, as I only needed the file for debugging purposes).

The problem came when I tried to stream the encoded frames via UDP: neither VLC nor MPlayer was able to render the video. It turned out there were two reasons for that, which I'll explain in my answer.


Solution

  • Like I said in the question, there were two (well, actually three) reasons MPlayer couldn't play my UDP stream.

    The first reason has to do with packetizing. NVEnc fills its output buffers with data blocks called NALUs (Network Abstraction Layer Units), which it separates with "start codes" that are mainly intended for bitstream synchronization. (See szatmary's excellent SO answer if you wish to learn more about Annex B and its competitor, AVCC.)
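To make the difference between the two framings concrete, here is a minimal C++ sketch that wraps the same NALU payloads both ways. The helper names `toAnnexB` and `toAvcc` are hypothetical, not part of NVEnc or any other SDK. It assumes the payloads already contain their emulation prevention bytes (as encoder output does) and uses 4-byte AVCC length fields, which is the common choice but not the only one the format allows.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Annex B (what NVEnc emits): every NALU is preceded by the start code 00 00 00 01.
Bytes toAnnexB(const std::vector<Bytes>& nalus) {
    Bytes out;
    const uint8_t startCode[] = {0x00, 0x00, 0x00, 0x01};
    for (const Bytes& nalu : nalus) {
        out.insert(out.end(), startCode, startCode + 4);
        out.insert(out.end(), nalu.begin(), nalu.end());
    }
    return out;
}

// AVCC: every NALU is preceded by its length as a big-endian integer.
// Assumption: 4-byte length fields (AVCC also permits 1- and 2-byte lengths).
Bytes toAvcc(const std::vector<Bytes>& nalus) {
    Bytes out;
    for (const Bytes& nalu : nalus) {
        const uint32_t n = static_cast<uint32_t>(nalu.size());
        out.push_back(static_cast<uint8_t>(n >> 24));
        out.push_back(static_cast<uint8_t>(n >> 16));
        out.push_back(static_cast<uint8_t>(n >> 8));
        out.push_back(static_cast<uint8_t>(n));
        out.insert(out.end(), nalu.begin(), nalu.end());
    }
    return out;
}
```

With Annex B, a receiver finds NALU boundaries by scanning for start codes; with AVCC, it reads a length and jumps ahead.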

    The problem now is that NVEnc sometimes delivers more than one such NALU in a single output buffer. Although most NALUs contain encoded video data (whole frames or slices of them), it is sometimes necessary (and mandatory at the beginning of a stream) to send some metadata as well, such as the resolution, framerate, etc. NVEnc helps with that by generating those special NALUs too (more on that further down).
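Telling the metadata NALUs apart from the picture data is easy: the byte right after each start code is the NALU header, and its low five bits hold `nal_unit_type` (values from Table 7-1 of the H.264 specification). The following sketch, with hypothetical function names, decodes the types that matter here; SPS and PPS are the "metadata" NALUs.

```cpp
#include <cstdint>
#include <string>

// NALU header byte layout:
//   forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
int naluType(uint8_t headerByte) {
    return headerByte & 0x1F;
}

// Human-readable name for the NALU types relevant to streaming.
std::string describeNalu(uint8_t headerByte) {
    switch (naluType(headerByte)) {
        case 1:  return "non-IDR slice";          // e.g. part of a P frame
        case 5:  return "IDR slice";              // part of a keyframe
        case 6:  return "SEI";
        case 7:  return "SPS";                    // sequence parameter set (resolution, profile, ...)
        case 8:  return "PPS";                    // picture parameter set
        case 9:  return "access unit delimiter";
        default: return "other";
    }
}
```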

    As it turns out, however, player software does not support receiving more than one NALU in a single UDP packet. This means you have to write a simple loop that looks for start codes (two or three "0" bytes followed by a "1" byte) to chop up the output buffer, and then send each NALU in its own UDP packet. (Note, however, that the UDP packets must still include those start codes.)
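The chopping loop can be sketched as follows. This is an illustration rather than a drop-in implementation: it assumes a POSIX system, an already created UDP socket and a filled-in destination address, and the names `splitAnnexB` and `sendNalus` are hypothetical. Each datagram keeps the NALU's own start code, as noted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

// Byte range of one NALU inside the encoder's output buffer, start code included.
struct NaluSpan { size_t offset; size_t size; };

// Length of the start code at position i (3 for 00 00 01, 4 for 00 00 00 01), or 0.
static size_t startCodeLength(const uint8_t* buf, size_t len, size_t i) {
    if (i + 3 <= len && buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) return 3;
    if (i + 4 <= len && buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 0 && buf[i + 3] == 1) return 4;
    return 0;
}

// Chops an Annex B buffer into NALUs; each span begins at its start code.
std::vector<NaluSpan> splitAnnexB(const uint8_t* buf, size_t len) {
    std::vector<NaluSpan> spans;
    size_t i = 0;
    while (i < len) {
        const size_t sc = startCodeLength(buf, len, i);
        if (sc == 0) { ++i; continue; }
        if (!spans.empty()) spans.back().size = i - spans.back().offset;  // close previous NALU
        spans.push_back({i, 0});
        i += sc;
    }
    if (!spans.empty()) spans.back().size = len - spans.back().offset;    // last NALU runs to the end
    return spans;
}

// Sends every NALU in its own datagram; returns false as soon as a send fails.
bool sendNalus(int sock, const sockaddr_in& dest, const uint8_t* buf, size_t len) {
    for (const NaluSpan& s : splitAnnexB(buf, len)) {
        const ssize_t sent = sendto(sock, buf + s.offset, s.size, 0,
                                    reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));
        if (sent < 0 || static_cast<size_t>(sent) != s.size) return false;
    }
    return true;
}
```

Per encoded frame, you would call `sendNalus()` on the pointer and size of the locked output bitstream (the `bitstreamBufferPtr` and `bitstreamSizeInBytes` obtained through `NvEncLockBitstream()`) before unlocking it again.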

    Another problem with packetization is that IP packets generally cannot exceed a certain size without being fragmented. Again, a SO answer provides valuable insight into what those limits are in various contexts. The important thing here is that, although you do not have to split the NALUs yourself, you do have to tell NVEnc to "slice" its output by setting the following parameters when creating the encoder object:

    m_stEncodeConfig.encodeCodecConfig.h264Config.sliceMode = 1;              // 1 = byte-based slices
    m_stEncodeConfig.encodeCodecConfig.h264Config.sliceModeData = 1500 - 28;  // maximum slice size in bytes
    

    (with m_stEncodeConfig being the NV_ENC_CONFIG struct that is handed to NvEncInitializeEncoder() through the encodeConfig member of NV_ENC_INITIALIZE_PARAMS, 1500 being the Ethernet MTU, and 28 being the combined size of an IPv4 header (20 bytes without options) and a UDP header (8 bytes)).
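The payload budget behind `sliceModeData` works out as below; the constant and function names are made up for illustration. One caveat I have not verified: if NVEnc does not count the start code against `sliceModeData`, a maximum-size slice plus its 4-byte start code would exceed 1472 bytes, so subtracting a few extra bytes may be prudent.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the arithmetic above.
constexpr size_t kEthernetMtu   = 1500;  // largest IP packet on plain Ethernet
constexpr size_t kIpv4Header    = 20;    // minimal IPv4 header (no options)
constexpr size_t kUdpHeader     = 8;
constexpr size_t kMaxUdpPayload = kEthernetMtu - kIpv4Header - kUdpHeader;

static_assert(kMaxUdpPayload == 1500 - 28, "same value as sliceModeData above");

// A NALU (start code included) travels unfragmented only if it fits the budget.
bool fitsInOneDatagram(size_t naluSizeWithStartCode) {
    return naluSizeWithStartCode <= kMaxUdpPayload;
}
```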

    The second reason why MPlayer couldn't play my stream has to do with the nature of streaming video as opposed to storing it in a file. When player software starts playing an H.264 file, it finds the required metadata NALUs (containing the resolution, framerate, etc.) at the beginning, stores that info and never needs it again. When asked to play a stream, however, it will have missed the beginning of that stream and cannot start playing until the sender re-sends the metadata.
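The joining problem can be modeled as a small state machine: a receiver that tunes in mid-stream has to discard picture data until it holds an SPS and a PPS and then sees an IDR frame to start from. The class below is a deliberately simplified, hypothetical model of that behavior, not how MPlayer is actually implemented (some decoders can, for instance, also start from recovery-point SEI messages).

```cpp
#include <cstdint>

// Hypothetical model of a receiver joining an H.264 stream mid-way.
class JoinGate {
public:
    // Feed the NALU header byte (first byte after the start code).
    // Returns true if the NALU may be passed on to the decoder.
    bool accept(uint8_t headerByte) {
        const int type = headerByte & 0x1F;
        if (type == 7) haveSps_ = true;                 // SPS received
        if (type == 8) havePps_ = true;                 // PPS received
        if (!started_ && type == 5 && haveSps_ && havePps_)
            started_ = true;                            // first decodable IDR frame
        // Parameter sets are always kept; picture data only once decoding has started.
        return started_ || type == 7 || type == 8;
    }
    bool started() const { return started_; }

private:
    bool haveSps_ = false, havePps_ = false, started_ = false;
};
```

If the sender never repeats SPS/PPS, `started()` stays false forever for a late joiner, which is exactly the symptom described above.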

    And here's the problem: unless told otherwise, NVEnc will only ever generate the metadata NALUs at the very beginning of an encoding session. Here is the encoder configuration parameter that needs to be set:

    m_stEncodeConfig.encodeCodecConfig.h264Config.repeatSPSPPS = 1;
    

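    This tells NVEnc to re-generate the SPS/PPS NALUs periodically; the NVENC API header documents the flag as writing them with every IDR frame. That only helps if IDR frames actually recur during the session, so check the GOP length (gopLength in NV_ENC_CONFIG) and idrPeriod in h264Config.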
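To verify that repeatSPSPPS took effect, you can collect the NALU header bytes of a captured stream and check that a fresh SPS/PPS pair precedes every IDR frame. The sketch below uses a hypothetical function name and assumes that consecutive IDR slice NALUs belong to the same picture. That is usually true for a stream with periodic IDR frames and P frames in between, but it is a simplification; a rigorous check would parse `first_mb_in_slice` from each slice header.

```cpp
#include <cstdint>
#include <vector>

// Checks that every IDR picture is preceded by an SPS and a PPS sent since
// the previous IDR picture. Input: the header byte of each NALU, in order.
bool spsPpsBeforeEveryIdr(const std::vector<uint8_t>& headers) {
    bool sps = false, pps = false;   // seen since the last IDR picture
    int prevType = -1;
    for (uint8_t h : headers) {
        const int type = h & 0x1F;
        if (type == 7) {
            sps = true;
        } else if (type == 8) {
            pps = true;
        } else if (type == 5 && prevType != 5) {  // first slice of an IDR picture
            if (!sps || !pps) return false;
            sps = pps = false;                    // demand a fresh pair next time
        }
        prevType = type;
    }
    return true;
}
```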

    And voilà! With these hurdles cleared, you will be able to appreciate the power of generating compressed video streams while hardly taxing the CPU at all.

    EDIT: I realize that this kind of ultra-simple UDP streaming is discouraged, as it does not really conform to any standard. MPlayer will play such a stream, but VLC, which is otherwise capable of playing almost anything, will not. The foremost reason is that nothing in the data stream even indicates the type of medium being sent (in this case, video). I am currently researching the simplest approach that satisfies accepted standards.