Tags: video-streaming, h.264, rtp

How are H.264 real time streams actually compressed and transferred?


This is more of a conceptual question than a technical one. My understanding of H.264 is that it relies on past and future frames to compress video data. It's trivial to take a fully compressed H.264 video file and stream it via RTP or any other protocol of your choice. But how would this work with real-time video? In real-time video you only have access to past and current frames, and you don't know the full length of the video, so how can the H.264 codec actually compress the video and prepare it to be an RTP payload? Does it simply buffer and chunk the video into arbitrarily sized smaller videos and compress those? The only way I can think of this working is to split the video into something like one-second chunks, compress each chunk as an individual video, and make those the RTP payloads. Is this how it's done, or is there more "magic" happening than I suspect?


Solution

  • First, there are three types of frames.

    I (Intra) frames, or keyframes. These frames do not reference any other frames. They are standalone and can be decoded without any other frame data, like a JPEG.

    P (Predictive) frames. These can reference frames from the past.

    B (bi-directional) frames. These can reference frames from the past or the future.
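Because I frames are self-contained, you can spot them directly in the bitstream: every H.264 NAL unit carries its type in the low 5 bits of the first byte after the start code (5 = IDR slice/keyframe, 1 = non-IDR slice, i.e. P or B). Here is a minimal Python sketch that scans a raw Annex B stream and reports those types; the filename `stream.h264` is just a placeholder, not something from the answer.

```python
# Minimal sketch: scan an Annex B H.264 byte stream and report NAL unit
# types, making the I (IDR) keyframes visible alongside P/B slices.
import sys

NAL_NAMES = {1: "non-IDR slice (P/B)", 5: "IDR slice (keyframe)",
             7: "SPS", 8: "PPS"}

def nal_headers(data: bytes):
    """Yield the first header byte of each NAL unit after a start code."""
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)  # 3- and 4-byte start codes both match
        if i == -1:
            return
        i += 3                             # skip past the start code
        if i < len(data):
            yield data[i]

# "stream.h264" is a hypothetical raw bitstream file.
with open(sys.argv[1] if len(sys.argv) > 1 else "stream.h264", "rb") as f:
    for header in nal_headers(f.read()):
        nal_type = header & 0x1F           # low 5 bits = nal_unit_type
        print(NAL_NAMES.get(nal_type, f"type {nal_type}"))
```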

    Option 1: only use I and P frames. This makes the stream roughly 10-15% larger (or 10-15% lower quality at the same bitrate). It is used for interactive systems like video conferencing and screen sharing, where latency is very noticeable.
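As a concrete illustration of Option 1, here is a hedged sketch that drives ffmpeg (assuming it is installed with libx264) to encode a live source with B-frames disabled and packetize it straight into RTP. The camera device and RTP address are placeholders, not part of the answer.

```python
# Sketch of Option 1: I and P frames only, so every frame references
# only the past and the encoder never waits for future frames.
import subprocess

subprocess.run([
    "ffmpeg",
    "-f", "v4l2", "-i", "/dev/video0",    # a live camera source (Linux); placeholder
    "-c:v", "libx264",
    "-bf", "0",                           # no B-frames: I and P only
    "-tune", "zerolatency",               # disable lookahead/frame buffering
    "-g", "60",                           # a keyframe every 60 frames
    "-f", "rtp", "rtp://127.0.0.1:5004",  # packetize straight into RTP; placeholder URL
], check=True)
```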

    Option 2: wait for the future to happen. At 30 frames per second, the future will arrive in 33 milliseconds.

    H.264 specifically can reference at most 16 neighboring frames, but most encoders limit this to around 4, so waiting for 4 frames adds only about a 133 millisecond delay.
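The numbers above reduce to simple arithmetic; a quick sketch reproducing them:

```python
# Back-of-the-envelope latency from waiting for future reference frames:
# at 30 fps, each frame of lookahead costs ~33 ms.
fps = 30
frame_time_ms = 1000 / fps            # ~33 ms until "the future" arrives

for lookahead in (1, 4, 16):          # 16 is H.264's reference-frame cap
    print(f"{lookahead} frame(s) of lookahead ≈ "
          f"{lookahead * frame_time_ms:.0f} ms added delay")
```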