python-3.x ffmpeg h.264 pyav

PyAV inconsistency when parsing packets from h264 frames


When I produce H.264 frames and decode them with PyAV, packets are only parsed from a frame if the parse method is invoked twice.

Consider the following test H.264 input, created using:

ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 -f image2 -vcodec libx264 -bsf h264_mp4toannexb -force_key_frames source -x264-params keyint=1:scenecut=0 "frame-%4d.h264"

Now, using pyAV to parse the first frame:

import av
codec = av.CodecContext.create('h264', 'r')
with open('/path/to/frame-0001.h264', 'rb') as file_handler:
    chunk = file_handler.read()
    packets = codec.parse(chunk) # This line needs to be invoked twice to parse packets

packets remain empty unless the last line is invoked again (packets = codec.parse(chunk))

Also, for various real-life examples that I cannot characterize, it seems that decoding frames from packets requires several decode invocations as well:

packet = packets[0]
frames = codec.decode(packet) # This line needs to be invoked 2-3 times to actually receive frames.

Does anyone know anything about this inconsistent behavior of PyAV?

(Using Python 3.8.12 on macOS Monterey 12.3.1, ffmpeg 4.4.1, pyAV 9.0.2)


Solution

  • This is expected PyAV behavior; moreover, it is the expected behavior of the underlying libav libraries. One packet does not guarantee a frame, and multiple packets may be needed before a frame is produced. This is apparent in FFmpeg's video decoder example:

        while (ret >= 0) {
            ret = avcodec_receive_frame(dec_ctx, frame);
            if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                return;
    

    If the decoder needs more packets to form a frame, avcodec_receive_frame() returns AVERROR(EAGAIN).

    [edit]

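    Actually, the above example can be misleading because it just returns on EAGAIN. EAGAIN is not a failure: it means the decoder needs more input, so the caller has to send the next packet with avcodec_send_packet() and then try to receive a frame again:

        while (ret >= 0) {
            ret = avcodec_receive_frame(dec_ctx, frame);
            if (ret == AVERROR(EAGAIN))
                break;  /* no frame yet: send another packet, then receive again */
            if (ret == AVERROR_EOF)
                return;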
    

    [edit]

    PyAV's codec.parse()

    The decoding sometimes needing additional calls is a fairly well-known fact, but the parser needing to flush is less common. Here is the difference between PyAV and FFmpeg:

    PyAV parses the input data with av_parser_parse2() like this [ref]:

    
            while True:
    
                with nogil:
                    consumed = lib.av_parser_parse2(
                        self.parser,
                        self.ptr,
                        &out_data, &out_size,
                        in_data, in_size,
                        lib.AV_NOPTS_VALUE, lib.AV_NOPTS_VALUE,
                        0
                    )
                err_check(consumed)
    
                # ...snip...
    
                if not in_size:
                    # This was a flush. Only one packet should ever be returned.
                    break
    
                in_data += consumed
                in_size -= consumed
    
                if not in_size:
                    # Aaaand now we're done.
                    break
    
    

    So it loops until the input data is 100% consumed. Note that it does not make an extra av_parser_parse2 call with empty input once the buffer is exhausted, so the parser is never flushed automatically. This makes sense, as the input data may be only a part of the stream.

    In contrast, FFmpeg does not call av_parser_parse2 directly but goes through parse_packet, which handles the same situation as follows:

    while (size > 0 || (flush && got_output)) {
       int64_t next_pts = pkt->pts;
       int64_t next_dts = pkt->dts;
       int len;
    
       len = av_parser_parse2(sti->parser, sti->avctx,
                              &out_pkt->data, &out_pkt->size, data, size,
                              pkt->pts, pkt->dts, pkt->pos);
    

    It also calls av_parser_parse2 to flush the parser once the input stream is exhausted. You need to do the same in PyAV: after all your data has been fed in, call codec.parse() one last time with no data to flush the last packet.
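
The flush steps described above can be sketched in Python roughly as follows. This is a minimal sketch, not PyAV's own recipe: the helper names parse_all, decode_all and decode_h264_files are hypothetical, and it assumes that codec.parse() called with no data flushes the parser and that codec.decode(None) flushes the decoder, as PyAV documents for CodecContext.

```python
def parse_all(codec, chunks):
    """Parse every chunk, then flush the parser with an empty call."""
    packets = []
    for chunk in chunks:
        # May return an empty list while the parser is still buffering.
        packets.extend(codec.parse(chunk))
    # Assumption: parse() with no data flushes the packet the parser still holds.
    packets.extend(codec.parse())
    return packets


def decode_all(codec, packets):
    """Decode every packet, then flush the decoder with decode(None)."""
    frames = []
    for packet in packets:
        # May return an empty list while the decoder waits for more input.
        frames.extend(codec.decode(packet))
    # Assumption: decode(None) drains the frames still buffered in the decoder.
    frames.extend(codec.decode(None))
    return frames


def decode_h264_files(paths):
    """Hypothetical usage with PyAV on files such as frame-0001.h264."""
    import av  # imported lazily so the helpers above only rely on duck typing

    codec = av.CodecContext.create('h264', 'r')
    chunks = []
    for path in paths:
        with open(path, 'rb') as file_handler:
            chunks.append(file_handler.read())
    return decode_all(codec, parse_all(codec, chunks))
```

With these helpers, a call such as decode_h264_files(['/path/to/frame-0001.h264']) should no longer need repeated parse or decode invocations, because the trailing flush calls collect whatever the parser and decoder were still holding back. Note that flushing the decoder puts it into its end-of-stream state, so this is only suitable once all input has been supplied.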