When producing H.264 frames and decoding them with PyAV, packets are returned only after the parse method has been invoked twice.
Consider the following test H.264 input, created using:
ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 -f image2 -vcodec libx264 -bsf h264_mp4toannexb -force_key_frames source -x264-params keyint=1:scenecut=0 "frame-%4d.h264"
Now, using pyAV to parse the first frame:
import av
codec = av.CodecContext.create('h264', 'r')
with open('/path/to/frame-0001.h264', 'rb') as file_handler:
    chunk = file_handler.read()
packets = codec.parse(chunk) # This line needs to be invoked twice to parse packets
packets remains empty unless the last line is invoked again (packets = codec.parse(chunk)).
Also, in various real-life examples that I cannot characterize, decoding frames from packets seems to require several decode invocations as well:
packet = packets[0]
frames = codec.decode(packet) # This line needs to be invoked 2-3 times to actually receive frames.
Does anyone know anything about this inconsistent behavior of PyAV?
(Using Python 3.8.12 on macOS Monterey 12.3.1, ffmpeg 4.4.1, pyAV 9.0.2)
This is expected PyAV behavior. More than that, it is the expected behavior of the underlying libav. One packet does not guarantee a frame, and multiple packets may be needed before a frame is produced. This is apparent in FFmpeg's video decoder example:
while (ret >= 0) {
    ret = avcodec_receive_frame(dec_ctx, frame);
    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
        return;
If the decoder needs more packets to form a frame, avcodec_receive_frame() returns AVERROR(EAGAIN).
[edit]
Actually, the above is not a good example, as it simply returns on EAGAIN. To retrieve a frame, the code should instead continue on EAGAIN, feeding the decoder more packets until a frame comes out:
while (ret >= 0) {
    ret = avcodec_receive_frame(dec_ctx, frame);
    if (ret == AVERROR(EAGAIN))  /* needs more input: send another packet, then retry */
        continue;
    if (ret == AVERROR_EOF)      /* decoder fully flushed */
        return;
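In PyAV terms, the same idea can be sketched as follows. This is only a minimal sketch: decode_all is a hypothetical helper name, and it assumes the codec object behaves like an av.CodecContext, where decode(packet) may return an empty list while the decoder is still buffering and decode(None) flushes whatever is left.

```python
def decode_all(codec, packets):
    """Decode every packet, then flush the decoder (hypothetical helper).

    `codec` is assumed to behave like av.CodecContext: decode(packet)
    returns a possibly empty list of frames, and decode(None) drains
    the frames the decoder is still holding back.
    """
    frames = []
    for packet in packets:
        # An empty result here is normal: the decoder may need more input.
        frames.extend(codec.decode(packet))
    # No more input is coming, so ask for the buffered frames.
    frames.extend(codec.decode(None))
    return frames
```

Note that flushing signals end of stream, so it belongs after the last packet rather than after every packet.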
[edit]
PyAV's codec.parse()
That decoding sometimes needs additional calls is fairly well known, but the need to flush the parser is less so. Here is the difference between PyAV and FFmpeg:
PyAV parses the input data with av_parser_parse2() like this [ref]:
while True:

    with nogil:
        consumed = lib.av_parser_parse2(
            self.parser,
            self.ptr,
            &out_data, &out_size,
            in_data, in_size,
            lib.AV_NOPTS_VALUE, lib.AV_NOPTS_VALUE,
            0
        )
    err_check(consumed)

    # ...snip...

    if not in_size:
        # This was a flush. Only one packet should ever be returned.
        break

    in_data += consumed
    in_size -= consumed

    if not in_size:
        # Aaaand now we're done.
        break
So it reads until the input data is 100% consumed. Note that it does not call av_parser_parse2 again at the end of the buffer, which makes sense because the input data may be only a part of the stream data.
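To see why a parser has to hold back the last unit, here is a toy model. It is not PyAV's or FFmpeg's implementation: ToyAnnexBParser is a hypothetical class that splits on every 4-byte Annex B start code, whereas the real H.264 parser looks for access-unit boundaries. The buffering behavior is the point: a unit is emitted only once the start of the next one has been seen, or on an explicit flush.

```python
START_CODE = b"\x00\x00\x00\x01"


class ToyAnnexBParser:
    """Toy model of a buffering parser (illustration only)."""

    def __init__(self):
        self._buffer = b""

    def parse(self, data=b""):
        """Return the complete units; empty input means "flush"."""
        flush = not data
        self._buffer += data
        units = []
        while True:
            # A unit is complete only when the NEXT start code shows up.
            nxt = self._buffer.find(START_CODE, len(START_CODE))
            if nxt == -1:
                break
            units.append(self._buffer[:nxt])
            self._buffer = self._buffer[nxt:]
        if flush and self._buffer:
            # No more data will come, so the tail must be the last unit.
            units.append(self._buffer)
            self._buffer = b""
        return units


parser = ToyAnnexBParser()
frame = START_CODE + b"frame-1"
first = parser.parse(frame)   # empty: the end of the unit is not known yet
second = parser.parse()       # flush: the buffered unit is returned
```

Under this model, feeding the same chunk a second time would also release the first unit, because the second copy's start code marks where the first one ends. That may be what the question observed, although I have not verified it against the real parser.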
In contrast, FFmpeg wraps av_parser_parse2 in parse_packet, where you can see how it handles the same situation:
while (size > 0 || (flush && got_output)) {
    int64_t next_pts = pkt->pts;
    int64_t next_dts = pkt->dts;
    int len;

    len = av_parser_parse2(sti->parser, sti->avctx,
                           &out_pkt->data, &out_pkt->size, data, size,
                           pkt->pts, pkt->dts, pkt->pos);
It also calls av_parser_parse2 to flush the parser once the input data is exhausted. So you need to do the same in PyAV: after all your frames have been read, call codec.parse() one last time, with no arguments, to flush the last packet.
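Put together, the parsing side could look roughly like this. It is a sketch rather than a definitive recipe: parse_all is a hypothetical helper name, and it assumes the codec object behaves like an av.CodecContext whose parse() treats missing or empty input as a flush request.

```python
def parse_all(codec, chunks):
    """Parse every chunk, then flush the parser (hypothetical helper).

    `codec` is assumed to behave like av.CodecContext: parse(data)
    returns the packets completed so far, and parse() with no input
    flushes the packet still sitting in the parser's buffer.
    """
    packets = []
    for chunk in chunks:
        packets.extend(codec.parse(chunk))
    # End of input: flush so that the last packet is not lost.
    packets.extend(codec.parse())
    return packets
```

With the file from the question this would be called as parse_all(av.CodecContext.create('h264', 'r'), [chunk]), and for a single frame the packet should come out of the final flush call rather than the first parse.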