I am working on an M4A file with the following metadata:
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A mp42isom
creation_time : 2019-08-14T13:45:39.000000Z
iTunSMPB : 00000000 00000840 00000000 00000000000387C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Duration: 00:00:05.25, start: 0.047891, bitrate: 69 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 65 kb/s (default)
The audio duration is 5246.2585 ms.
I am trying to calculate the number of frames using the following formula:
duration (ms) * sampling rate (kHz) / frame size (samples) = 5246.2585 * 44.1 / 1024 = 225.9375 frames
I tried multiple files, and the result always ends in .9375 frames.
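Working in samples instead of milliseconds makes the fractional part easier to interpret. Below is a minimal Python sketch of the same calculation. The constant and function names are illustrative, and the 1024-sample frame size assumes AAC-LC, as ffprobe reports:

```python
SAMPLE_RATE = 44100  # Hz, from the ffprobe stream line
FRAME_SIZE = 1024    # samples per AAC-LC frame (packet)


def duration_ms_to_samples(duration_ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a duration in milliseconds to the nearest whole number of samples."""
    return round(duration_ms * sample_rate / 1000)


samples = duration_ms_to_samples(5246.2585)
full_frames, leftover = divmod(samples, FRAME_SIZE)
print(samples, full_frames, leftover, samples / FRAME_SIZE)
# 231360 225 960 225.9375
```

In samples, the duration is a whole number (231360). The .9375 is simply the 960 samples that partly fill a 226th frame.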
However, when I run ffprobe:
ffprobe -i audio.m4a -show_streams -hide_banner
I am getting:
nb_frames=228
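To read the same value from a script, a rough Python sketch can shell out to ffprobe and parse its default key=value output. It assumes ffprobe is on your PATH, and the helper names are hypothetical. For containers where nb_frames is unavailable, ffprobe prints N/A, and this sketch would reject that value with a ValueError:

```python
import subprocess


def parse_ffprobe_fields(text: str) -> dict[str, str]:
    """Parse ffprobe's default 'key=value' output, skipping [STREAM]/[/STREAM] markers."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def ffprobe_nb_frames(path: str) -> int:
    """Run ffprobe on the first audio stream of `path` and return its nb_frames field."""
    completed = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return int(parse_ffprobe_fields(completed.stdout)["nb_frames"])


# Example (requires ffprobe on PATH):
# print(ffprobe_nb_frames("audio.m4a"))
```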
There is always a difference of 2.0625 frames between my calculation and the ffprobe output. Any ideas what I am doing wrong here? How can I accurately calculate the number of frames?
In AAC, there is one packet for every 1024 samples. However, each packet affects 2048 samples because the transform (MDCT) uses windows that overlap by 50%, so each sample is partly recorded in two packets. Therefore, to properly represent N packets' worth of audio samples, you need N + 1 packets.
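The N + 1 counting can be made concrete with a deliberately simplified Python model. It is not a decoder, and it ignores window shapes and block switching. Each packet's 2048-sample window is treated as covering two consecutive 1024-sample blocks, and a block counts as complete only once two windows overlap it. The function names are hypothetical:

```python
from collections import Counter


def blocks_touched(packet: int) -> tuple[int, int]:
    """Toy model: packet k's 2048-sample window spans 1024-sample blocks k and k + 1."""
    return (packet, packet + 1)


def complete_blocks(num_packets: int) -> list[int]:
    """Blocks covered by two overlapping windows, i.e. fully reconstructable."""
    coverage = Counter(b for k in range(num_packets) for b in blocks_touched(k))
    return sorted(b for b, count in coverage.items() if count == 2)


print(complete_blocks(4))  # [1, 2, 3]: 4 packets fully cover only 3 blocks
```

With N + 1 packets, exactly N blocks are fully covered. The first and last blocks each receive only one window.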
If we think of each packet as affecting its own 1024 samples as well as the next block of samples, then the first 1024 samples cannot be properly represented. Common practice is therefore for the encoder to pre-pad the signal with zeros. On playback, these priming samples are discarded, which is why the duration of the signal is shorter than you would expect from counting packets.
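On the playback side, discarding the priming amounts to slicing the decoded PCM. The minimal sketch below assumes the player already knows the priming length and the original sample count, for example from container metadata. `trim_decoded` is a hypothetical helper; real players do this inside the decoding pipeline:

```python
FRAME_SIZE = 1024


def trim_decoded(decoded: list[float], priming: int, original_length: int) -> list[float]:
    """Discard `priming` leading samples and any trailing padding from decoded PCM."""
    if priming + original_length > len(decoded):
        raise ValueError("decoded buffer is shorter than priming + original length")
    return decoded[priming:priming + original_length]


# Toy example: 3 packets decode to 3 * 1024 samples; the first 1024 are priming zeros.
decoded = [0.0] * FRAME_SIZE + [1.0] * (2 * FRAME_SIZE)
audio = trim_decoded(decoded, priming=FRAME_SIZE, original_length=2 * FRAME_SIZE)
print(len(audio), audio[0])  # 2048 1.0
```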
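For some reason, the common practice is actually to pad with 2112 samples instead of just 1024. The length of this padding isn't recorded in the raw AAC bitstream and isn't specified in the standard, so everybody just uses 2112 to be compatible with everyone else. MP4/M4A containers can record it in metadata, though: the iTunSMPB tag in your file starts with 00000000 00000840, and 0x840 = 2112.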
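Because the file in the question carries an iTunSMPB tag, the priming can be read from it instead of being assumed. iTunSMPB is an iTunes convention, not part of the AAC standard. The sketch below assumes the commonly documented layout: a reserved field, then encoder delay, end padding and original sample count in hex, followed by reserved fields. The class and function names are illustrative:

```python
from typing import NamedTuple


class SmpbInfo(NamedTuple):
    encoder_delay: int     # priming samples at the start
    end_padding: int       # padding samples at the end
    original_samples: int  # length of the source audio, in samples


def parse_itunsmpb(value: str) -> SmpbInfo:
    """Parse the hex fields of an iTunSMPB tag value (the first field is ignored)."""
    fields = value.split()
    if len(fields) < 4:
        raise ValueError(f"unexpected iTunSMPB value: {value!r}")
    return SmpbInfo(
        encoder_delay=int(fields[1], 16),
        end_padding=int(fields[2], 16),
        original_samples=int(fields[3], 16),
    )


tag = ("00000000 00000840 00000000 00000000000387C0 00000000 00000000 "
       "00000000 00000000 00000000 00000000 00000000 00000000")
info = parse_itunsmpb(tag)
print(info)
# SmpbInfo(encoder_delay=2112, end_padding=0, original_samples=231360)
```

For this file, 2112 + 0 + 231360 = 233472 = 228 * 1024 samples, which matches nb_frames=228.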
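2112 samples is exactly 2.0625 packets (2112 / 1024 = 2.0625), which is the difference you are seeing: 228 * 1024 - 2112 = 231360 samples, and 231360 / 44100 Hz = 5246.2585 ms. The same delay appears as start: 0.047891 in your ffprobe output, because 2112 / 44100 ≈ 0.047891 s.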
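Putting it together, the packet count can be estimated from the playable duration plus the priming. The Python sketch below assumes 2112 priming samples. It also assumes the encoder pads the end only up to the next frame boundary, which holds for this file because its iTunSMPB end-padding field is 0. Other encoders may use different values:

```python
import math

FRAME_SIZE = 1024  # samples per AAC-LC packet
PRIMING = 2112     # conventional encoder delay, in samples


def expected_packets(duration_ms: float, sample_rate: int, priming: int = PRIMING) -> int:
    """Estimate the packet count from the playable duration plus encoder priming."""
    samples = round(duration_ms * sample_rate / 1000)
    return math.ceil((samples + priming) / FRAME_SIZE)


print(expected_packets(5246.2585, 44100))  # 228
```

With `priming=0`, the same function returns 226, which is the original calculation rounded up to whole packets.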
If you want to learn more about this, the magic Google words are "AAC priming".