
How many video and audio samples should be included in each mdat?


I am creating an FMP4 with 2 tracks (one for video and one for audio). I am trying to find out how many video samples and how many audio samples I should include in each mdat.

So my FMP4 has the following structure:

ftyp
moov
moof (track1 - video)
mdat (track1 - video)
moof (track2 - audio)
mdat (track2 - audio)
moof (track1 - video)
mdat (track1 - video)
moof (track2 - audio)
mdat (track2 - audio)
...
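To compare your own output with the layout above, or with what ffmpeg produces, it can help to list the top-level boxes of the file. The sketch below relies only on the generic ISO BMFF box header: a 32-bit big-endian size (including the header), a 4-character type, a 64-bit "largesize" when size is 1, and "to end of file" when size is 0. `iter_top_level_boxes` and `make_box` are hypothetical helper names. The demo payloads are placeholders: the `moof` boxes built here are not valid fragments. They only reproduce the box ordering.

```python
import struct
from typing import Iterator, Tuple


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each top-level ISO BMFF box."""
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-char type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # size == 1: the real size is a 64-bit "largesize" after the type.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # size == 0: the box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"corrupt box at offset {offset}")
        yield box_type.decode("latin-1"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header; the payload is opaque here."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    # Placeholder payloads only: these are NOT valid moof/mdat contents.
    sample = (make_box(b"ftyp", b"isom") + make_box(b"moov")
              + make_box(b"moof") + make_box(b"mdat", b"\x00" * 16)
              + make_box(b"moof") + make_box(b"mdat", b"\x00" * 4))
    for box_type, offset, size in iter_top_level_boxes(sample):
        print(f"{offset:8d}  {box_type}  {size}")
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes in; the order of `moof`/`mdat` pairs shows how the muxer interleaved the tracks, though which track each fragment belongs to is only visible inside the `moof` (its `tfhd` track ID), which this sketch does not parse.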

Should each video mdat have just 1 frame or an entire GOP?

Should each audio mdat contain only the audio samples corresponding to the preceding video mdat, or can I send as many audio samples as I want? (Since audio samples are much smaller, I could send, say, 2 seconds of audio even though the preceding video mdat covers only 1 second.)

PS: I thought of sending the entire GOP in each video mdat, but I noticed that when I re-encode that FMP4 using ffmpeg, each mdat in the output contains only 1 frame. I can do that (have just 1 video frame in each mdat), but then I am lost on how many audio samples I should send. If I send only the audio samples corresponding to that 1 video frame, the audio doesn't play very well.
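One reason "the audio for 1 video frame" is awkward is that audio frames and video frames have different, non-aligned durations. The arithmetic below assumes (these numbers are not from the question) AAC-LC audio at 48 kHz, where each AAC frame carries 1024 PCM samples, and video at 30 fps. It counts how many whole audio frames land in each fragment if every fragment carries the audio frames that start before the fragment's end time. `audio_frames_per_fragment` is a hypothetical helper. This only shows how small and uneven the per-fragment audio runs become; it does not by itself explain the playback problem.

```python
import math
from fractions import Fraction
from typing import List

# Assumed parameters, not taken from the question.
AAC_SAMPLES_PER_FRAME = 1024  # PCM samples per AAC-LC frame
AUDIO_RATE = 48_000           # Hz
VIDEO_FPS = 30                # frames per second

AUDIO_FRAME_DUR = Fraction(AAC_SAMPLES_PER_FRAME, AUDIO_RATE)  # 8/375 s
VIDEO_FRAME_DUR = Fraction(1, VIDEO_FPS)                        # 1/30 s


def audio_frames_per_fragment(video_frames_per_fragment: int,
                              n_fragments: int) -> List[int]:
    """Audio frames placed in each fragment, taking every audio frame
    whose start time is before the fragment's end time."""
    counts = []
    sent = 0
    for i in range(1, n_fragments + 1):
        end = i * video_frames_per_fragment * VIDEO_FRAME_DUR
        # Audio frame k starts at k * AUDIO_FRAME_DUR; count k with start < end.
        total = math.ceil(end / AUDIO_FRAME_DUR)
        counts.append(total - sent)
        sent = total
    return counts


if __name__ == "__main__":
    print("audio frames per video frame:", VIDEO_FRAME_DUR / AUDIO_FRAME_DUR)
    print("1 video frame per fragment:  ", audio_frames_per_fragment(1, 5))
    print("15 video frames per fragment:", audio_frames_per_fragment(15, 4))
```

Under these assumptions one video frame lasts 25/16 = 1.5625 audio frames, so one-frame fragments carry 2, 2, 1, 2, 1, ... audio frames, while 0.5 s fragments (15 video frames) carry 24, 23, 24, 23, ...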

Thanks!


Solution

  • From several empirical tests with ffmpeg, it appears to group about 0.5 s of video frames followed by about 0.5 s of audio frames, and that layout appears to work really well.
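The grouping described in this answer can be sketched as a small planner that decides which samples go into each moof+mdat pair: cut both tracks on a shared grid of roughly 0.5 s and emit a video run followed by an audio run for each grid cell. This is an illustration of the empirical observation, not ffmpeg's actual muxing logic; `Sample`, `plan_fragments` and the `chunk` parameter are hypothetical, timestamps are assumed to be decode times in seconds, and no boxes are written.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Tuple


@dataclass
class Sample:
    track: str     # "video" or "audio"
    dts: Fraction  # decode timestamp in seconds (assumed)


def plan_fragments(video: List[Sample], audio: List[Sample],
                   chunk: Fraction = Fraction(1, 2)
                   ) -> List[Tuple[str, List[Sample]]]:
    """Return (track, samples) runs, alternating video then audio, where
    each run covers one `chunk`-second cell and becomes one moof+mdat."""
    plan: List[Tuple[str, List[Sample]]] = []
    vi = ai = 0
    boundary = chunk
    while vi < len(video) or ai < len(audio):
        v_run: List[Sample] = []
        while vi < len(video) and video[vi].dts < boundary:
            v_run.append(video[vi])
            vi += 1
        a_run: List[Sample] = []
        while ai < len(audio) and audio[ai].dts < boundary:
            a_run.append(audio[ai])
            ai += 1
        # Empty runs (e.g. a gap in one track) produce no fragment.
        if v_run:
            plan.append(("video", v_run))
        if a_run:
            plan.append(("audio", a_run))
        boundary += chunk
    return plan


if __name__ == "__main__":
    # Assumed input: 1 s of 30 fps video and 1 s of 48 kHz AAC (1024-sample frames).
    video = [Sample("video", Fraction(i, 30)) for i in range(30)]
    audio = [Sample("audio", i * Fraction(1024, 48000)) for i in range(47)]
    for track, run in plan_fragments(video, audio):
        print(track, len(run))
```

With these assumed inputs the plan is video 15, audio 24, video 15, audio 23: each audio run covers the same time span as the video run before it, rather than one video frame at a time. Changing `chunk` trades fragment overhead against latency.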