I am trying to combine the following:
(a) : 29s video clip that has its own audio that lasts the entire duration
(b) : a ~2-second audio clip that I want to play at the start of the video, mixed with the original audio
I can use 'amix' to produce a video with the combined audio, but the output video's audio cuts off at around 26 of the 29 seconds and the rest is silent.
What doesn't make sense is that the output video otherwise plays as it should, with the audio successfully mixed, yet its audio stream loses the last ~3 seconds.
Here's the 'amix' command I'm using (sending via subprocess):
subprocess.call(['ffmpeg', '-i', 'input.mp4', '-i', 'audioclip.mp3', '-filter_complex', 'amix', 'output.mp4'])
I've also tried versions of this command that spell out -map "0:a" and -map "1:a", or that use 'amix=inputs=2:duration=longest', among many other additions. All lead to the same problem: the combined video's audio drops out with about 3 seconds remaining, even though the original 'input.mp4' has audio for the full 29 seconds.
Does anyone know why the last several seconds of audio from (a) are missing in the final video?
_________________________________________________________________
edit: Below is my output when I run the amix command listed above:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'RuneBearinstakill_advanced.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.20.101
Duration: 00:00:29.77, start: 0.000000, bitrate: 5441 kb/s
Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5304 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : Bento4 Video Handler
vendor_id : [0][0][0][0]
Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : Bento4 Sound Handler
vendor_id : [0][0][0][0]
[mp3 @ 000001f0c8ec2040] Estimating duration from bitrate, this may be inaccurate
Input #1, mp3, from 'TTS_clip.mp3':
Duration: 00:00:01.90, start: 0.000000, bitrate: 32 kb/s
Stream #1:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s
Stream mapping:
Stream #0:1 (aac) -> amix (graph 0)
Stream #1:0 (mp3float) -> amix (graph 0)
amix:default (graph 0) -> Stream #0:0 (aac)
Stream #0:0 -> #0:1 (h264 (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 000001f0c8cbe5c0] using SAR=1/1
[libx264 @ 000001f0c8cbe5c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 000001f0c8cbe5c0] profile High, level 4.0, 4:2:0, 8-bit
[libx264 @ 000001f0c8cbe5c0] 264 - core 164 r3094 bfc87b7 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=24 lookahead_threads=4 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'RuneBearinstakill_advancedwithtts.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.20.101
Stream #0:0: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc59.25.100 aac
Stream #0:1(eng): Video: h264 (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 30 fps, 15360 tbn (default)
Metadata:
handler_name : Bento4 Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc59.25.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame= 893 fps=110 q=-1.0 Lsize= 18717kB time=00:00:29.66 bitrate=5168.5kbits/s speed=3.66x
video:18256kB audio:433kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.150179%
[aac @ 000001f0c8f9ebc0] Qavg: 921.259
[libx264 @ 000001f0c8cbe5c0] frame I:4 Avg QP:21.33 size: 71366
[libx264 @ 000001f0c8cbe5c0] frame P:633 Avg QP:23.32 size: 23837
[libx264 @ 000001f0c8cbe5c0] frame B:256 Avg QP:25.22 size: 12968
[libx264 @ 000001f0c8cbe5c0] consecutive B-frames: 57.2% 10.3% 10.1% 22.4%
[libx264 @ 000001f0c8cbe5c0] mb I I16..4: 17.9% 71.4% 10.8%
[libx264 @ 000001f0c8cbe5c0] mb P I16..4: 6.9% 17.6% 0.8% P16..4: 43.1% 6.5% 1.5% 0.0% 0.0% skip:23.6%
[libx264 @ 000001f0c8cbe5c0] mb B I16..4: 1.5% 4.2% 0.3% B16..8: 39.7% 4.6% 0.5% direct: 1.6% skip:47.6% L0:55.9% L1:41.8% BI: 2.3%
[libx264 @ 000001f0c8cbe5c0] 8x8 transform intra:69.5% inter:87.3%
[libx264 @ 000001f0c8cbe5c0] coded y,uvDC,uvAC intra: 35.6% 26.8% 0.8% inter: 13.4% 10.8% 0.0%
[libx264 @ 000001f0c8cbe5c0] i16 v,h,dc,p: 21% 37% 12% 30%
[libx264 @ 000001f0c8cbe5c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 26% 21% 4% 5% 5% 6% 4% 5%
[libx264 @ 000001f0c8cbe5c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 28% 15% 5% 7% 7% 7% 5% 4%
[libx264 @ 000001f0c8cbe5c0] i8c dc,h,v,p: 67% 18% 14% 1%
[libx264 @ 000001f0c8cbe5c0] Weighted P-Frames: Y:0.2% UV:0.0%
[libx264 @ 000001f0c8cbe5c0] ref P L0: 72.3% 15.4% 8.7% 3.6% 0.0%
[libx264 @ 000001f0c8cbe5c0] ref B L0: 88.9% 9.5% 1.6%
[libx264 @ 000001f0c8cbe5c0] ref B L1: 97.7% 2.3%
[libx264 @ 000001f0c8cbe5c0] kb/s:5024.13
And here is the output when I check the stream durations for the input video and the output video, showing how the output video's audio stream is somehow reduced by several seconds after the amix:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'RuneBearinstakill_advanced.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.20.101
Duration: 00:00:29.77, start: 0.000000, bitrate: 5403 kb/s
Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5266 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : Bento4 Video Handler
vendor_id : [0][0][0][0]
Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : Bento4 Sound Handler
vendor_id : [0][0][0][0]
[STREAM]
duration=29.766667
[/STREAM]
[STREAM]
duration=29.738000
[/STREAM]
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'RuneBearinstakill_advancedwithtts.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.20.101
Duration: 00:00:29.77, start: 0.000000, bitrate: 5098 kb/s
Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Stream #0:1[0x2](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt470bg/bt470bg/smpte170m, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 4971 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : Bento4 Video Handler
vendor_id : [0][0][0][0]
[STREAM]
duration=27.477000
[/STREAM]
[STREAM]
duration=29.766667
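The per-stream duration check shown above can also be run from Python. The following is a minimal sketch, assuming the durations come from ffprobe with -show_entries stream=duration, which prints the [STREAM] blocks seen in the output. The function names parse_stream_durations and probe_stream_durations are hypothetical helpers, not part of ffmpeg.

```python
import subprocess


def parse_stream_durations(ffprobe_output):
    """Extract per-stream durations (in seconds) from ffprobe's default output.

    Looks for lines of the form 'duration=29.766667' inside [STREAM] blocks
    and skips streams whose duration is reported as N/A.
    """
    durations = []
    for line in ffprobe_output.splitlines():
        line = line.strip()
        if line.startswith("duration="):
            value = line.split("=", 1)[1]
            if value != "N/A":
                durations.append(float(value))
    return durations


def probe_stream_durations(path):
    """Run ffprobe on a file and return its stream durations in stream order.

    Assumes ffprobe is on PATH; '-v error' hides the input banner so that
    only the [STREAM] blocks are printed.
    """
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "stream=duration", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_stream_durations(result.stdout)
```

Comparing the lists for the input and output files makes a truncated audio stream easy to spot: a result like [27.477, 29.766667] for the output means its audio stream is more than two seconds shorter than its video stream.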
I found the fix. It turned out I needed to apply the following filter to the input video's audio stream in the filter_complex, before the amix:
aresample=async=1
Ultimately my amix command looked like this:
'[0:a]aresample=async=1[0a];[1:a]volume=2.0[1a];[0a][1a]amix=inputs=2'
I found this solution in a similar question over at Super User: https://superuser.com/questions/1234493/ffmpeg-amix-audio-to-video-with-some-audio-in-parts
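For completeness, here is a sketch of how that filter graph could be passed through subprocess, in the same way as the original command. The file names are the placeholders from the question, and build_amix_command and run_amix are hypothetical helper names. As I understand it, aresample=async=1 fills or trims the audio so that it matches its timestamps, which would explain why it closes the gap, but I have not verified that beyond this one case.

```python
import subprocess


def build_amix_command(video_path, clip_path, output_path, clip_volume=2.0):
    """Build the ffmpeg argument list that mixes a short clip over a video's audio.

    The video's own audio goes through aresample=async=1 before amix,
    and the extra clip is boosted with the volume filter.
    """
    filter_graph = (
        "[0:a]aresample=async=1[0a];"
        f"[1:a]volume={clip_volume}[1a];"
        "[0a][1a]amix=inputs=2"
    )
    return [
        "ffmpeg",
        "-i", video_path,
        "-i", clip_path,
        "-filter_complex", filter_graph,
        output_path,
    ]


def run_amix(video_path, clip_path, output_path):
    """Run the mix; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_amix_command(video_path, clip_path, output_path), check=True)


# Example call (assumes ffmpeg is on PATH and the files exist):
# run_amix("input.mp4", "audioclip.mp3", "output.mp4")
```

Passing the arguments as a list avoids shell quoting problems with the brackets and semicolons in the filter graph.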