I have a video with some background music in it.
I wish to add a piece of spoken dialogue at a particular location in the video, such that the background music is lowered for the entire duration of the dialogue audio.
I found a similar solution using sidechaincompress, which just works for mp3. I made some changes to it so that it includes the video too (-map 0:v). However, now the audio is cut short as soon as the dialogue ends.
ffmpeg -i video-with-bg-music.mp4 -i dialogue.mp3 -c:v libx264 -filter_complex "[1:a]asplit=2[sc][mix];[0:a][sc]sidechaincompress=threshold=0.003:ratio=20[bg];[bg][mix]amerge[final]" -map 0:v -map [final] final.mp4
I am not a pro at using ffmpeg and probably don't know what's going on with the filter_complex. Please help me out.
EDIT:
Output log for solution suggested by @kesh
ffmpeg -i video_bg.mp4 -i dialogues.mp3 -filter_complex "[1:a]adelay=0,apad,asplit=2[sc][mix];[0:a][sc]sidechaincompress=threshold=0.003:ratio=20[bg];[bg][mix]amix=duration=shortest[out]" -map 0:v -map [out] video_bg_speech.mp4
ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video_bg.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Duration: 00:01:50.12, start: 0.000000, bitrate: 157 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 720x1280 [SAR 1:1 DAR 9:16], 146 kb/s, 24 fps, 24 tbr, 12288 tbn, 48 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
[mp3 @ 0x5577fa3b6740] Estimating duration from bitrate, this may be inaccurate
Input #1, mp3, from 'dialogue.mp3':
Duration: 00:00:17.33, start: 0.000000, bitrate: 32 kb/s
Stream #1:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s
File 'video_bg_speech.mp4' already exists. Overwrite ? [y/N] y
Stream mapping:
Stream #0:1 (aac) -> sidechaincompress:main (graph 0)
Stream #1:0 (mp3float) -> adelay (graph 0)
Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
amix (graph 0) -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
[libx264 @ 0x5577fa446040] using SAR=1/1
[libx264 @ 0x5577fa446040] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x5577fa446040] profile High, level 3.1
[libx264 @ 0x5577fa446040] 264 - core 155 r2917 0a84d98 - H.264/MPEG-4 AVC codec - Copyleft 2003-2018 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=18 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=24 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
[Parsed_sidechaincompress_3 @ 0x5577faed3f00] No channel layout for input 1
Output #0, mp4, to 'video_bg_speech.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 720x1280 [SAR 1:1 DAR 9:16], q=-1--1, 24 fps, 12288 tbn, 24 tbc (default)
Metadata:
handler_name : VideoHandler
encoder : Lavc58.54.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 24000 Hz, mono, fltp, 69 kb/s (default)
Metadata:
encoder : Lavc58.54.100 aac
frame= 480 fps=325 q=-1.0 Lsize= 517kB time=00:00:19.87 bitrate= 213.0kbits/s speed=13.5x
video:353kB audio:152kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.388190%
[libx264 @ 0x5577fa446040] frame I:2 Avg QP:15.99 size:136135
[libx264 @ 0x5577fa446040] frame P:121 Avg QP:16.86 size: 603
[libx264 @ 0x5577fa446040] frame B:357 Avg QP:26.34 size: 43
[libx264 @ 0x5577fa446040] consecutive B-frames: 0.6% 0.4% 0.6% 98.3%
[libx264 @ 0x5577fa446040] mb I I16..4: 2.3% 65.7% 32.1%
[libx264 @ 0x5577fa446040] mb P I16..4: 0.0% 0.1% 0.1% P16..4: 1.1% 0.1% 0.1% 0.0% 0.0% skip:98.5%
[libx264 @ 0x5577fa446040] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 0.2% 0.0% 0.0% direct: 0.0% skip:99.8% L0:55.5% L1:44.5% BI: 0.0%
[libx264 @ 0x5577fa446040] 8x8 transform intra:62.5% inter:56.6%
[libx264 @ 0x5577fa446040] coded y,uvDC,uvAC intra: 89.6% 89.0% 78.7% inter: 0.0% 0.1% 0.0%
[libx264 @ 0x5577fa446040] i16 v,h,dc,p: 37% 5% 8% 50%
[libx264 @ 0x5577fa446040] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 16% 6% 6% 9% 9% 9% 10% 9%
[libx264 @ 0x5577fa446040] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 18% 7% 6% 11% 9% 9% 7% 7%
[libx264 @ 0x5577fa446040] i8c dc,h,v,p: 42% 22% 20% 15%
[libx264 @ 0x5577fa446040] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x5577fa446040] ref P L0: 46.3% 2.9% 19.3% 31.5%
[libx264 @ 0x5577fa446040] ref B L0: 45.6% 54.0% 0.4%
[libx264 @ 0x5577fa446040] ref B L1: 97.0% 3.0%
[libx264 @ 0x5577fa446040] kb/s:144.26
[aac @ 0x5577fa3b9940] Qavg: 307.060
Try this filtergraph (the short clip is inserted at the 3-second mark):
[1:a]aresample=44100,adelay=3000,apad,asplit=2[sc][mix];\
[0:a][sc]sidechaincompress=threshold=0.003:ratio=20[bg];\
[bg][mix]amix=duration=shortest[out]
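For reference, here is one way the filtergraph above might be wrapped into a complete command. This is only a sketch: build_duck_filter is a hypothetical helper (not part of ffmpeg), the file names are the ones from the question, and the 44100 Hz default assumes the video's audio rate shown in the log.

```shell
# Hypothetical helper: prints the ducking filtergraph for a given start offset.
#   $1 = where the dialogue should begin, in whole seconds
#   $2 = sample rate of the video's audio (assumed 44100 if omitted)
build_duck_filter() {
  local start_s=$1 rate=${2:-44100}
  local delay_ms=$(( start_s * 1000 ))   # adelay expects milliseconds
  printf '%s;%s;%s' \
    "[1:a]aresample=${rate},adelay=${delay_ms},apad,asplit=2[sc][mix]" \
    "[0:a][sc]sidechaincompress=threshold=0.003:ratio=20[bg]" \
    "[bg][mix]amix=duration=shortest[out]"
}

filter=$(build_duck_filter 3)
echo "$filter"

# Quote the graph and the map label so the shell does not treat ';' as a
# command separator or '[...]' as a glob pattern. Uncomment to run:
# ffmpeg -i video-with-bg-music.mp4 -i dialogue.mp3 \
#   -filter_complex "$filter" -map 0:v -map "[out]" final.mp4
```

The filters used in the graph are: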
aresample matches the dialogue's sampling rate to the video's.
adelay places the clip at the right position (note that the delay is in milliseconds).
apad extends the short stream indefinitely with silence.
amix combines the 2 streams and stops when the video's audio ends.
Addendum:
If the mp4 has no audio, try:
ffmpeg -i video-with-bg-music.mp4 -i dialogue.mp3 \
-map 0:v -map 1:a -af "adelay=3000,apad" \
-shortest final.mp4
apad in the audio filtergraph extends the audio stream, and the -shortest output option cuts it off when the video stream ends.
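If you are not sure which case applies, ffprobe can tell you whether the mp4 carries an audio stream at all. A minimal sketch, assuming ffprobe is installed alongside ffmpeg; has_audio is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds (exit status 0) when the file has at least
# one audio stream, fails otherwise.
has_audio() {
  # ffprobe prints one line per selected audio stream (its index);
  # empty output means the file has no audio.
  [ -n "$(ffprobe -v error -select_streams a \
            -show_entries stream=index -of csv=p=0 "$1")" ]
}

# Usage (uncomment to run against a real file):
# if has_audio video-with-bg-music.mp4; then
#   echo "has audio: use the sidechaincompress graph"
# else
#   echo "no audio: use the adelay/apad command"
# fi
```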
Debug Sample
ffmpeg -filter_complex \
"[1:a]aresample=44100,apad,asplit=2[sc][mix];\
[0:a][sc]sidechaincompress=threshold=0.003:ratio=20[bg];\
[bg][mix]amix=duration=shortest[out]" \
-f lavfi -i anoisesrc=d=60:c=pink:r=44100:a=0.5 \
-f lavfi -i sine=220:4:d=5 \
-map "[out]" output.mp3
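To confirm that the audio is no longer cut short, you could compare durations with ffprobe. This is a rough sketch: media_duration and same_length are hypothetical helper names, and the 0.5 second tolerance is an arbitrary choice.

```shell
# Hypothetical helper: prints the container duration of a file in seconds.
media_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: succeeds when two files differ in length by less
# than 0.5 s (arbitrary tolerance).
same_length() {
  awk -v a="$(media_duration "$1")" -v b="$(media_duration "$2")" \
    'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d < 0.5) }'
}

# Usage (uncomment to run against real files):
# media_duration output.mp3
# same_length video-with-bg-music.mp4 final.mp4 && echo "lengths match"
```

For the debug sample, the reported duration should be close to the 60 seconds of the noise source if the padding takes effect, rather than the 5 seconds of the sine tone.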