azure-media-services

Speaker Mode Video in Azure Media Services


I have two high-resolution local video recordings from a podcast interview, one of the host and one of the guest.

I would like to merge them into a single output file that always shows whoever is currently speaking.

To do that, I'd need to analyse the audio tracks to work out who is speaking at each moment (the guest takes priority when both are talking) and then build an array of timestamps recording which speaker should be on screen.
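One way to build that timestamp array, assuming you already have per-window "is talking" flags for each recording (for example from a loudness threshold), is to walk the windows in order, let the guest win whenever both are active, and merge consecutive windows with the same speaker into one segment. The function name, the fixed window length, and the rule that the previous speaker stays on screen during silence are my own assumptions for illustration, not anything AMS provides:

```python
from typing import List, Tuple

# (source name, start seconds, end seconds)
Segment = Tuple[str, float, float]


def build_speaker_segments(host_active: List[bool],
                           guest_active: List[bool],
                           window: float) -> List[Segment]:
    """Hypothetical helper: turn per-window voice-activity flags into merged segments.

    Assumptions: both flag lists use the same fixed window length (seconds),
    the guest wins when both are active, and during silence the previous
    speaker stays on screen ("host" before anyone has spoken).
    """
    segments: List[Segment] = []
    current = "host"  # assumed default before the first detected speech
    n = min(len(host_active), len(guest_active))
    for i in range(n):
        if guest_active[i]:
            current = "guest"  # guest has priority
        elif host_active[i]:
            current = "host"
        # else: nobody is talking, keep the current speaker
        start, end = i * window, (i + 1) * window
        if segments and segments[-1][0] == current:
            # Same speaker as the previous window: extend that segment.
            name, seg_start, _ = segments[-1]
            segments[-1] = (name, seg_start, end)
        else:
            segments.append((current, start, end))
    return segments


print(build_speaker_segments([True, True, False, False],
                             [False, True, True, False], 0.5))
```

With 0.5 s windows this yields one host segment from 0.0 to 0.5 and one guest segment from 0.5 to 2.0. In practice you would probably also enforce a minimum shot length so the view doesn't flicker on short interjections.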

There is an ffmpeg volume-analysis example that is similar to what I'm describing.
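If you want to do the volume analysis yourself, a minimal sketch is to extract each recording's audio to 16-bit PCM WAV (for example `ffmpeg -i host.mp4 -vn -ac 1 -c:a pcm_s16le host.wav`) and compute an RMS level per fixed window with only the Python standard library. The `window_rms_dbfs` helper and the 0.5 s window are hypothetical choices; you would then threshold the levels (say, anything above -40 dBFS counts as talking, an arbitrary example value) to get per-window speaking flags:

```python
import array
import math
import wave
from typing import List


def window_rms_dbfs(path: str, window_seconds: float = 0.5) -> List[float]:
    """Hypothetical helper: RMS loudness in dBFS for consecutive windows of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        # "h" = signed 16-bit integers; samples are interleaved across channels.
        samples = array.array("h", wav.readframes(wav.getnframes()))
    frames_per_window = max(1, int(rate * window_seconds))
    step = frames_per_window * channels  # number of interleaved samples per window
    levels: List[float] = []
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        # 32768 is full scale for 16-bit audio; pure silence maps to -inf dBFS.
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else float("-inf"))
    return levels


# Example (assumes host.wav was extracted as above):
# host_active = [db > -40 for db in window_rms_dbfs("host.wav")]
```

Running the same function on both recordings with the same window length gives two flag lists that line up window by window, provided the recordings start at the same moment.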

Then I'd like to use AMS (Azure Media Services) to merge the video files based on those timestamps (e.g. host.mp4 for 20 seconds, then guest.mp4 for 30 seconds, and so on).
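As an alternative to AMS (this is ffmpeg, not an AMS API), the timestamps could drive ffmpeg's `trim`/`atrim`, `setpts`/`asetpts` and `concat` filters. The sketch below turns a list of `(source, start, end)` segments into a `-filter_complex` string. It assumes input 0 is host.mp4 and input 1 is guest.mp4, that both recordings are already time-aligned with matching resolution and frame rate (concat expects matching stream parameters), and that the audio should follow whichever source is on screen; the function name and mapping are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical mapping to ffmpeg input order: -i host.mp4 -i guest.mp4
INPUT_INDEX: Dict[str, int] = {"host": 0, "guest": 1}


def build_filter_complex(segments: List[Tuple[str, float, float]]) -> str:
    """Build an ffmpeg filter graph that cuts each segment and concatenates them in order."""
    if not segments:
        raise ValueError("need at least one segment")
    parts: List[str] = []
    pads: List[str] = []
    for n, (source, start, end) in enumerate(segments):
        i = INPUT_INDEX[source]
        # Cut this source's video and audio, then reset timestamps so each piece starts at 0.
        parts.append(f"[{i}:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{n}]")
        parts.append(f"[{i}:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{n}]")
        pads.append(f"[v{n}][a{n}]")
    # Join all pieces back to back into one video stream and one audio stream.
    parts.append(f"{''.join(pads)}concat=n={len(segments)}:v=1:a=1[outv][outa]")
    return ";".join(parts)


print(build_filter_complex([("host", 0.0, 20.0), ("guest", 20.0, 50.0)]))
```

You would then run something like `ffmpeg -i host.mp4 -i guest.mp4 -filter_complex "<generated string>" -map "[outv]" -map "[outa]" output.mp4`. If you'd rather keep a mix of both microphones instead of switching audio, you could concat video only and map a separately mixed audio track.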

How would I go about this?


Solution

  • This sounds like the speaker enumeration feature in Azure Video Indexer, which identifies who is speaking and when: https://learn.microsoft.com/en-us/azure/azure-video-indexer/video-indexer-overview#videoaudio-ai-features