I'm looking for code that processes a media file and answers "who said what, and when": in other words, speaker-by-speaker segmentation with the timing of each segment. Answers that require any manual processing of the media file won't work for me. Thanks!
You can use speaker diarization from Kaldi. It is not easy to set up, but the results are great.
There are other libraries too, such as LIUM and bob.
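Whichever toolkit you pick, what you get back is a list of time-stamped speaker segments. Here is a minimal sketch of turning that into a "who spoke when" listing, assuming the tool writes RTTM (Kaldi's diarization recipes do, as far as I know; other tools may use their own segment format). The names `parse_rttm`, `merge_turns`, `SAMPLE_RTTM` and the `max_gap` threshold are made up for illustration and are not part of any library.

```python
from collections import namedtuple

# One speaker turn: label plus start/end time in seconds.
Segment = namedtuple("Segment", "speaker start end")


def parse_rttm(text):
    """Parse RTTM text into a list of Segment tuples sorted by start time."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Only SPEAKER records describe speaker turns; skip blanks and other record types.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])       # turn onset, seconds
        duration = float(fields[4])    # turn duration, seconds
        segments.append(Segment(fields[7], start, start + duration))
    return sorted(segments, key=lambda s: s.start)


def merge_turns(segments, max_gap=0.5):
    """Merge consecutive segments of the same speaker separated by at most max_gap seconds."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            merged[-1] = last._replace(end=max(last.end, seg.end))
        else:
            merged.append(seg)
    return merged


# Hypothetical diarization output, only here to make the sketch runnable.
SAMPLE_RTTM = """\
SPEAKER meeting 1 0.00 4.20 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting 1 4.30 2.10 <NA> <NA> spk1 <NA> <NA>
SPEAKER meeting 1 6.80 5.00 <NA> <NA> spk2 <NA> <NA>
"""

if __name__ == "__main__":
    for turn in merge_turns(parse_rttm(SAMPLE_RTTM)):
        print(f"{turn.start:7.2f} - {turn.end:7.2f}  {turn.speaker}")
```

Note that diarization only gives you the "who" and "when". The speaker labels are anonymous (`spk1`, `spk2`), and to get the "what" you would still need to align these turns with the output of a speech recognizer.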