audio signal-processing speech-recognition librosa diarization

How can I count the number of people speaking in an audio file?


I'm working on an audio project. My goal is to count the number of people who speak in an audio file. We can assume the noise has already been removed from the audio. For example, if two people are talking in the audio, the program should return 2; if three people are talking, it should return 3, and so on. I don't need speech recognition; I just want to know how many people talk. What is the best way to solve this problem?


Solution

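  • If I understand correctly, you are looking for speaker diarization: splitting the audio into segments labeled by who is speaking. The number of distinct speaker labels is then your speaker count. The thread "Python Speaker Recognition" lists a few options for Python.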

    Otherwise, if you want to take the easier route, you can let Google do it for you with their Cloud Speech-to-Text API. It is not free, but it is really cool. More about that here: https://cloud.google.com/speech-to-text/docs/multiple-voices
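
Whichever diarization tool you pick, the last step is the same: turn its labeled output into a count. Here is a minimal sketch of that step, assuming the tool gives you (start, end, speaker) segments in seconds. The `count_speakers` function, the `min_total_seconds` threshold, and the sample segments are all made up for illustration and do not come from any particular library. The threshold is there because diarizers sometimes assign a very short stretch of audio to an extra, spurious speaker.

```python
from collections import defaultdict


def count_speakers(segments, min_total_seconds=1.0):
    """Count distinct speaker labels in diarization output.

    segments: iterable of (start, end, speaker) tuples, times in seconds.
    min_total_seconds: labels with less total speaking time than this
        are ignored (assumed to be spurious; tune for your data).
    """
    talk_time = defaultdict(float)
    for start, end, speaker in segments:
        # Accumulate total speaking time per label; ignore inverted segments.
        talk_time[speaker] += max(0.0, end - start)
    return sum(1 for total in talk_time.values() if total >= min_total_seconds)


# Hypothetical diarization output, not produced by a real tool.
segments = [
    (0.0, 4.2, "A"),
    (4.5, 9.0, "B"),
    (9.1, 12.0, "A"),
    (12.3, 12.6, "C"),  # very short segment, probably a mislabeled blip
    (12.8, 18.0, "B"),
]

print(count_speakers(segments))                         # filters out "C"
print(count_speakers(segments, min_total_seconds=0.0))  # counts every label
```

If you trust the diarizer's labels completely, set `min_total_seconds` to 0 and the function reduces to counting unique labels.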