speech-recognition voice-recognition google-cloud-speech

Google Cloud Speech: Distinguish Voices?


I am interested in writing a voice recognition application that is aware of multiple speakers. For example, if Bill, Joe, and Jane are talking, the application would not only transcribe the speech as text but also label each result by speaker (say 0, 1, and 2, because obviously/hopefully Google has no means of linking voices to people).

I am hunting for speech recognition APIs that might do this, and Google Cloud Speech comes up as a top-ranked API. I have looked through the API docs to see whether such functionality is available and have not found it.

My question is: does/will this functionality exist?

Note: Google's support page says their engineers sometimes answer these questions on SO, so it seems plausible someone might have an answer to the "will" part of the question.


Solution

  • I know of no current provider that does this as a built-in part of their Speech Recognition API.

    I've used the Microsoft Cognitive Services - Speaker Recognition API for something similar, but the audio has to be sent to that API separately from the call to their Speech Recognition API.

    Being able to combine the two would be useful.
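
As a rough illustration of what combining the two might look like on the client side, here is a minimal sketch. It assumes the speech recognizer can give you per-word timestamps and the speaker step can give you time-stamped segments with anonymous labels; neither is confirmed for any particular API in this thread, and `Word`, `SpeakerSegment`, `label_words`, and `group_by_turn` are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """One transcribed word (hypothetical shape of recognizer output)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float


@dataclass
class SpeakerSegment:
    """One stretch of audio attributed to an anonymous speaker (hypothetical)."""
    speaker: int  # 0, 1, 2, ... with no link to a real identity
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words: List[Word],
                segments: List[SpeakerSegment]) -> List[Tuple[Optional[int], str]]:
    """Tag each word with the speaker whose segment overlaps it the most."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = None, 0.0
        for s in segments:
            o = overlap(w.start, w.end, s.start, s.end)
            if o > best_overlap:
                best_speaker, best_overlap = s.speaker, o
        # Words that overlap no segment keep the label None.
        labeled.append((best_speaker, w.text))
    return labeled


def group_by_turn(labeled: List[Tuple[Optional[int], str]]) -> List[Tuple[Optional[int], str]]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: List[Tuple[Optional[int], List[str]]] = []
    for speaker, text in labeled:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]
```

Calling `group_by_turn(label_words(words, segments))` then yields a list of `(speaker, text)` turns. The largest-overlap rule is just one simple choice; it does nothing clever about people talking over each other.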