
Which features and algorithm are good for speaker verification?


I am working on a speaker verification task.

My task is to calculate the similarity between two speech recordings and then compare it with a threshold. For example, if the similarity score between two recordings is 70% and the threshold is 50%, the speaker is judged to be the same person.

The speech is text-independent; it can be any conversation.

I have experience using MFCC and GMM for speaker recognition, but this task is different: I just need to compare the features of two recordings to get a similarity score. I don't know which features are good for speaker verification or which algorithm can help me calculate a similarity score between two patterns.

I hope to get your advice.

Many thanks.


Solution

  • The state of the art these days is x-vectors:

    Deep Neural Network Embeddings for Text-Independent Speaker Verification

    An implementation is available in Kaldi.
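Once each recording has been reduced to a fixed-length embedding (an x-vector, for instance), the comparison step from the question can be sketched as below. This is a minimal illustration, not the Kaldi recipe: cosine scoring is used here as a simple stand-in for the PLDA backend described in the x-vector paper, the function names `cosine_similarity` and `same_speaker` are hypothetical, and the short vectors are placeholder values rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the pair as the same speaker if the score reaches the threshold.

    The default threshold is only an assumption echoing the 50% in the
    question; in practice it would be tuned on held-out trials.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Placeholder vectors standing in for embeddings of two recordings.
    enrolled = [0.2, 0.9, -0.1, 0.4]
    test = [0.25, 0.8, 0.0, 0.5]
    print("score:", cosine_similarity(enrolled, test))
    print("same speaker:", same_speaker(enrolled, test))
```

Note that a cosine score lies between -1 and 1 rather than 0% and 100%, so the threshold has to be chosen on that scale, typically by picking the operating point that balances false accepts and false rejects on a labelled development set.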