I am working on a speaker verification task.
I need to calculate the similarity between two speech recordings and compare it with a threshold. For example, if the similarity score between two recordings is 70% and the threshold is 50%, the two recordings are judged to be from the same speaker.
The speech is text-independent; it can be any conversation.
I have experience using MFCCs and GMMs for speaker recognition, but this task is different: I only need to compare the features of two recordings to get a similarity score. I don't know which features work well for speaker verification, or which algorithm I can use to calculate a similarity score between two patterns.
Any advice would be appreciated.
Many thanks.
The state of the art these days is x-vectors:
Deep Neural Network Embeddings for Text-Independent Speaker Verification
An implementation is available in Kaldi.
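Once you have one fixed-length embedding (such as an x-vector) per recording, the comparison step from the question can be sketched roughly as below. This is only an illustration under some assumptions: the embeddings are assumed to be already extracted and are passed in as plain lists of floats, the function names `cosine_similarity` and `same_speaker` are made up for this example, and cosine similarity is used as a simple scoring method (the x-vector paper itself scores with PLDA, which is not shown here). The default threshold of 0.5 just mirrors the 50% in the question; in practice it would need to be tuned on held-out data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    threshold: float = 0.5,  # assumed value, tune on your own data
) -> bool:
    """Accept the pair as the same speaker if the score reaches the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Toy vectors standing in for real speaker embeddings (hypothetical data).
    enrolled = [0.2, 0.7, -0.1, 0.4]
    test = [0.25, 0.65, -0.05, 0.5]
    print("score:", cosine_similarity(enrolled, test))
    print("same speaker:", same_speaker(enrolled, test))
```

Note that cosine similarity ranges from -1 to 1 rather than 0% to 100%, so the threshold should be chosen on that scale.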