I'm trying to build a machine learning model for speaker recognition.
Currently I'm using the following scheme:
I usually get about an 85% recognition rate for 3 speakers, which is not great, so I decided to add some features, but I don't know what to add...
Does anyone have a recommendation for which features I should add, or what else I should do, to increase my recognition rate?
I tried a module called "pitch", which gives me the pitch of a WAV file, but it gave me very erratic values (for example, for the same speaker it gave me 360, 80, and 440 for the first 3 audio files).
Thanks a lot for any help.
You should process longer chunks at once; in 0.15 seconds it is almost impossible to identify a speaker.
The general rule is that the longer the audio you process, the more accurate the recognition will be. Something like 1-3 seconds is good, and you need to feed each chunk to the neural network as a whole.
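As a rough illustration of cutting a recording into chunks of that length, here is a minimal sketch. The function name `split_into_chunks`, the 2-second chunk length, the 1-second hop, and the 16 kHz sample rate are all assumptions picked for the example, not something taken from your setup.

```python
def split_into_chunks(signal, sample_rate, chunk_seconds=2.0, hop_seconds=1.0):
    """Split a 1-D waveform into fixed-length, possibly overlapping chunks.

    `signal` can be any sliceable sequence of samples (a list, or a NumPy
    array if you already use one). A trailing piece shorter than one full
    chunk is dropped, so every chunk has the same length.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if chunk_len <= 0 or hop_len <= 0:
        raise ValueError("chunk and hop lengths must be positive")

    chunks = []
    start = 0
    while start + chunk_len <= len(signal):
        chunks.append(signal[start:start + chunk_len])
        start += hop_len
    return chunks


# Example: 5 seconds of silence at an assumed 16 kHz sample rate.
sample_rate = 16000
signal = [0.0] * (5 * sample_rate)
chunks = split_into_chunks(signal, sample_rate)
print(len(chunks), len(chunks[0]))  # 4 32000
```

Each chunk would then be turned into features and passed to the network as one input, instead of classifying every 0.15-second frame separately. The overlap (hop shorter than the chunk) is optional; it just gives you more training examples from the same recording.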
You can search for x-vector on GitHub; there are many implementations, and you can find one in Kaldi, for example.