[SOLVED] How to compare spoken audio against reference recording

I am looking for a way to compare a user-submitted audio recording against a reference recording, in order to give the learner a grade or percentage for language learning.

I realize that this is a very unscientific way of doing things and is more of a gimmick than anything.

My first thought is some sort of audio fingerprinting or waveform comparison.

Any ideas where I should be looking?

Solution

This is by no means a trivial problem to solve, though there is an abundance of research on the topic. Presently, the most successful forms of machine learning in the speech recognition domain apply Hidden Markov Model (HMM) techniques.
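To give a rough idea of what an HMM-based score looks like, here is a minimal sketch of the forward algorithm, which computes how likely an observation sequence is under a given model. It is only an illustration under several assumptions: the observations are discrete symbols (a real system would work on acoustic feature vectors extracted from the audio), and the function name and all of the model numbers below are made up for the example rather than taken from any library.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm).

    obs   : non-empty list of observation symbol indices
    start : start[s] is the probability of starting in state s
    trans : trans[p][s] is the probability of moving from state p to state s
    emit  : emit[s][o] is the probability of state s emitting symbol o
    """
    n_states = len(start)
    # Initialise with the start distribution weighted by the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow; the logs of the scales sum to log P(obs).
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Hypothetical toy model with 2 hidden states and 2 observation symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

if __name__ == "__main__":
    attempt = [0, 0, 1, 1]  # stand-in for a learner's quantised features
    # Dividing by the length gives a per-frame score that is comparable
    # across recordings of different durations.
    print(forward_log_likelihood(attempt, START, TRANS, EMIT) / len(attempt))
```

In a setup like this, the model would be trained on reference recordings of the target phrase, and the learner's per-frame log-likelihood could then be mapped onto a grade. How to do that mapping is a design choice this sketch does not address.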

You may also want to take a look at existing implementations of HMM algorithms. One such library in its early stages is ghmm.

Perhaps even better and more readily applicable to your problem is HTK, the Hidden Markov Model Toolkit.