speech-recognition, cmusphinx, sphinx4

How to get the timestamp of when a word was said using Sphinx


I am currently trying to get the timestamp of a word which has been detected using CMU Sphinx.

while ((result = recognizer.getResult()) != null) {
    for(WordResult w : result.getWords()){
        if(w.getWord() != Word.UNKNOWN){
            System.out.println(w.getTimeFrame().getStart());
            System.out.println(w.getWord() + " " + (w.getTimeFrame().getStart()/100)/60 + ":" + (w.getTimeFrame().getStart()/100 % 60));
        }
    }
}

That is the code I currently have. It is clearly not accurate: the whole file is only 8 minutes long, yet the frame-to-time calculation outputs timestamps of over an hour.

I suspect this is because the sample/frame rate is not 100 per second, as the logic above assumes.

Is there any way to get the timestamp from a WordResult or a way to find the sample/frame rate that Sphinx is using?

I have looked around online and not been able to find any documentation on the TimeFrame class.


Solution

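  • As Nikolay Shmyrev mentioned here, it turns out the TimeFrame values are in milliseconds. I had tried this previously, but there were so many results that I was thrown off and assumed it was incorrect (I believe the extra results just mean the model needs tweaking). This also explains the timestamps of over an hour: dividing milliseconds by 100 instead of 1000 inflates every time by a factor of 10, so an 8-minute file appears to run for 80 minutes.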

    The corrected code would be:

    System.out.println(w.getWord() + " " + (w.getTimeFrame().getStart()/1000)/60 + ":" + (w.getTimeFrame().getStart()/1000 % 60));
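The same conversion can be pulled into a small helper so the arithmetic lives in one place. This is only a sketch: the class name WordTimestamps and the method formatMillis are hypothetical names introduced here, not part of Sphinx, and it assumes the value passed in is the millisecond offset returned by w.getTimeFrame().getStart(). Unlike the one-liner above, it zero-pads the seconds, so 65 seconds prints as 1:05 rather than 1:5.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the Sphinx API.
final class WordTimestamps {
    private WordTimestamps() {}

    // Converts a millisecond offset into a "minutes:seconds" string.
    // Assumes millis is non-negative, as a start offset into the audio would be.
    static String formatMillis(long millis) {
        long totalSeconds = TimeUnit.MILLISECONDS.toSeconds(millis); // truncates the sub-second part
        long minutes = totalSeconds / 60;
        long seconds = totalSeconds % 60;
        return String.format(Locale.ROOT, "%d:%02d", minutes, seconds);
    }

    public static void main(String[] args) {
        // 83,500 ms is 83 whole seconds, i.e. 1 minute 23 seconds.
        System.out.println(formatMillis(83_500L));
    }
}
```

Inside the recognition loop, the print statement would then become something like System.out.println(w.getWord() + " " + WordTimestamps.formatMillis(w.getTimeFrame().getStart()));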