[SOLVED] Is there a way to get the monophone probability using HTK?

Is there a way to get the monophone probability using HTK?

Ideally, what I am looking for is a way to get a vector of probabilities, one per phone, that a particular segment of an audio file is that phone. Something like:

input:

wavfile
start position (e.g. @1.4 sec)
duration (e.g. 500 ms)

output:

SIL 2.324*10^-3
AA 1.514*10^-4
AE 1.482*10^-2
...
ZH 5.03*10^-5

Solution

You can obtain the scores by running HVite in forced alignment mode. I am afraid you have to run it once for every phoneme you have:

HVite -A -D -T 1 -l '*' -o NTW -C HTK.cfg -a \
    -H macros \
    -H hmmdefs \
    -i acoustic_score_AA.mlf \
    -y lab \
    -I AA.mlf \
    -S index.scp \
    words phones

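The output file acoustic_score_AA.mlf will contain the result. Note that HVite reports acoustic scores (log-likelihoods, normalised per frame by the N flag in -o NTW) rather than probabilities.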
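Since the command has to be repeated for every phoneme, a small loop saves typing. This is only a sketch: the function name score_all_phones is made up for illustration, and it assumes that HVite is on the PATH, that the working directory holds HTK.cfg, macros, hmmdefs, words and index.scp, and that one input label file per phone (AA.mlf, AE.mlf, ...) already exists.

```shell
#!/bin/sh
# Hypothetical helper: run the HVite command above once per phone.
# $1 = file with one HMM name per line (the "phones" list)
score_all_phones() {
    while read -r phone; do
        # Skip blank lines in the list.
        [ -n "$phone" ] || continue
        # Same options as the single-phone command; only the input
        # and output MLF names change. stdin is redirected so HVite
        # cannot consume the list being read by the loop.
        HVite -A -D -T 1 -l '*' -o NTW -C HTK.cfg -a \
            -H macros -H hmmdefs \
            -i "acoustic_score_${phone}.mlf" \
            -y lab -I "${phone}.mlf" -S index.scp \
            words "$1" < /dev/null || return 1
    done < "$1"
}

# Usage: score_all_phones phones
```

Each run leaves its own acoustic_score_PHONE.mlf, so the per-phone scores for a segment can be collected afterwards by reading the same utterance entry from every output file.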

The words vocabulary file should map every phone to itself, like this:

AA AA
AE AE
....
ZH ZH

and the phones file has to contain the list of phonemes (HMM model names), as far as I remember.
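If the phones list already exists, the words file can be derived from it instead of being typed by hand. A minimal sketch, assuming phones holds one HMM name per line; the function name make_words is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: build the phone-to-phone dictionary.
# $1 = phones list (one HMM name per line), $2 = dictionary to write
make_words() {
    # For every non-empty line, print the name twice: once as the
    # "word" and once as its single-phone pronunciation.
    awk 'NF { print $1, $1 }' "$1" > "$2"
}

# Usage: make_words phones words
```

The entries come out in the same order as the phones list, so sort the list first if a sorted dictionary is wanted.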

The trick here is the content of the input .mlf file. For instance, AA.mlf should look like this:

#!MLF!#
"*/S0001.lab"
AA
.

This will force HVite to apply the AA model to the whole utterance. Because every file is scored as a whole, the audio has to be chunked into the segments of interest in advance.
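Both preparation steps can be scripted. The sketch below is not part of HTK: write_phone_mlf and cut_segment are hypothetical helper names, and cutting with SoX is an assumption on my side (any tool that extracts a time range would do). It also assumes index.scp lists one file per line and that label names are the file names with the extension replaced by .lab, as in the S0001 example.

```shell
#!/bin/sh
# Hypothetical helper: write an MLF that assigns one phone to every
# utterance listed in a script file.
# $1 = phone, $2 = script file (e.g. index.scp), $3 = MLF to write
write_phone_mlf() {
    {
        printf '#!MLF!#\n'
        while read -r path; do
            [ -n "$path" ] || continue
            name=$(basename "$path")
            # "*/NAME.lab", then the phone, then the closing dot.
            printf '"*/%s.lab"\n%s\n.\n' "${name%.*}" "$1"
        done < "$2"
    } > "$3"
}

# Hypothetical helper: cut one segment out of a wav file with SoX.
# $1 = input wav, $2 = start (s), $3 = duration (s), $4 = output wav
cut_segment() {
    sox "$1" "$4" trim "$2" "$3"
}

# Usage:
#   cut_segment recording.wav 1.4 0.5 S0001.wav
#   write_phone_mlf AA index.scp AA.mlf
```

The cut segment still has to be turned into whatever input format HTK.cfg expects before it is listed in index.scp.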