Ideally, what I am looking for is a way to get a vector of probabilities that a particular segment of an audio file is a certain phone. Something like:
input:
output:
You can obtain the scores by running HVite in forced alignment mode. I am afraid you have to run this once for every phoneme you have:
HVite -A -D -T 1 -l '*' -o NTW -C HTK.cfg -a \
-H macros \
-H hmmdefs \
-i acoustic_score_AA.mlf \
-y lab \
-I AA.mlf \
-S index.scp \
words phones
The output file acoustic_score_AA.mlf will contain the result.
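Since the command has to be repeated for each phoneme, a small loop saves typing. The following is only a sketch: the function name score_all_phones is made up for illustration, and it assumes that the phones file has one model name per line and that a matching <PHONE>.mlf already exists for every phone. The HVite options are copied unchanged from the command above.

```sh
# Run the forced-alignment pass once per phone.
# Assumption: "$1" lists one phone per line, and <PHONE>.mlf exists for each.
score_all_phones() {
    phones_file=$1
    while read -r phone rest; do
        # Skip blank lines in the phone list.
        [ -n "$phone" ] || continue
        HVite -A -D -T 1 -l '*' -o NTW -C HTK.cfg -a \
            -H macros \
            -H hmmdefs \
            -i "acoustic_score_$phone.mlf" \
            -y lab \
            -I "$phone.mlf" \
            -S index.scp \
            words phones
    done < "$phones_file"
}
```

Calling score_all_phones phones would then leave one acoustic_score_<PHONE>.mlf per phone in the current directory.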
The contents of the words vocabulary file should look like:
AA AA
AE AE
....
ZH ZH
and the phones file has to contain the list of phonemes (HMM models), as far as I remember.
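If the phones list already exists, the words file can be derived from it rather than typed by hand. This is a minimal sketch, assuming phones holds one model name per line; make_words_dict is a hypothetical helper name, not part of HTK.

```sh
# Build the "words" dictionary from the "phones" list: every phone becomes
# a one-phone word with the same name, e.g. "AA AA".
# Assumption: the first field of each non-empty line is the phone name.
make_words_dict() {
    phones_file=$1
    words_file=$2
    awk 'NF { print $1, $1 }' "$phones_file" > "$words_file"
}
```

For example, make_words_dict phones words writes the two-column listing shown above.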
The trick here is the content of the input .mlf file. For instance, AA.mlf should look like:
#!MLF!#
"*/S0001.lab"
AA
.
This will force HVite to apply the AA model to the whole utterance. The audio file has to be chunked into segments in advance.
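The per-phoneme .mlf files all follow the same four-line pattern, so they can be generated. A rough sketch, where write_phone_mlf is a hypothetical helper and the utterance name (S0001 in the example) is passed in as an argument:

```sh
# Write a master label file that labels one utterance with a single phone.
# Arguments: phone name, utterance base name, output path.
write_phone_mlf() {
    phone=$1
    utt=$2
    out=$3
    printf '#!MLF!#\n"*/%s.lab"\n%s\n.\n' "$utt" "$phone" > "$out"
}
```

Running write_phone_mlf AA S0001 AA.mlf reproduces the file shown above; repeating it for each phone in the phones list gives the full set of inputs.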