speech-recognition htk

Is there a way to get the monophone probability using HTK?


Ideally, what I am looking for is a way to get a vector of probabilities that a particular segment of an audio file is each of the phones. Something like:

input:

output:


Solution

  • You can obtain the scores by running HVite in forced-alignment mode. Unfortunately, you have to run it once for every phoneme you have:

    HVite -A -D -T 1 -l '*' -o NTW -C HTK.cfg -a \
        -H macros \
        -H hmmdefs \
        -i acoustic_score_AA.mlf \
        -y lab \
        -I AA.mlf \
        -S index.scp \
        words phones
    

    The output file acoustic_score_AA.mlf will contain the result.

    The words vocabulary file should look like this:

    AA AA
    AE AE
    ....
    ZH ZH
    

    and the phones file has to contain the list of phonemes (HMM models), as far as I remember.

    The trick here is the content of the input .mlf file. For instance, AA.mlf should look like this:

    #!MLF!#
    "*/S0001.lab"
    AA
    .
    

    This forces HVite to apply the AA model to the whole utterance, so the audio file has to be chunked into segments in advance.
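
Since the command has to be repeated for every phoneme, the file preparation can be scripted. Below is a minimal Python sketch, not a tested recipe: it writes the words and phones files, one forced-alignment .mlf per phoneme, and builds the matching HVite argument list. The function names and the `utt_ids` parameter are made up for illustration, the one-phone-per-line layout of phones is an assumption, and the HVite flags are copied unchanged from the command above.

```python
from pathlib import Path


def write_words_dict(phones, path):
    """Write the words file: each phone maps to itself, e.g. 'AA AA'."""
    Path(path).write_text("".join(f"{p} {p}\n" for p in phones))


def write_phone_mlf(phone, utt_ids, path):
    """Write an MLF that labels every listed utterance with a single phone."""
    lines = ["#!MLF!#"]
    for utt in utt_ids:
        # Same entry layout as the AA.mlf example: pattern, label, terminator.
        lines += [f'"*/{utt}.lab"', phone, "."]
    Path(path).write_text("\n".join(lines) + "\n")


def hvite_command(phone):
    """Build the HVite argument list for one phone (flags as in the answer)."""
    return [
        "HVite", "-A", "-D", "-T", "1", "-l", "*", "-o", "NTW",
        "-C", "HTK.cfg", "-a",
        "-H", "macros",
        "-H", "hmmdefs",
        "-i", f"acoustic_score_{phone}.mlf",
        "-y", "lab",
        "-I", f"{phone}.mlf",
        "-S", "index.scp",
        "words", "phones",
    ]


def prepare(phones, utt_ids, workdir="."):
    """Create words, phones and the per-phone MLFs; return one command per phone."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    write_words_dict(phones, workdir / "words")
    # Assumption: the phones file is a plain list, one model name per line.
    (workdir / "phones").write_text("".join(f"{p}\n" for p in phones))
    commands = []
    for phone in phones:
        write_phone_mlf(phone, utt_ids, workdir / f"{phone}.mlf")
        commands.append(hvite_command(phone))
    return commands
```

For example, `prepare(["AA", "AE", "ZH"], ["S0001"], "run")` would create the files under `run/`. Each returned list could then be passed to `subprocess.run(cmd, cwd="run", check=True)`, provided HTK is installed and HTK.cfg, macros, hmmdefs and index.scp are already in that directory.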