optimization speech-recognition gaussian htk

HTK: Optimizing mixture-splitting phone by phone


I use HTK to train an acoustic model. My last step is splitting the Gaussian mixtures of the phones. Normally, I split all phones (i.e., all their inner states) in one step, increasing the number of mixture components by one, then re-estimate, and stop when performance drops.
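For reference, both the all-phones step and a single-phone step can be expressed with HHEd's `MU` command. The sketch below only generates the edit-script text. It assumes 5-state HMMs whose emitting states are 2-4 (the usual HTK prototype), and the helper names `mixup_all` and `mixup_per_phone` are hypothetical. Tee models such as `sp`, which have a single emitting state, would need a different state range.

```python
# Hypothetical helpers that build HHEd edit scripts using the MU command.
# Assumption: 5-state HMMs, so the emitting states are 2-4.


def mixup_all(n_mix: int, states: str = "2-4") -> str:
    """One MU line setting every phone's emitting states to n_mix components."""
    return f"MU {n_mix} {{*.state[{states}].mix}}\n"


def mixup_per_phone(counts: dict[str, int], states: str = "2-4") -> str:
    """One MU line per phone, so each phone can have its own mixture count."""
    lines = [
        f"MU {n} {{{phone}.state[{states}].mix}}"
        for phone, n in sorted(counts.items())
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All phones to 4 components, then a per-phone script.
    print(mixup_all(4), end="")
    print(mixup_per_phone({"aa": 3, "sil": 6}), end="")
```

The generated script would then be applied with something like `HHEd -H hmm/macros -H hmm/hmmdefs -M hmm_new mix.hed monophones`, followed by the usual HERest passes; the directory and file names here are placeholders.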

Now I want to try splitting the phones one at a time, because this should lead to an equal or better overall result. My approach is: try splitting each phone individually, pick the one that gives the best result, keep it split, reset all the others, and start over. This takes too long, though. I thought of splitting all the phones that brought an improvement, not just the best one, and then moving on to the next iteration.
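The two search strategies described above can be sketched as follows. `score` is a hypothetical callback standing in for the whole build-and-test cycle (an HHEd `MU` edit, a few HERest passes, decoding a development set and scoring it, e.g. with HVite and HResults), and `toy_score` is a made-up stand-in so the sketch runs. In the faster variant, the combined model is re-scored before being accepted, because splits that each help on their own are not guaranteed to help together; that check is a design choice of this sketch, not something the question specifies.

```python
from typing import Callable

Counts = dict[str, int]  # phone name -> mixture components per state
Scorer = Callable[[Counts], float]


def toy_score(counts: Counts) -> float:
    """Made-up stand-in for a real build-and-test cycle: each phone has a
    hypothetical ideal mixture count, and deviations cost accuracy."""
    ideal = {"aa": 3, "iy": 2, "sil": 1}
    return -float(sum((n - ideal[p]) ** 2 for p, n in counts.items()))


def split(counts: Counts, phones) -> Counts:
    """Copy of counts with one extra mixture component for each given phone."""
    new = dict(counts)
    for p in phones:
        new[p] += 1
    return new


def greedy_best_only(counts: Counts, score: Scorer) -> Counts:
    """Try +1 on each phone alone and keep only the single best split."""
    best = score(counts)
    while counts:
        trials = {p: score(split(counts, [p])) for p in counts}
        phone, s = max(trials.items(), key=lambda kv: kv[1])
        if s <= best:
            break  # no single split improves the score any more
        counts, best = split(counts, [phone]), s
    return counts


def greedy_all_improvers(counts: Counts, score: Scorer) -> Counts:
    """Apply every split that helped on its own, all in the same round."""
    best = score(counts)
    while True:
        improvers = [p for p in counts if score(split(counts, [p])) > best]
        if not improvers:
            break
        merged = split(counts, improvers)
        s = score(merged)  # splits can interact, so re-check the combination
        if s <= best:
            break
        counts, best = merged, s
    return counts


if __name__ == "__main__":
    start = {"aa": 1, "iy": 1, "sil": 1}
    print(greedy_best_only(start, toy_score))
    print(greedy_all_improvers(start, toy_score))
```

Each round of either strategy costs one full build-and-test cycle per phone, which is why the search is slow; the second strategy usually needs fewer rounds because several phones advance at once.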

My question is: If splitting a phone lowers the performance, is there any point in trying to split it again at a later stage? Or can I just blacklist it and just try with those that brought an improvement in the last iteration?


Solution

  • Improvements from such schemes are usually tiny. You can get a much bigger improvement simply by moving to DNNs (supported by HTK 3.5, by the way).

    If splitting a phone lowers the performance, is there any point in trying to split it again at a later stage? Or can I just blacklist it and just try with those that brought an improvement in the last iteration?

    You can blacklist it.
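Blacklisting can be added to the "split all improvers" search by shrinking the candidate set each round to the phones whose split improved the score in the previous round. This is a minimal sketch under the same assumptions as before: `score` is a hypothetical callback for the full HHEd/HERest/decode/score cycle, and `toy_score` is a made-up stand-in. The saving is in the number of expensive build-and-test cycles; the trade-off is that a blacklisted phone is never reconsidered.

```python
from typing import Callable

Counts = dict[str, int]  # phone name -> mixture components per state
Scorer = Callable[[Counts], float]


def toy_score(counts: Counts) -> float:
    """Made-up stand-in for build + HERest + decode + score (not real HTK)."""
    ideal = {"aa": 3, "iy": 2, "sil": 1}
    return -float(sum((n - ideal[p]) ** 2 for p, n in counts.items()))


def greedy_with_blacklist(counts: Counts, score: Scorer) -> Counts:
    """Split all improving phones each round; a phone whose split did not
    help is blacklisted and never tried again."""
    counts = dict(counts)
    best = score(counts)
    candidates = set(counts)
    while candidates:
        improvers = []
        for p in sorted(candidates):
            trial = dict(counts)
            trial[p] += 1
            if score(trial) > best:
                improvers.append(p)
        candidates = set(improvers)  # everything else is blacklisted
        if not improvers:
            break
        merged = dict(counts)
        for p in improvers:
            merged[p] += 1
        s = score(merged)  # re-check the combined splits before accepting
        if s <= best:
            break
        counts, best = merged, s
    return counts


if __name__ == "__main__":
    calls = 0

    def counted(c: Counts) -> float:
        """Wraps toy_score to count how many build-and-test cycles are used."""
        global calls
        calls += 1
        return toy_score(c)

    result = greedy_with_blacklist({"aa": 1, "iy": 1, "sil": 1}, counted)
    print(result, calls)
```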