gstreamer kaldi

Assert in Kaldi when used with GStreamer


Using the GStreamer plugin from Alumae with the following pipeline:

appsrc source='appsrc' ! wavparse ! audioconvert ! audioresample ! queue ! kaldinnet2onlinedecoder <parameters snipped> ! filesink location=/tmp/test

I always get the following assert, which I don't understand:

KALDI_ASSERT(current_log_post_.NumRows() == info_.frames_per_chunk / info_.opts.frame_subsampling_factor && current_log_post_.NumCols() == info_.output_dim);

What is this assert about? How can I fix it?
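For reference, the failing condition compares matrix shapes. The sketch below is an illustrative stand-in (not Kaldi code) with made-up numbers: it mirrors the assert's condition to show how a mismatch between the decoder's assumed frame-subsampling factor and the model's actual output rate makes the row count disagree.

```python
# Illustrative sketch (not Kaldi code): the assert checks that the nnet output
# for one chunk has frames_per_chunk / frame_subsampling_factor rows and
# output_dim columns. All numbers here are hypothetical.

frames_per_chunk = 21   # hypothetical decoder chunk size, in input frames
output_dim = 3000       # hypothetical nnet output dimension

def check_chunk(num_rows, num_cols, frame_subsampling_factor):
    """Mirror of the KALDI_ASSERT condition in AdvanceChunk()."""
    return (num_rows == frames_per_chunk // frame_subsampling_factor
            and num_cols == output_dim)

# A model with frame-subsampling-factor=3 emits 7 rows per 21-frame chunk:
print(check_chunk(7, 3000, frame_subsampling_factor=3))   # True

# If the decoder is configured with factor 1 while the model actually produces
# subsampled output, the row count no longer matches and the assert fires:
print(check_chunk(7, 3000, frame_subsampling_factor=1))   # False
```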

FYI, the data pushed into the pipeline comes from a streamed WAV file, and replacing kaldinnet2onlinedecoder with wavenc correctly generates a WAV file instead of a text file at the end.

EDIT: Here are the parameters used:

use-threaded-decoder=0   
model=/opt/en/final.mdl   
word-syms=<word-file>  
fst=<fst_file>
mfcc-config=<mfcc-file>  
ivector-extraction-config=/opt/en/ivector-extraction/ivector_extractor.conf  
max-active=10000  
beam=10.0  
lattice-beam=6.0  
do-endpointing=1  
endpoint-silence-phones="1:2:3:4:5:6:7:8:9:10"  
traceback-period-in-secs=0.25  
num-nbest=10  

For your information, using the textual representation of the pipeline in Python works, but building it programmatically (i.e. using Gst.ElementFactory.make and so on) always throws the exception.

SECOND UPDATE: Here is the full stack trace generated by the assert:

ASSERTION_FAILED ([5.2]:AdvanceChunk():decodable-online-looped.cc:223) : 'current_log_post_.NumRows() == info_.frames_per_chunk / info_.opts.frame_subsampling_factor && current_log_post_.NumCols() == info_.output_dim'

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::DecodableNnetLoopedOnlineBase::AdvanceChunk()
kaldi::nnet3::DecodableNnetLoopedOnlineBase::EnsureFrameIsComputed(int)
kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int)
kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
kaldi::SingleUtteranceNnet3Decoder::AdvanceDecoding()

Solution

  • I finally got it working, even with the frame-subsampling-factor parameter.

    The problem lies in the order of the parameters: the fst and model parameters have to be set last.

    Thus, the following textual pipeline works:

    gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
               audioconvert ! \
               audioresample ! tee name=t ! queue ! \
           kaldinnet2onlinedecoder \
           use-threaded-decoder=0 \
           nnet-mode=3 \
           word-syms=/opt/models/fr/words.txt \
           mfcc-config=/opt/models/fr/mfcc_hires.conf \
           ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
           phone-syms=/opt/models/fr/phones.txt \
           frame-subsampling-factor=3 \
           max-active=7000 \
           beam=13.0 \
           lattice-beam=8.0 \
           acoustic-scale=1 \
           do-endpointing=1 \
           endpoint-silence-phones=1:2:3:4:5:16:17:18:19:20 \
           traceback-period-in-secs=0.25 \
           num-nbest=2 \
           chunk-length-in-secs=0.25 \
           fst=/opt/models/fr/HCLG.fst \
           model=/opt/models/fr/final.mdl \
           ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0
    

    I opened an issue on GitHub for this because, in my opinion, this behavior is really difficult to discover and should at least be documented.
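    When building the pipeline programmatically (which is where the assert was triggered), the same ordering constraint presumably applies to the property assignments. The sketch below is plain Python with the actual PyGObject calls left as comments; the property names and paths mirror the gst-launch example above but are otherwise hypothetical. It shows one way to defer fst and model until last:

    ```python
    # Hedged sketch: reorder decoder properties so that "fst" and "model" are
    # applied last, matching the fix found for the textual pipeline.

    props = {
        "use-threaded-decoder": 0,
        "word-syms": "/opt/models/fr/words.txt",
        "mfcc-config": "/opt/models/fr/mfcc_hires.conf",
        "frame-subsampling-factor": 3,
        "fst": "/opt/models/fr/HCLG.fst",
        "model": "/opt/models/fr/final.mdl",
    }

    # Put fst and model at the end, keeping the relative order of the rest.
    deferred = ("fst", "model")
    ordered = [(k, v) for k, v in props.items() if k not in deferred]
    ordered += [(k, props[k]) for k in deferred if k in props]

    # With PyGObject the application would then do, in this exact order:
    #   decoder = Gst.ElementFactory.make("kaldinnet2onlinedecoder", "decoder")
    #   for name, value in ordered:
    #       decoder.set_property(name, value)

    print([name for name, _ in ordered][-2:])  # ['fst', 'model']
    ```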