python google-colaboratory mozilla-deepspeech kenlm make-scorer

['kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', 'lm_filtered.arpa', '/content/lm.binary']' returned non-zero exit status 1


While building the LM binary to create a scorer for a DeepSpeech model, I kept getting the following error:

subprocess.CalledProcessError: Command '['/content/kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', '/content/lm_filtered.arpa', '/content/lm.binary']' returned non-zero exit status 1.

The command I was using is shown below:

!python /content/DeepSpeech/data/lm/generate_lm.py \
--input_txt /content/transcripts.txt \
--output_dir /content/scorer/ \
--top_k 50000 \
--kenlm_bins /content/kenlm/build/bin/ \
--arpa_order 5 --max_arpa_memory "95%" --arpa_prune "0|0|1" \
--binary_a_bits 255 --binary_q_bits 8 --binary_type trie

Solution

  • The following worked for me. Go to:

    DeepSpeech -> data -> lm -> generate_lm.py
    

    Now find the following block of code inside it:

    subprocess.check_call(
        [
            os.path.join(args.kenlm_bins, "build_binary"),
            "-a",
            str(args.binary_a_bits),
            "-q",
            str(args.binary_q_bits),
            "-v",
            args.binary_type,
            filtered_path,
            binary_path,
        ]
    )
    

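    Tweak the code by adding the "-s" flag, as below. KenLM's build_binary usage text describes this flag as allowing models that lack <s> and </s> sentence markers, which is presumably why the ARPA file was being rejected: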

    subprocess.check_call(
        [
            os.path.join(args.kenlm_bins, "build_binary"),
            "-a",
            str(args.binary_a_bits),
            "-q",
            str(args.binary_q_bits),
            "-v",
            args.binary_type,
            filtered_path,
            binary_path,
            "-s",  # added flag: the only change to the original call
        ]
    )
    

    After this change, the generate_lm.py command above should run without the error.
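If you would rather not edit generate_lm.py in place, the same build_binary call can be sketched on its own. This is a minimal sketch, not DeepSpeech code: `build_binary_command` and `run_and_report` are hypothetical helper names, and the paths are the Colab paths from the question. The second helper is there because `subprocess.check_call` only reports the exit status, so printing stderr is a way to see what build_binary actually complained about.

```python
import os
import subprocess
import sys


def build_binary_command(kenlm_bins, a_bits, q_bits, binary_type,
                         arpa_path, binary_path, add_s_flag=True):
    """Assemble the same argument list that generate_lm.py passes to build_binary.

    Hypothetical helper for illustration; it is not part of DeepSpeech or KenLM.
    """
    cmd = [
        os.path.join(kenlm_bins, "build_binary"),
        "-a", str(a_bits),
        "-q", str(q_bits),
        "-v",
        binary_type,
        arpa_path,
        binary_path,
    ]
    if add_s_flag:
        # Same placement as in the patched script: appended after the paths.
        cmd.append("-s")
    return cmd


def run_and_report(cmd):
    """Run cmd; on failure, print and return its stderr, otherwise return None."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
        return result.stderr
    return None


# Paths below are the ones used in the question; adjust them to your setup.
cmd = build_binary_command(
    "/content/kenlm/build/bin/", 255, 8, "trie",
    "/content/lm_filtered.arpa", "/content/lm.binary",
)
print(" ".join(cmd))
```

Calling `run_and_report(cmd)` in the Colab session (where the KenLM binaries exist) runs the build and shows the underlying message if it still fails.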