I have an arpa file, which I created with the following command:
./lmplz -o 4 -S 1G <tmp_100M.txt >100m.arpa
Now I want to convert this arpa file to a binary file:
./build_binary 100m.arpa 100m.bin
But I'm getting this error:
mmap.cc:225 in void util::HugeMalloc(std::size_t, bool, util::scoped_memory&) threw ErrnoException because `!to.get()'.
Cannot allocate memory Failed to allocate 106122412848 bytes Byte: 80
ERROR
I tried adding the -S parameter:
./build_binary -S 1G 100m.arpa 100m.bin
and I got the same error.
How can I convert the file to binary?
Why am I getting this error?
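By default, build_binary builds the probing (hash table) data structure, which is the fastest to query but also the most memory-hungry. The -S option only sets the memory used while building a trie, so it does not change that allocation.
Take a look at https://aclanthology.org/W16-4618 for a brief explanation.
Try building a quantized trie instead: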
LM_ORDER=4
CORPUS_LM="tmp_100M"
LANG_E="txt"
LM_ARPA="100m.arpa"
LM_FILE="100m.bin"
${MOSES_BIN_DIR}/lmplz --order ${LM_ORDER} -S 80% -T /tmp \
< ${CORPUS_LM}.${LANG_E} | gzip > ${LM_ARPA}
${MOSES_BIN_DIR}/build_binary trie -a 22 -b 8 -q 8 ${LM_ARPA} ${LM_FILE}
MOSES_BIN_DIR is the directory where the binaries you've compiled are stored.
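Before running the full build, it can help to ask build_binary for a memory estimate. As far as I know, passing only the ARPA file with no output name makes it print estimated sizes for the probing and trie structures instead of writing a binary. The sketch below assumes MOSES_BIN_DIR is set as described; estimate_lm_memory is a hypothetical helper name, not part of KenLM.

```shell
# Sketch only: print KenLM's memory estimate for a model without building it.
# Assumes MOSES_BIN_DIR points at the directory containing build_binary.
# estimate_lm_memory is a hypothetical helper, not a KenLM command.
estimate_lm_memory() {
    bin="${MOSES_BIN_DIR:-.}/build_binary"
    if [ ! -x "$bin" ]; then
        echo "build_binary not found at $bin" >&2
        return 1
    fi
    # Passing only the ARPA file (no output file name) asks for an estimate.
    "$bin" "$1"
}
```

For example, `estimate_lm_memory 100m.arpa` should show how much smaller the trie is expected to be than the default probing structure.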
If you still run out of memory with the trie and quantization options, you may need to switch to a machine or instance with enough RAM to load your language model and produce the binary.
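To judge how much RAM a failed run asked for, the byte count from the error message can be converted to GiB. This is a minimal sketch; bytes_to_gib is a hypothetical helper, and the number is the one from the error in the question.

```shell
# Convert a byte count, as printed in the KenLM error, to GiB.
# bytes_to_gib is a hypothetical helper for illustration only.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# The allocation that failed in the question:
bytes_to_gib 106122412848   # prints 98.8
```

So the default build needed a single allocation of roughly 99 GiB, which is a useful number to compare against the output of `free -g` on the target machine.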