java nlp opennlp thai

OpenNLP gives error when using Thai model


I have tried to follow the advice from here, but I got this error:

C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin>opennlp TokenizerME C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin\thai.tok.bin < test.txt

Loading Tokenizer model ... Exception in thread "main" java.lang.NullPointerException
    at opennlp.tools.util.model.BaseModel.getManifestProperty(BaseModel.java:491)
    at opennlp.tools.util.model.BaseModel.initializeFactory(BaseModel.java:245)
    at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:237)
    at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
    at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:125)
    at opennlp.tools.cmdline.tokenizer.TokenizerModelLoader.loadModel(TokenizerModelLoader.java:39)
    at opennlp.tools.cmdline.tokenizer.TokenizerModelLoader.loadModel(TokenizerModelLoader.java:31)
    at opennlp.tools.cmdline.ModelLoader.load(ModelLoader.java:62)
    at opennlp.tools.cmdline.tokenizer.TokenizerMETool.run(TokenizerMETool.java:41)
    at opennlp.tools.cmdline.CLI.main(CLI.java:225)

The test.txt file contains the sentence "ผมหิวข้าว".

Could anyone tell me how to fix it? I want to use the POSTagger. Thank you.


Solution

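  • I think your model is missing the manifest.properties file. The NullPointerException is thrown in BaseModel.getManifestProperty, which suggests the manifest was never loaded. An OpenNLP .bin model is a zip archive, so can you unzip thai.tok.bin and check that it contains these files: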

    1. token.model (binary tokenizer model)
    2. manifest.properties (configuration)

    The contents of manifest.properties should look like this (taken from the question you linked to):

    Manifest-Version=1.0
    Language=th
    OpenNLP-Version=1.5.0
    Component-Name=TokenizerME
    useAlphaNumericOptimization=false
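
If you would rather check the archive from code than unzip it by hand, here is a minimal sketch. It assumes the model file is an ordinary zip archive and that the entry names are exactly the two listed above. The class name ModelArchiveCheck and the method missingEntries are made up for illustration and are not part of OpenNLP; only the standard java.util.zip API is used.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipFile;

public class ModelArchiveCheck {

    // Entry names taken from the answer above (assumed to sit at the archive root).
    static final List<String> REQUIRED =
            Arrays.asList("token.model", "manifest.properties");

    /** Returns the required entry names that are absent from the zip archive. */
    static List<String> missingEntries(Path modelFile, List<String> required)
            throws IOException {
        List<String> missing = new ArrayList<>();
        try (ZipFile zip = new ZipFile(modelFile.toFile())) {
            for (String name : required) {
                // getEntry returns null when the archive has no entry with this name
                if (zip.getEntry(name) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to the model as the first argument; defaults to thai.tok.bin
        Path modelFile = Paths.get(args.length > 0 ? args[0] : "thai.tok.bin");
        List<String> missing = missingEntries(modelFile, REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("All expected entries are present.");
        } else {
            System.out.println("Missing entries: " + missing);
        }
    }
}
```

Run it with the path to thai.tok.bin as the only argument. If manifest.properties shows up as missing, that would be consistent with the stack trace in the question.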