java stanford-nlp pos-tagger french

IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes Stanford NLP


I'm trying to run the "Hello World" example of the Stanford POS tagger API in Java on French sentences (I used the same .jar from Python and it worked well). Here is my code:

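import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TextPreprocessor {

    // Load the French UD model shipped with the standalone tagger distribution.
    private static MaxentTagger tagger = new MaxentTagger("../stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger");

    public static void main(String[] args) {
        // tagString tokenizes the raw text internally, then tags each token.
        String taggedString = tagger.tagString("Salut à tous, je suis coincé");
        System.out.println(taggedString);
    }
}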

But I get the following exception:

Loading POS tagger from C:/Users/_Nprime496_/Downloads/Compressed/stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger ... done [0.3 sec].
Exception in thread "main" java.lang.IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes
    at edu.stanford.nlp.process.PTBLexer.<init>(PTBLexer.java)
    at edu.stanford.nlp.process.PTBTokenizer.<init>(PTBTokenizer.java:285)
    at edu.stanford.nlp.process.PTBTokenizer$PTBTokenizerFactory.getTokenizer(PTBTokenizer.java:698)
    at edu.stanford.nlp.process.DocumentPreprocessor$PlainTextIterator.<init>(DocumentPreprocessor.java:271)
    at edu.stanford.nlp.process.DocumentPreprocessor.iterator(DocumentPreprocessor.java:226)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tokenizeText(MaxentTagger.java:1148)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger$TaggerWrapper.apply(MaxentTagger.java:1332)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagString(MaxentTagger.java:999)
    at modules.generation.preprocessing.TextPreprocessor.main(TextPreprocessor.java:19)

Can you help me?


Solution

  • You can use this code and the full CoreNLP package:

    package edu.stanford.nlp.examples;
    
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.util.*;
    
    import java.util.*;
    
    
    public class PipelineExample {
    
      public static String text = "Paris est la capitale de la France.";
    
      public static void main(String[] args) {
        // set up pipeline properties
        Properties props = StringUtils.argsToProperties("-props", "french");
        // set the list of annotators to run
        props.setProperty("annotators", "tokenize,ssplit,mwt,pos");
        // build pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // create a document object
        CoreDocument document = pipeline.processToCoreDocument(text);
        // display tokens
        for (CoreLabel tok : document.tokens()) {
          System.out.println(String.format("%s\t%s", tok.word(), tok.tag()));
        }
      }
    
    }
    

    You can download CoreNLP here: https://stanfordnlp.github.io/CoreNLP/

    Make sure to also download the latest French models jar and put it on the classpath; the "-props french" setting loads its defaults from that jar.

    I am not sure why your example with the standalone tagger does not work. What jars were you using?
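
To help answer the question about jars: one possible cause (an assumption, not confirmed in this thread) is that the tagger model and the PTBLexer class come from different CoreNLP releases that accept different tokenizer option names. The sketch below uses only the JDK to print which jar each class is actually loaded from. The class name JarLocator and its locate helper are hypothetical names made up for this illustration.

```java
import java.security.CodeSource;

public class JarLocator {

    // Returns the location a class was loaded from, or a short note if it is unknown.
    static String locate(String className) {
        try {
            // "false" avoids running the class's static initializers.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source.
            return source == null ? "(bootstrap/JDK class)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in the question.
        String[] names = {
            "edu.stanford.nlp.process.PTBLexer",
            "edu.stanford.nlp.tagger.maxent.MaxentTagger"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as TextPreprocessor. If the two classes resolve to different jars, or to a jar older than the model, removing the extra jar would be the first thing to try.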