java eclipse nlp porter-stemmer jaws-wordnet

PorterStemmer with verbs ending in -es and -ed in Java


I am using PorterStemmer in Java to get the base form of a verb, but I ran into a problem with the verbs "goes" and "gambles". Instead of stemming them to "go" and "gamble", it stems them to "goe" and "gambl". Is there a better tool that can handle verbs ending in -es and -ed and retrieve the base form of a verb? P.S. JAWS with WordNet for Java does the same thing. Here is my code:

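// Assumption: PorterStemmer is the Snowball class (setCurrent/stem/getCurrent);
// adjust the import to match the library on your classpath.
import org.tartarus.snowball.ext.PorterStemmer;

public class verb
{
    public static void main(String[] args)
    {
        PorterStemmer ps = new PorterStemmer();
        ps.setCurrent("gambles");            // word to stem
        ps.stem();                           // run the stemmer in place
        System.out.println(ps.getCurrent()); // print the resulting stem
    }
}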

Here is the output in console: gambl


Solution

  • Take a few minutes to read this tutorial from the Stanford NLP group:

    https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

    You will find that a stemmer does not work the way you might expect. It is a crude heuristic that chops off word endings, so it does not always give you the complete base form of a verb. Since you care about getting the complete base form, lemmatization looks like the better fit for you.
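
To make the difference concrete, here is a minimal plain-Java sketch, not a real lemmatizer. It contrasts one unchecked suffix rule of the kind a stemmer applies with a lookup that only accepts a candidate base form if a lexicon knows it. Everything in it is an assumption made for illustration: the class name `LemmaSketch`, the tiny `VOCABULARY` and `IRREGULAR` tables, and the rule list are hypothetical stand-ins for a real dictionary such as WordNet. The actual Porter algorithm applies further steps after the first one, which is where "gambl" comes from.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy comparison of blind suffix stripping vs. dictionary-checked lemmatization.
// VOCABULARY and IRREGULAR are hypothetical stand-ins for a real lexicon.
final class LemmaSketch {

    // Hypothetical mini-lexicon of known verb base forms.
    static final Set<String> VOCABULARY = Set.of("go", "gamble", "walk", "study");

    // Hypothetical exception list for irregular forms.
    static final Map<String, String> IRREGULAR = Map.of("went", "go", "gone", "go");

    // Suffix rewrite rules {suffix, replacement}, tried in order.
    static final List<String[]> RULES = List.of(
        new String[] {"ies", "y"},
        new String[] {"es", "e"},
        new String[] {"es", ""},
        new String[] {"s", ""},
        new String[] {"ed", "e"},
        new String[] {"ed", ""});

    // Stemmer-like behaviour: drop a trailing "s" without checking the result.
    static String stripPluralS(String word) {
        return word.endsWith("s") && !word.endsWith("ss")
            ? word.substring(0, word.length() - 1)
            : word;
    }

    // Lemmatizer-like behaviour: accept a candidate only if the lexicon knows it.
    static String lemmatize(String word) {
        String w = word.toLowerCase();
        if (IRREGULAR.containsKey(w)) return IRREGULAR.get(w);
        if (VOCABULARY.contains(w)) return w;
        for (String[] rule : RULES) {
            if (w.endsWith(rule[0])) {
                String candidate = w.substring(0, w.length() - rule[0].length()) + rule[1];
                if (VOCABULARY.contains(candidate)) return candidate;
            }
        }
        return w; // unknown word: return it unchanged
    }

    public static void main(String[] args) {
        for (String w : List.of("goes", "gambles", "gambled", "studies", "went")) {
            System.out.println(w + " -> strip: " + stripPluralS(w) + ", lemma: " + lemmatize(w));
        }
    }
}
```

The blind rule turns "goes" into "goe" because nothing checks whether "goe" is a word. The dictionary-checked version rejects "goe", tries the next rule, and lands on "go". A real lemmatizer replaces the toy tables with a full lexicon and usually part-of-speech information as well.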