java twitter stanford-nlp text-classification maxent

Saving and loading a trained Stanford classifier in Java


I have a dataset of 1 million labelled sentences and am using it to determine sentiment with a Maximum Entropy model. I am using the Stanford Classifier for this:

import edu.stanford.nlp.classify.Classifier;
import edu.stanford.nlp.classify.ColumnDataClassifier;
import edu.stanford.nlp.ling.Datum;

public class MaximumEntropy {

static ColumnDataClassifier cdc;

public static float calMaxEntropySentiment(String text) {
    initializeProperties();
    return getMaxEntropySentiment(text);
}

// Builds the ColumnDataClassifier from the properties file.
public static void initializeProperties() {
    cdc = new ColumnDataClassifier(
            "\\stanford-classifier-2016-10-31\\properties.prop");
}

public static int getMaxEntropySentiment(String tweet) {

    // TwitterUtils is my own helper class (not shown here).
    String filteredTweet = TwitterUtils.filterTweet(tweet);
    System.out.println("Reading training file");
    // The classifier is retrained on the full dataset on every call; this is the slow part.
    Classifier<String, String> cl = cdc.makeClassifier(cdc.readTrainingExamples(
            "\\stanford-classifier-2016-10-31\\labelled_sentences.txt"));

    Datum<String, String> d = cdc.makeDatumFromLine(filteredTweet);
    String label = cl.classOf(d);
    System.out.println(filteredTweet + "  ==>  " + label + " " + cl.scoresOf(d));
    // System.out.println("Class score is: " +
    // cl.scoresOf(d).getCount(label));
    // Compare strings with equals(); == only compares object references.
    if ("0".equals(label)) {
        return 0;
    } else {
        return 4;
    }
}
}

My data is labelled 0 or 1. For each tweet, the whole dataset is read and the classifier is retrained, which takes a long time given the size of the dataset. Is there a way to train the classifier once, save it, and then load it whenever a tweet's sentiment is needed? I think this approach would take less time; correct me if I am wrong. The linked question "Saving and Loading Classifier" covers this, but not for the Java API. Any help would be appreciated.


Solution

  • Yes; the easiest way to do this is using Java's default serialization mechanism to serialize a classifier. A useful helper here is the IOUtils class:

    IOUtils.writeObjectToFile(classifier, "/path/to/file");
    

    To read the classifier:

    Classifier<String, String> cl = IOUtils.readObjectFromFile(new File("/path/to/file"));
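
    To illustrate the train-once, load-many pattern without depending on the Stanford jars, here is a minimal sketch that uses only the JDK's built-in serialization, which is the mechanism the answer refers to. ToyClassifier, ClassifierCache, save and load are hypothetical names invented for this example and are not part of the Stanford API; with the real library, the trained Classifier<String, String> would take the place of ToyClassifier and the IOUtils calls above would replace save and load.

```java
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

class ClassifierCache {

    /** Hypothetical stand-in for a trained model; NOT a Stanford class. */
    static class ToyClassifier implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, String> labelByWord = new HashMap<>();

        void learn(String word, String label) {
            labelByWord.put(word, label);
        }

        /** Returns the learned label, or "0" for unseen words. */
        String classOf(String word) {
            return labelByWord.getOrDefault(word, "0");
        }
    }

    /** Writes an already trained model to disk (done once, offline). */
    static void save(Serializable model, File file) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(file.toPath()))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a model back from disk (done at startup, no retraining). */
    static <T> T load(File file, Class<T> type) {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(file.toPath()))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Model class is not on the classpath", e);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("classifier", ".ser");
        file.deleteOnExit();

        // "Training" step: in the question this is the expensive makeClassifier call.
        ToyClassifier trained = new ToyClassifier();
        trained.learn("great", "1");
        save(trained, file);

        // Later, possibly in another process: load instead of retraining.
        ToyClassifier restored = load(file, ToyClassifier.class);
        System.out.println(restored.classOf("great"));
    }
}
```

    In the question's code, the loaded classifier would be kept in a static field and reused for every tweet, so the training file is read only when the model is first built.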