[SOLVED] Stanford Classifier ColumnDataClassifier

I am using the Maximum Entropy algorithm provided by the Stanford Classifier to perform customized Named Entity Recognition. The output file has 5 tab-separated columns: word \t ground-truth \t label \t P(clAnswer) \t P(goldAnswer)

What is the difference between P(clAnswer) and P(goldAnswer), and how are they calculated?

Solution

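P(clAnswer) is the probability the model assigns to its own guess, i.e. the label it predicts. P(goldAnswer) is the probability the model assigns to the true gold answer. When the guess is correct, the two values are identical; when it is wrong, P(goldAnswer) shows how much probability the model still gave to the right label.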
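As for how the numbers come about, here is a minimal sketch in Python, assuming the usual maximum entropy (multiclass logistic regression) formulation, in which each class gets a score from its feature weights and the scores are normalized with a softmax. The scores, labels, and the `softmax` helper below are made up for illustration and are not part of the Stanford Classifier API.

```python
import math


def softmax(scores):
    """Turn per-class scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical per-class scores for a single token (assumed values).
scores = {"PERSON": 2.0, "LOCATION": 1.0, "O": 0.0}
gold = "LOCATION"  # hypothetical ground-truth label for that token

probs = softmax(scores)
guess = max(probs, key=probs.get)  # the label the classifier would output
p_cl_answer = probs[guess]         # corresponds to P(clAnswer)
p_gold_answer = probs[gold]        # corresponds to P(goldAnswer)

print(guess, round(p_cl_answer, 3), round(p_gold_answer, 3))
```

In this toy case the guess is PERSON with probability of about 0.67, while the gold label LOCATION only gets about 0.24, so the two columns differ. Had the guess matched the gold label, both columns would show the same number.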

If you want to understand the algorithm behind the classifier, you can find resources at this link: https://nlp.stanford.edu/software/classifier.shtml

I should note that the standard way to train NER models is with the CRFClassifier rather than the ColumnDataClassifier. Training an NER model is documented in detail here:

https://nlp.stanford.edu/software/crf-faq.html#a