
NLTK MaxentClassifier train with negative cases


I am new to the NLTK library and I am trying to teach my classifier some labels using my own corpus.

For this I have a file with IOB tags like this:

How O 
do B-MYTag
you I-MYTag
know O
, O
where B-MYTag
to O
park O
? O

I train the classifier with:

self.classifier = nltk.MaxentClassifier.train(train_set, algorithm='megam', trace=0)

and it works.
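
For context, train_set in that call is a list of (featureset, label) pairs, one per token. Below is a minimal sketch of how such a list might be built from the IOB file, assuming one "token tag" pair per line and a blank line between sentences. The helpers read_iob, token_features and build_train_set, and the feature names, are hypothetical and not part of NLTK:

```python
def read_iob(lines):
    """Parse 'token tag' lines into sentences; blank lines separate sentences."""
    sentences, current = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def token_features(tokens, i):
    """Hypothetical feature set for the token at position i."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "is_title": word.istitle(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


def build_train_set(sentences):
    """Turn tagged sentences into (featureset, label) pairs."""
    train_set = []
    for sent in sentences:
        tokens = [tok for tok, _ in sent]
        for i, (_, tag) in enumerate(sent):
            train_set.append((token_features(tokens, i), tag))
    return train_set


sample = """How O
do B-MYTag
you I-MYTag
know O
, O
where B-MYTag
to O
park O
? O""".splitlines()

train_set = build_train_set(read_iob(sample))
# train_set can now be passed to nltk.MaxentClassifier.train(train_set, ...)
```

Each pair holds only a correct label for its token; the format has no slot for marking a label as wrong.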

How can I train my classifier with negative cases?

I would have a similar file with IOB tags, and I would specify that the tagging in this file is wrong (negative weights).

How can I do this?

An example of a negative case would be:

How B-MYTag 
do O
you O
know O
, O
where B-MYTag
to O
park O
? O

After that, I expect the classifier to remember that "How" is probably not a MYTag. The goal is to make the classifier learn faster.

Ideally I could just type in sentences, the program would tag them, and at the end it would ask me whether I am satisfied with the result. If I am, the text would be added to train_set; if not, it would be added to negative_train_set.

This way, it would be easier and faster to teach the classifier the right labels.


Solution

  • I'm guessing that you tried a classifier, saw some errors in the results, and want to feed the wrong outputs back in as additional training input. There are learning algorithms that optimize on the basis of which answers are wrong or right (neural nets, Brill rules), but the MaxEnt classifier is not one of them. Classifiers that do work like this handle the whole process internally: they tag the training data, compare the result to the gold standard, adjust their weights or rules accordingly, and repeat until the results stop improving.

    In short: you can't use incorrect outputs as a training dataset. The idea doesn't even fit the machine learning model: training fits the model to the labels it is given, so every training example is assumed to be correct and there is no way to mark one as wrong. Focus on improving your classifier by using better features, more data, or a different engine.
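
One way to keep the interactive workflow from the question while staying within this constraint is to turn a rejected output into more correct data: the user supplies the right tags and the corrected sentence joins the ordinary training set. The sketch below only illustrates that idea under these assumptions; add_reviewed_sentence and its arguments are hypothetical, not part of NLTK, and the classifier would still have to be retrained on the enlarged set afterwards.

```python
def add_reviewed_sentence(train_sentences, predicted, corrected_tags=None):
    """Add one reviewed sentence to the gold training data.

    predicted:      list of (token, tag) pairs produced by the classifier.
    corrected_tags: None if the user accepted the output, otherwise the
                    list of correct tags, one per token.
    """
    if corrected_tags is None:
        # Accepted output is already correct, so it is stored as-is.
        gold = list(predicted)
    else:
        if len(corrected_tags) != len(predicted):
            raise ValueError("need exactly one corrected tag per token")
        # Keep the tokens, replace the tags with the user's corrections.
        gold = [(tok, tag) for (tok, _), tag in zip(predicted, corrected_tags)]
    train_sentences.append(gold)
    return gold


# The rejected output from the question ...
predicted = [("How", "B-MYTag"), ("do", "O"), ("you", "O"), ("know", "O"),
             (",", "O"), ("where", "B-MYTag"), ("to", "O"), ("park", "O"),
             ("?", "O")]
# ... and the tags the user considers correct.
corrected = ["O", "B-MYTag", "I-MYTag", "O", "O", "B-MYTag", "O", "O", "O"]

train_sentences = []
gold = add_reviewed_sentence(train_sentences, predicted, corrected)
```

The wrong tagging itself is discarded; only the corrected version is stored, so the training data stays correct by construction.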