I am performing NER using sklearn-crfsuite. I am trying to report results on an entity-mention-by-entity-mention basis: a true positive (prediction and expected label agree, even when there is no entity), a false positive (the prediction says there is an entity, the expected label says there is not), or a false negative (the prediction says there is no entity, the expected label says there is).
I cannot see how to get anything other than tag/token-based summary statistics for NER performance.
I would also be OK with a different way of grouping entity mentions, such as: correct, incorrect, partial, missing, spurious. I could write a whole bunch of code around this myself (and might have to), but surely there is a single call that returns this information?
Here are some of the calls I am making to get the summary statistics:
from sklearn import metrics

# Token-level summary statistics (targets and predictions are flat tag lists)
report = metrics.classification_report(targets, predictions,
                                        output_dict=output_dict)
precision = metrics.precision_score(targets, predictions,
                                    average='weighted')
f1 = metrics.f1_score(targets, predictions, average='weighted')
accuracy = metrics.accuracy_score(targets, predictions)
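To make the goal concrete, here is a rough sketch of the kind of mention-level counting I am after, assuming well-formed BIO tags; extract_spans, mention_confusion, and the exact-match rule are all my own hypothetical choices, not an existing API:

def extract_spans(tags):
    """Collect (entity_type, start, end) spans from one BIO-tagged sentence."""
    spans, start, ent_type = [], None, None
    for i, tag in enumerate(tags + ['O']):  # trailing 'O' flushes the last span
        if tag == 'O' or tag.startswith('B-'):
            if start is not None:
                spans.append((ent_type, start, i))
                start, ent_type = None, None
        if tag.startswith('B-'):
            start, ent_type = i, tag[2:]
    return set(spans)

def mention_confusion(target_tags, predicted_tags):
    """Exact-match mention counts for one sentence: (TP, FP, FN)."""
    gold = extract_spans(target_tags)
    pred = extract_spans(predicted_tags)
    tp = len(gold & pred)   # predicted span matches a gold span exactly
    fp = len(pred - gold)   # predicted, but not in the gold standard
    fn = len(gold - pred)   # in the gold standard, but not predicted
    return tp, fp, fn

Summing these counts over all sentences would give me the mention-level view I want, but I would much rather call something that already exists.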
It's not so straightforward to get the metrics you mention (i.e., correct, incorrect, partial, missing, spurious), which I believe are the same ones introduced in the SemEval'13 challenge.
I also needed to report results based on these metrics and ended up coding it myself.
I'm working together with someone else, and we are planning to release it as a package that can be easily integrated with open-source NER systems and/or read standard formats like CoNLL. Feel free to join and help us out :)
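To give a flavour of the logic, here is a much simplified sketch of the span comparison behind those five categories; the (type, start, end) span tuples, the overlaps helper, and semeval_categories are illustrative names for this answer, not the actual API of the package:

def overlaps(a, b):
    """True if two (type, start, end) spans share at least one token."""
    return a[1] < b[2] and b[1] < a[2]

def semeval_categories(gold_spans, pred_spans):
    """Bucket predicted spans against gold spans, SemEval'13 style (simplified)."""
    counts = {'correct': 0, 'incorrect': 0, 'partial': 0,
              'missing': 0, 'spurious': 0}
    matched = set()
    for p in pred_spans:
        candidates = [g for g in gold_spans if overlaps(p, g)]
        if not candidates:
            counts['spurious'] += 1      # predicted, but nothing overlapping in gold
            continue
        g = candidates[0]                # naive: take the first overlapping gold span
        matched.add(g)
        if p == g:
            counts['correct'] += 1       # exact boundaries and entity type
        elif (p[1], p[2]) == (g[1], g[2]):
            counts['incorrect'] += 1     # right boundaries, wrong entity type
        else:
            counts['partial'] += 1       # boundaries only partially overlap
    counts['missing'] = sum(1 for g in gold_spans if g not in matched)
    return counts

The real implementation has to handle details this sketch ignores (multiple predictions overlapping one gold span, per-type breakdowns, the different evaluation schemas), which is exactly why a shared package seems worth having.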