Hi, I have been using FreeLing for a few months now to extract triplets. So far I have succeeded in doing so by using the dependency tree and the full parse tree, but now I am trying to add NERC.
I checked the tutorial for Python, but I couldn't find anything beyond dependency parsing. So I went through the class list (since the same classes should be available for Python and C++), but it is not very clear how to retrieve the named entities. Also, after checking the output of the analyzer sample, I have a few questions about the performance of the NER module.
So what I would like help with is the following:
neclass = pyfreeling.ner(lpath + "/nerc/ner/ner-ab-rich.dat")
Any comments and suggestions are welcome. Thanks in advance.
Well, apparently there are three NERC modules: one rule-based and two ML-based. All of them use capitalization as a feature, and since both ML models are trained on standard text, all NEs seen in training are capitalized. Therefore, lowercase named entities are unlikely to be recognized.
As for retrieval, it seems that get_label() on the nodes can provide this information: if a word (or multiword) has a PoS tag starting with "NP", it was recognized by the NERC module.
This is based on the FreeLing authors' own explanation, which you can find here.
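In case it helps, here is a minimal sketch of the tag check described above. It works on plain (form, tag) pairs so it runs without FreeLing installed; with pyfreeling I assume you would build those pairs from each word of an analyzed sentence using get_form() and get_tag(). The function name extract_named_entities and the sample sentence (including its tag values) are made up for illustration and are not actual FreeLing output.

```python
def extract_named_entities(tagged_words, ne_prefix="NP"):
    """Return the (form, tag) pairs whose PoS tag starts with ne_prefix.

    tagged_words: iterable of (form, tag) tuples. With pyfreeling these
    would presumably be built as (w.get_form(), w.get_tag()) per word;
    that step is an assumption and is not shown here.
    """
    return [(form, tag) for form, tag in tagged_words
            if tag.startswith(ne_prefix)]


# Hypothetical tagged sentence; the tags are illustrative values only.
# Multiword entities are written here as a single underscore-joined token.
sample = [
    ("Barack_Obama", "NP00SP0"),
    ("visited", "VBD"),
    ("Paris", "NP00G00"),
    ("yesterday", "NN"),
]

entities = extract_named_entities(sample)
print(entities)
```

Because the check is only a prefix match, it will also pick up any other tag that happens to start with "NP" in your tagset, so it is worth inspecting a few analyzed sentences before relying on it.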