nlp · pytorch · named-entity-recognition · named-entity-extraction · bert-language-model

How to use BERT just for ENTITY extraction from a Sequence without classification in the NER task?


My requirement here is: given a sentence (sequence), I would like to extract just the entities present in the sequence, without classifying them into a type as in the NER task. I see that BertForTokenClassification does the classification for NER. Can this be adapted to do only the extraction?

Can BERT be used just for entity extraction/identification?


Solution

  • Regardless of BERT, NER tagging is usually done with the IOB format (inside, outside, beginning) or something similar (often the end is also explicitly tagged). The inside and beginning tags carry the entity type. Something like this:

    Alex B-PER
    is O
    going O
    to O
    Los B-LOC
    Angeles I-LOC
    

    If you modify your training data so that there is only one entity type, the model will learn to detect entities without knowing what type each entity is (see the sketch after the example below).


    Alex B
    is O
    going O
    to O
    Los B
    Angeles I
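
    A minimal sketch of both steps, assuming the Hugging Face transformers library: collapsing typed IOB tags into plain B/I/O, and building a BertForTokenClassification head with only those three labels. The example sentence, tag names, and checkpoint name are illustrative, not prescribed by the question.

        # Sketch: type-agnostic entity extraction with BertForTokenClassification.
        # Assumes `transformers` is installed; names here are illustrative.
        from transformers import BertTokenizerFast, BertForTokenClassification

        # 1) Collapse typed IOB tags (B-PER, I-LOC, ...) to plain B / I / O.
        def collapse_tags(tags):
            return [tag.split("-")[0] for tag in tags]   # "B-PER" -> "B", "O" -> "O"

        tokens     = ["Alex", "is", "going", "to", "Los", "Angeles"]
        typed_tags = ["B-PER", "O", "O", "O", "B-LOC", "I-LOC"]
        plain_tags = collapse_tags(typed_tags)           # ["B", "O", "O", "O", "B", "I"]

        # 2) Token-classification head with only three labels instead of
        #    one label per entity type.
        label2id = {"O": 0, "B": 1, "I": 2}
        id2label = {v: k for k, v in label2id.items()}

        tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
        model = BertForTokenClassification.from_pretrained(
            "bert-base-cased",
            num_labels=len(label2id),
            id2label=id2label,
            label2id=label2id,
        )

        # Fine-tune on the relabelled data as in any token-classification
        # setup; at inference time, every token predicted as B or I belongs
        # to an entity span, with no type attached.

    The training loop itself is unchanged from a standard NER fine-tuning setup (including aligning word-level tags to BERT's subword tokens); only the label set shrinks to B, I, O.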