spacy, named-entity-recognition, named-entity-extraction

Reference other entities in entity_ruler


I'm trying to build a custom list of "named entities" using the entity_ruler, following the API documentation.

However, I'm facing a problem: can I build a named entity that references another one also defined in the entity_ruler?

For example, say I want to define the entity Agreement as a set of fixed expressions, and the entity AgreementDate as an Agreement followed by another expression. Can the following snippet configure spaCy correctly? The output is not what I was expecting.

import spacy

patterns = [
    # Each dict describes a single token; with the default tokenizer no token
    # contains a space, so this first pattern never matches.
    {'label': 'Agreement', 'pattern': [{'LOWER': 'license agreement'}]},
    {'label': 'Agreement', 'pattern': [{'LOWER': 'agreement'}]},
    {'label': 'Agreement', 'pattern': [{'LOWER': 'commencement'}]},
    {'label': 'Agreement', 'pattern': [{'LOWER': 'parties'}]},
    # Intended: an Agreement entity followed by the word "date"
    {'label': 'AgreementDate', 'pattern': [{'ENT_TYPE': 'Agreement'}, {'LOWER': 'date'}]},
]
nlp = spacy.load('en_core_web_sm')
entity_ruler = nlp.add_pipe('entity_ruler', config={
    'validate': True,
    'overwrite_ents': True
})
entity_ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
# Print every entity found in the two-line sample text
for ent in nlp('''Commencement Date
license agreement date''').ents:
    print(f'{ent.text:40} {ent.label_:40}')
Commencement                             Agreement                               
agreement                                Agreement                               

Solution

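  • Entity ruler patterns only match against annotations that were set before that entity ruler component starts running, so an ENT_TYPE pattern cannot see the entities the same ruler is in the middle of assigning. You can get the behavior you want by moving the final pattern into a second entity ruler, added after the first one under a custom component name.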
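
The two-ruler approach could look roughly like the sketch below. It is not taken from the original answer, and several details are assumptions made for illustration: the component name agreement_date_ruler is made up, spacy.blank('en') replaces en_core_web_sm so that no model download is needed, 'license agreement' is split into two token patterns, and 'OP': '+' is added so that a multi-token Agreement is absorbed into the AgreementDate span as a whole.

```python
import spacy

# Assumption: a blank English pipeline is enough to demonstrate the rulers
nlp = spacy.blank('en')

agreement_patterns = [
    # One dict per token, so the two-word expression needs two entries
    {'label': 'Agreement', 'pattern': [{'LOWER': 'license'}, {'LOWER': 'agreement'}]},
    {'label': 'Agreement', 'pattern': [{'LOWER': 'agreement'}]},
    {'label': 'Agreement', 'pattern': [{'LOWER': 'commencement'}]},
    {'label': 'Agreement', 'pattern': [{'LOWER': 'parties'}]},
]
date_patterns = [
    # 'OP': '+' (an assumption, not in the question) lets the pattern consume
    # every token of a multi-token Agreement before the word "date"
    {'label': 'AgreementDate',
     'pattern': [{'ENT_TYPE': 'Agreement', 'OP': '+'}, {'LOWER': 'date'}]},
]

# First ruler: assigns the Agreement entities
first_ruler = nlp.add_pipe('entity_ruler', config={'overwrite_ents': True})
first_ruler.add_patterns(agreement_patterns)

# Second ruler: runs afterwards, so ENT_TYPE can see the Agreement entities.
# The component name is hypothetical; any unused name works.
second_ruler = nlp.add_pipe('entity_ruler', name='agreement_date_ruler',
                            config={'overwrite_ents': True})
second_ruler.add_patterns(date_patterns)

doc = nlp('Commencement Date\nlicense agreement date')
results = [(ent.text, ent.label_) for ent in doc.ents]
for text, label in results:
    print(f'{text:40} {label:40}')
```

Because overwrite_ents is enabled on the second ruler, each AgreementDate span replaces the Agreement entity it overlaps. If you want to keep both, a span-based component may be a better fit than doc.ents, which cannot hold overlapping entities.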