I have the following sentence:
text="The weather is extremely severe in England"
I want to perform a custom Named Entity Recognition (NER) procedure.
First, a normal NER procedure will output England with a GPE label:
!pip install spacy
!python -m spacy download en_core_web_lg
import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp(text)
for ent in doc.ents:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))
Result: England - GPE - Countries, cities, states
However, I want the whole sentence to take the tag High-Severity.
So I am doing the following procedure:
from spacy.strings import StringStore
new_hash = StringStore([u'High_Severity']) # <-- match id
nlp.vocab.strings.add('High_Severity')
from spacy.tokens import Span
# Get the hash value of the High_Severity entity label
High_Severity = doc.vocab.strings[u'High_Severity']
# Create a Span for the new entity
new_ent = Span(doc, 0, 7, label=High_Severity)
# Add the entity to the existing Doc object
doc.ents = list(doc.ents) + [new_ent]
I am getting the following error:
ValueError: [E1010] Unable to set entity information for token 6 which is included in more than one span in entities, blocked, missing or outside.
From my understanding, this is happening because NER has already recognised England as GPE and a second label cannot be added over the existing one.
I tried to execute the custom NER code on its own (i.e., without first running the normal NER code), but this did not solve my problem.
Any ideas on how to solve this problem?
Indeed, it looks like NER does not allow overlapping entities, and that is your problem: the second part of your code tries to create an entity that contains another entity, so it fails. See:
https://github.com/explosion/spaCy/discussions/10885
This is why spaCy has span categorization.
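If the sentence-level label really has to live in doc.ents, the overlap has to be resolved before assigning. Here is a minimal sketch, assuming it is acceptable to lose the England/GPE entity. It uses spacy.util.filter_spans, which keeps the longest span when spans overlap. A blank pipeline and a hand-made GPE span stand in for en_core_web_lg so the snippet runs without downloading a model; that substitution is my assumption, not part of the question.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

# Blank English pipeline: tokenizer only, no trained NER (assumption for the demo).
nlp = spacy.blank("en")
doc = nlp("The weather is extremely severe in England")

# Stand-in for the entity the trained NER would have produced.
doc.ents = [Span(doc, 6, 7, label="GPE")]

# Sentence-wide span with the custom label.
new_ent = Span(doc, 0, len(doc), label="High_Severity")

# filter_spans drops overlapping spans, preferring the longest one,
# so the result can be assigned to doc.ents without raising E1010.
doc.ents = filter_spans([new_ent] + list(doc.ents))

result = [(ent.text, ent.label_) for ent in doc.ents]
print(result)
```

The trade-off is that the shorter England/GPE entity is discarded, so this only fits if one label per token is enough.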
I have not yet found a way to label a predefined span (one that does not come from a trained model).
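One option that may cover this case is doc.spans (spaCy v3), which stores named groups of spans that are allowed to overlap each other and the entities in doc.ents. The sketch below rests on two assumptions of mine: the group key "severity" is an arbitrary name, and a blank pipeline with a hand-made GPE span replaces en_core_web_lg so that it runs without a model download.

```python
import spacy
from spacy.tokens import Span

# Blank English pipeline: tokenizer only (assumption for the demo).
nlp = spacy.blank("en")
text = "The weather is extremely severe in England"
doc = nlp(text)

# Stand-in for the entity the trained NER would have produced.
doc.ents = [Span(doc, 6, 7, label="GPE")]

# Span groups in doc.spans may overlap doc.ents, so no E1010 here.
# "severity" is a hypothetical key chosen for this example.
doc.spans["severity"] = [Span(doc, 0, len(doc), label="High_Severity")]

spans = [(span.text, span.label_) for span in doc.spans["severity"]]
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(spans)
print(ents)
```

Both labels are kept this way: England stays a GPE in doc.ents, and the whole sentence carries High_Severity in the span group.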