I am trying to investigate a way to fix (or alter) how spaCy identifies verbs/nouns. In the following example I would like to recognize finger as a NOUN, not a VERB.
import spacy
nlp = spacy.load("en_core_web_lg")
doc = nlp('over exertion to finger from pulling open a stuck door left middle finger strain')
for w in doc:
    print(w.text, w.lemma_, w.pos_)
which returns
over over ADP
exertion exertion NOUN
to to PART
finger finger VERB <-- finger should be NOUN
from from ADP
pulling pull VERB
open open ADJ
a a DET
stuck stuck ADJ
door door NOUN
left leave VERB
middle middle ADJ
finger finger NOUN
strain strain NOUN
What changes could I make to solve this issue?
Use the transformer-based en_core_web_trf model, which tags the first finger as a NOUN here:
>>> import spacy
>>> nlp = spacy.load("en_core_web_trf")
>>> doc = nlp('over exertion to finger from pulling open a stuck door left middle finger strain')
>>> for w in doc:
...     print(w.text, w.lemma_, w.pos_)
...
over over ADP
exertion exertion NOUN
to to ADP
finger finger NOUN
from from ADP
pulling pull VERB
open open ADP
a a DET
stuck stick VERB
door door NOUN
left leave VERB
middle middle ADJ
finger finger NOUN
strain strain NOUN
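If switching models is not an option, another route is to override the tag with a rule using spaCy's attribute_ruler component. The following is only a minimal sketch: the pattern ("finger" directly after "to") is a hypothetical rule chosen for this one sentence, and a blank pipeline is used so the snippet runs without downloading a model. With en_core_web_lg you would fetch the existing component with nlp.get_pipe("attribute_ruler") rather than adding a new one.

```python
import spacy

# Blank English pipeline: tokenizer only, so no model download is needed here.
nlp = spacy.blank("en")

# With a pretrained pipeline, reuse its existing component instead:
#     ruler = nlp.get_pipe("attribute_ruler")
ruler = nlp.add_pipe("attribute_ruler")

# Hypothetical rule (an assumption for this example): "finger" directly
# after "to" is treated as a noun.
# index=1 applies the attributes to the second token of the match ("finger").
ruler.add(
    patterns=[[{"LOWER": "to"}, {"LOWER": "finger"}]],
    attrs={"POS": "NOUN"},
    index=1,
)

doc = nlp("over exertion to finger from pulling open a stuck door left middle finger strain")
for w in doc:
    print(w.text, w.pos_)
```

A rule like this fires on every "to finger" sequence, including real infinitives, so it only makes sense if your texts are narrow enough that the verb reading does not occur.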