python nlp chatbot lemmatization spacy-3

ModuleNotFoundError in spaCy version 3.3.1; previously suggested solutions do not work


    import spacy
    from spacy.lemmatizer import Lemmatizer
    from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES

    lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
    lemmatizer('chuckles', 'NOUN')

The expected output is chuckle. I am using version 3.1.1.


Solution

  • It looks like they've changed the way the lemmatizer is instantiated, but the following should work:

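    import spacy

    # Disabled components are skipped when the pipeline runs, but they can
    # still be fetched with get_pipe().
    nlp = spacy.load('en_core_web_sm', disable=['ner', 'tagger', 'parser', 'lemmatizer'])
    lemmatizer = nlp.get_pipe('lemmatizer')
    t = nlp('chuckles')[0]  # run the pipeline on one word to get a Token
    t.pos_ = 'NOUN'         # set the POS by hand, since the tagger is disabled
    lemma = lemmatizer.lemmatize(t)[0]  # lemmatize() returns a list of strings
    print(lemma)
    # >> chuckle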

    It's unfortunate that you have to call the lemmatizer with a Token, but looking at the code, I don't see a way to call it with (word, pos). I think you're stuck with running the stripped-down pipeline on a single word to get a Token, then manually setting pos_ before calling lemmatize(t).

    Note that the POS tagger will not work correctly on a single word. It needs sentence context and will probably always assign NOUN as the POS if you only give it one word. This is why I've disabled the pipeline components and set t.pos_ manually.

    By the way, if you only need to lemmatize, you might look at lemminflect, which is simpler for single words and also more accurate.
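
The workaround above can be wrapped in a small helper to get back a (word, pos) style call. This is only a sketch: the function names lemmatize_word and lemma_with_lemminflect are made up for illustration, and nlp is assumed to be a spaCy 3 pipeline loaded as in the answer. The second helper assumes lemminflect's getLemma(word, upos=...) returns a tuple of candidate lemmas, which is my understanding of that library's API.

```python
def lemmatize_word(nlp, word, pos):
    """Return the first lemma of `word`, treated as part of speech `pos`.

    Assumes `nlp` is a loaded spaCy 3 pipeline that contains a
    'lemmatizer' component (hypothetical helper, not part of spaCy).
    """
    lemmatizer = nlp.get_pipe('lemmatizer')
    token = nlp(word)[0]      # run the pipeline to obtain a Token
    token.pos_ = pos          # override the POS manually
    return lemmatizer.lemmatize(token)[0]


def lemma_with_lemminflect(word, upos):
    """Same idea using lemminflect instead of spaCy (hypothetical helper)."""
    # Imported here so the spaCy helper above works without lemminflect.
    from lemminflect import getLemma
    lemmas = getLemma(word, upos=upos)  # assumed to return a tuple of strings
    # Fall back to the word itself if no lemma is returned.
    return lemmas[0] if lemmas else word


# Example usage (requires spaCy 3 with the en_core_web_sm model installed):
#   import spacy
#   nlp = spacy.load('en_core_web_sm',
#                    disable=['ner', 'tagger', 'parser', 'lemmatizer'])
#   print(lemmatize_word(nlp, 'chuckles', 'NOUN'))
```

Passing nlp in as an argument means the model is loaded once and reused across calls, rather than reloaded for every word.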