python nltk wordnet

Using WordNet with NLTK to find synonyms that make sense


I want to input a sentence, and output a sentence with hard words made simpler.

I'm using NLTK to tokenize sentences and tag words, but I'm having trouble using WordNet to find a synonym for the specific sense of a word that I want.

For example:

Input: "I refuse to pick up the refuse"

The first "refuse" (to reject) may already be the simplest word for that meaning, but the second "refuse" means garbage, and there are simpler words that could go there.

NLTK might be able to tag the second "refuse" as a noun, but then how do I get synonyms for refuse (trash) from WordNet?


Solution

  • It sounds like you want synonyms chosen according to each word's part of speech (noun, verb, etc.).

    The following code generates synonyms for each word in a sentence based on its part of speech. References:

    1. Extract Word from Synset using Wordnet in NLTK 3.0
    2. Printing the part of speech along with the synonyms of the word

    Code

    import nltk
    from nltk.corpus import wordnet as wn

    # One-time download of NLTK's 'popular' data collection (includes WordNet).
    nltk.download('popular')

    def get_synonyms(word, pos):
        """Yield synonyms of word for the given Penn Treebank POS tag."""
        for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
            for lemma in synset.lemmas():
                yield lemma.name()

    def pos_to_wordnet_pos(penntag, returnNone=False):
        """Map a Penn Treebank POS tag to a WordNet POS tag."""
        morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
                      'VB': wn.VERB, 'RB': wn.ADV}
        try:
            # Only the first two characters matter, e.g. VBP -> VB.
            return morphy_tag[penntag[:2]]
        except KeyError:
            # Tags with no WordNet counterpart (PRP, DT, ...) map to ''.
            return None if returnNone else ''
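
To make the tag mapping concrete, here is a small dependency-free sketch of the same prefix lookup. It assumes WordNet's single-letter POS codes ('n', 'v', 'a', 'r'), which are the values behind wn.NOUN, wn.VERB, wn.ADJ and wn.ADV. The name penn_to_wordnet is made up for this illustration and is not part of NLTK.

```python
# WordNet's single-letter POS codes (the values of wn.NOUN, wn.VERB, wn.ADJ, wn.ADV).
NOUN, VERB, ADJ, ADV = 'n', 'v', 'a', 'r'

# Penn Treebank tags are grouped by their first two characters.
PREFIX_TO_WORDNET = {'NN': NOUN, 'VB': VERB, 'JJ': ADJ, 'RB': ADV}

def penn_to_wordnet(penn_tag):
    """Hypothetical helper: map a Penn Treebank tag to a WordNet POS code, or ''."""
    return PREFIX_TO_WORDNET.get(penn_tag[:2], '')

# Print the mapping for a few common tags.
for tag in ['NN', 'NNS', 'VBP', 'VBD', 'JJR', 'RBS', 'PRP', 'DT', 'RP']:
    print(f'{tag:>4} -> {penn_to_wordnet(tag)!r}')
```

Every noun tag (NN, NNS, ...) collapses to 'n' and every verb tag (VB, VBP, VBD, ...) to 'v', while tags such as PRP, DT and RP fall through to the empty string.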
    

    Example Usage

    # Tokenize the sentence, then tag each token with its part of speech.
    text = nltk.word_tokenize("I refuse to pick up the refuse")

    for word, tag in nltk.pos_tag(text):
        print(f'word is {word}, POS is {tag}')

        # Keep unique synonyms that differ from the word itself, sorted.
        unique = sorted({synonym for synonym in get_synonyms(word, tag) if synonym != word})

        for synonym in unique:
            print('\t', synonym)
    

    Output

    Note that the two occurrences of "refuse" get different sets of synonyms, depending on the POS tag.

    word is I, POS is PRP
    word is refuse, POS is VBP
         decline
         defy
         deny
         pass_up
         reject
         resist
         turn_away
         turn_down
    word is to, POS is TO
    word is pick, POS is VB
         beak
         blame
         break_up
         clean
         cull
         find_fault
         foot
         nibble
         peck
         piece
         pluck
         plunk
    word is up, POS is RP
    word is the, POS is DT
    word is refuse, POS is NN
         food_waste
         garbage
         scraps
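
The listing above gives candidates but does not say which one is "simpler". One possible heuristic, offered only as a sketch, is to prefer the synonym with the highest WordNet usage count, on the assumption that more frequent words tend to be easier. The helper names most_frequent and lemma_counts are hypothetical. Lemma.count() is an existing NLTK method, but using it as a proxy for simplicity is an assumption of this example and not something the answer above claims.

```python
def most_frequent(candidates, original):
    """Return the candidate name with the highest count, skipping the original word.

    candidates: iterable of (name, count) pairs.
    Returns None when no alternative is available.
    """
    best = {}
    for name, count in candidates:
        if name.lower() == original.lower():
            continue
        # A name can occur in several synsets; keep its largest count.
        best[name] = max(count, best.get(name, 0))
    if not best:
        return None
    # Highest count wins; ties are broken alphabetically so the result is deterministic.
    return min(best, key=lambda name: (-best[name], name))


def lemma_counts(word, wordnet_pos):
    """Yield (lemma name, usage count) pairs from WordNet; needs NLTK and its data."""
    # Imported lazily so most_frequent() can be used without NLTK installed.
    from nltk.corpus import wordnet as wn
    for synset in wn.synsets(word, pos=wordnet_pos):
        for lemma in synset.lemmas():
            yield lemma.name(), lemma.count()
```

With NLTK and the WordNet data installed, most_frequent(lemma_counts('refuse', 'n'), 'refuse') picks among the noun synonyms only. Counts can be 0 for many lemmas, and this sketch does not distinguish between senses within the same part of speech, so treat the result as a suggestion to review, not a guaranteed simplification.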