python nltk wordnet word-sense-disambiguation

Word sense disambiguation with WordNet. How to select the words related to the same meaning?


I am using WordNet and NLTK for word sense disambiguation. I am interested in all words related to sound. I have a list of such words, and 'roll' is one of them. I check whether any of my sentences contains this word (also taking the POS into account). If so, I would like to keep only those sentences in which the word is used in a sound-related sense. In the example below, that would be the second sentence. My current idea is simply to select words whose definition contains the word 'sound', as in 'the sound of a drum (especially a snare drum) beaten rapidly and continuously'. But I suspect there is a more elegant way. Any ideas would be highly appreciated!

from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize  # tokenizer used in the loop below
from nltk.wsd import lesk

samples = [('The van rolled along the highway.', 'n'),
           ('The thunder rolled and the lightning striked.', 'n')]

word = 'roll'
for sentence, pos_tag in samples:
    # lesk picks the synset whose definition overlaps most with the context words
    word_syn = lesk(word_tokenize(sentence.lower()), word, pos_tag)
    print('Sentence:', sentence)
    print('Word synset:', word_syn)
    print('Corresponding definition:', word_syn.definition())

output:

Sentence: The van rolled along the highway.
Word synset: Synset('scroll.n.02')
Corresponding definition: a document that can be rolled up (as for storage)
Sentence: The thunder rolled and the lightning striked.
Word synset: Synset('paradiddle.n.01')
Corresponding definition: the sound of a drum (especially a snare drum) beaten rapidly and continuously

Solution

  • You could use WordNet hypernyms (synsets with a more general meaning). My first idea would be to walk upwards from the current synset (using synset.hypernyms()) and check at each step whether I have reached the "sound" synset. I would stop on reaching the root, which has no hypernyms (synset.hypernyms() returns an empty list).

    Now for your two examples, this produces the following sequences of synsets:

    Sentence:The van rolled along the highway .
    Word synset:Synset('scroll.n.02')
    [Synset('manuscript.n.02')]
    [Synset('autograph.n.01')]
    [Synset('writing.n.02')]
    [Synset('written_communication.n.01')]
    [Synset('communication.n.02')]
    [Synset('abstraction.n.06')]
    [Synset('entity.n.01')]
    
    Sentence:The thunder rolled and the lightning striked .
    Word synset:Synset('paradiddle.n.01')
    [Synset('sound.n.04')]
    [Synset('happening.n.01')]
    [Synset('event.n.01')]
    [Synset('psychological_feature.n.01')]
    [Synset('abstraction.n.06')]
    [Synset('entity.n.01')]
    

    So one of the synsets you might want to look for is sound.n.04. There could be others, though; I would experiment with more examples and build up a list of target synsets.
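
The upward walk described above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: has_ancestor, toy_parents, TOY_HYPERNYMS and SOUND_TARGETS are hypothetical names, and the lookup table is a toy copy of the two hypernym chains printed above, so the example runs without the WordNet corpus.

```python
from collections import deque


def has_ancestor(start, targets, get_parents):
    """Return True if `start`, or anything reachable from it through
    repeated calls to get_parents, is contained in `targets`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        # A node may have several parents, so follow all of them.
        for parent in get_parents(node):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Every path ended at a root (no parents) without hitting a target.
    return False


# Toy stand-in for WordNet, copied from the two chains printed above.
TOY_HYPERNYMS = {
    'scroll.n.02': ['manuscript.n.02'],
    'manuscript.n.02': ['autograph.n.01'],
    'autograph.n.01': ['writing.n.02'],
    'writing.n.02': ['written_communication.n.01'],
    'written_communication.n.01': ['communication.n.02'],
    'communication.n.02': ['abstraction.n.06'],
    'paradiddle.n.01': ['sound.n.04'],
    'sound.n.04': ['happening.n.01'],
    'happening.n.01': ['event.n.01'],
    'event.n.01': ['psychological_feature.n.01'],
    'psychological_feature.n.01': ['abstraction.n.06'],
    'abstraction.n.06': ['entity.n.01'],
    'entity.n.01': [],
}


def toy_parents(name):
    # Hypothetical helper: look up the parents of a synset name in the toy table.
    return TOY_HYPERNYMS.get(name, [])


# Assumed target list; extend it as you find more sound-related synsets.
SOUND_TARGETS = {'sound.n.04'}

is_sound_scroll = has_ancestor('scroll.n.02', SOUND_TARGETS, toy_parents)
is_sound_paradiddle = has_ancestor('paradiddle.n.01', SOUND_TARGETS, toy_parents)
print(is_sound_scroll, is_sound_paradiddle)  # False True
```

With real WordNet data (assuming the corpus is installed), the same function should work on the synset returned by lesk if you pass {wn.synset('sound.n.04')} as the targets and lambda s: s.hypernyms() as get_parents.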