I am using WordNet and NLTK for word sense disambiguation. I am interested in all words that are related to sound. I have a list of such words, and 'roll' is one of them. I check whether any of my sentences contains this word (taking the POS into account), and if so, I would like to keep only those sentences in which the word is used in a sound-related sense. In the example below, that would be the second sentence. My current idea is to keep only the senses whose definition contains the word 'sound', such as 'the sound of a drum (especially a snare drum) beaten rapidly and continuously'. But I suspect there is a more elegant way. Any ideas would be highly appreciated!
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn
samples = [('The van rolled along the highway.', 'n'),
           ('The thunder rolled and the lightning striked.', 'n')]
word = 'roll'
for sentence, pos_tag in samples:
    # lesk returns the synset of `word` that best matches the sentence context
    word_syn = lesk(word_tokenize(sentence.lower()), word, pos_tag)
    print('Sentence:', sentence)
    print('Word synset:', word_syn)
    print('Corresponding definition:', word_syn.definition())
output:
Sentence: The van rolled along the highway.
Word synset: Synset('scroll.n.02')
Corresponding definition: a document that can be rolled up (as for storage)
Sentence: The thunder rolled and the lightning striked.
Word synset: Synset('paradiddle.n.01')
Corresponding definition: the sound of a drum (especially a snare drum) beaten rapidly and continuously
You could use WordNet hypernyms (synsets with a more general meaning). My first idea would be to go from the current synset upwards (using synset.hypernyms()) and keep checking whether I find the "sound" synset. When I hit the root (which has no hypernyms, i.e. synset.hypernyms() returns an empty list), I would stop.
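The upward walk described above could be sketched roughly like this. It is only a minimal sketch: reaches_synset is a made-up helper name, and it assumes nothing more than that the object passed in has the hypernyms() and name() methods that NLTK synsets provide. Because a synset can have more than one hypernym, it follows all of them level by level instead of only the first one.

```python
def reaches_synset(synset, target_name):
    """Return True if `synset` itself or any of its hypernyms is named `target_name`.

    Hypothetical helper: works on any object with NLTK-style
    hypernyms() and name() methods.
    """
    frontier = [synset]
    seen = set()
    while frontier:  # an empty frontier means we have gone past the root
        next_frontier = []
        for current in frontier:
            name = current.name()
            if name == target_name:
                return True
            if name in seen:
                continue  # already expanded via another path
            seen.add(name)
            next_frontier.extend(current.hypernyms())
        frontier = next_frontier
    return False
```

With the synsets from the question, the check to run per sentence would be reaches_synset(word_syn, 'sound.n.04'). Note that lesk can return None when it finds no candidate synsets, so guard for that before calling it.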
Now for your two examples, this produces the following sequences of synsets:
Sentence:The van rolled along the highway .
Word synset:Synset('scroll.n.02')
[Synset('manuscript.n.02')]
[Synset('autograph.n.01')]
[Synset('writing.n.02')]
[Synset('written_communication.n.01')]
[Synset('communication.n.02')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]
Sentence:The thunder rolled and the lightning striked .
Word synset:Synset('paradiddle.n.01')
[Synset('sound.n.04')]
[Synset('happening.n.01')]
[Synset('event.n.01')]
[Synset('psychological_feature.n.01')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]
So one of the synsets you might want to look for is sound.n.04. But there could be others; I think you could play with other examples and try to come up with a list.
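If you do end up with several such synsets, one way to test against all of them at once is to collect the names of everything above the disambiguated synset and intersect that with your list. A rough sketch, assuming NLTK-style synset objects: SOUND_SYNSETS, ancestor_names and is_sound_related are hypothetical names, and the set holds only the one synset seen in the output above as a placeholder for your own list.

```python
# Placeholder list: only sound.n.04 comes from the output above;
# add whatever other synsets your own examples turn up.
SOUND_SYNSETS = {'sound.n.04'}

def ancestor_names(synset):
    """Names of `synset` and of every synset reachable through hypernyms()."""
    names = set()
    stack = [synset]
    while stack:
        current = stack.pop()
        if current.name() not in names:
            names.add(current.name())
            stack.extend(current.hypernyms())
    return names

def is_sound_related(synset, targets=SOUND_SYNSETS):
    """True if the synset or one of its hypernyms is in `targets`.

    A synset of None (no sense found) counts as not related.
    """
    return synset is not None and bool(ancestor_names(synset) & set(targets))
```

You could then keep only the sentences for which is_sound_related(word_syn) is true.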