I'm trying to find the synsets of words. Here's my code:
from nltk.corpus import wordnet as wn
from nltk import pos_tag

def getSynonyms(word1):
    synonymList1 = []
    for data1 in word1:
        wordnetSynset1 = wn.synsets(data1)
        tempList1 = []
        for synset1 in wordnetSynset1:
            synLemmas = synset1.lemma_names()
            for i in xrange(len(synLemmas)):
                word = synLemmas[i].replace('_', ' ')
                tempList1.append(pos_tag(word.split()))
        synonymList1.append(tempList1)
    return synonymList1

word1 = ['study']
syn1 = getSynonyms(word1)
print syn1
And here's the output:
[[[(u'survey', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'work', 'NN')], [(u'report', 'NN')], [(u'study', 'NN')], [(u'written', 'VBN'), (u'report', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'discipline', 'NN')], [(u'subject', 'NN')], [(u'subject', 'JJ'), (u'area', 'NN')], [(u'subject', 'JJ'), (u'field', 'NN')], [(u'field', 'NN')], [(u'field', 'NN'), (u'of', 'IN'), (u'study', 'NN')], [(u'study', 'NN')], [(u'bailiwick', 'NN')], [(u'sketch', 'NN')], [(u'study', 'NN')], [(u'cogitation', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'analyze', 'NN')], [(u'analyse', 'NN')], [(u'study', 'NN')], [(u'examine', 'NN')], [(u'canvass', 'NN')], [(u'canvas', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'consider', 'VB')], [(u'learn', 'NN')], [(u'study', 'NN')], [(u'read', 'NN')], [(u'take', 'VB')], [(u'study', 'NN')], [(u'hit', 'VB'), (u'the', 'DT'), (u'books', 'NNS')], [(u'study', 'NN')], [(u'meditate', 'NN')], [(u'contemplate', 'NN')]]]
As we can see, ('study', 'NN') appears more than once.
How can I print each synonym only once, without repetition, so that each synonym is represented by a single entry?
Instead of always appending to the list inside the for loop, in the line tempList1.append(pos_tag(word.split())), you should first check whether the element you are trying to add is already in the list. A simple if statement will do it:
if pos_tag(word.split()) not in tempList1:
    tempList1.append(pos_tag(word.split()))
This way an element will not be added twice.
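If the list gets long, the membership test on a list scans every element on each check. As an alternative, here is a minimal sketch that tracks what has already been seen in a set while keeping the original order. The helper name dedupe_tagged and the sample data are made up for illustration (the sample is hand-written to look like the pos_tag output above, so it runs without NLTK). Each entry is a list, which cannot go into a set directly, so it is converted to a tuple first.

```python
def dedupe_tagged(entries):
    """Return entries with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for entry in entries:
        # Lists are unhashable, so use a tuple of the (word, tag) pairs as the key.
        key = tuple(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


# Hand-written sample shaped like the pos_tag output in the question.
sample = [
    [('survey', 'NN')],
    [('study', 'NN')],
    [('study', 'NN')],
    [('written', 'VBN'), ('report', 'NN')],
    [('study', 'NN')],
]
result = dedupe_tagged(sample)
print(result)
```

In the original function, this could be applied to tempList1 just before it is appended to synonymList1. That would also avoid calling pos_tag twice per lemma, as the if-check version does.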