I need to determine the opinion expressed in reviews posted on websites, and I am using SentiWordNet for this. First, I send the file containing all the reviews to the POS tagger:
```python
import nltk

with open('reviews.txt') as f:  # hypothetical file name: one review per line
    for line in f:
        tokens = nltk.word_tokenize(line)  # tokenization
        print(nltk.pos_tag(tokens))        # POS tagging
```
Is there a more accurate way of tokenizing that treats "not good" as a single token instead of two separate words?

Next, I have to assign positive and negative scores to the tokenized words and then calculate the total score. Is there a function in SentiWordNet for this? Please help.
First, extract the adverbs and adjectives from the review. For example:
```python
import csv

import nltk
from nltk.tokenize import word_tokenize

para = "What can I say about this place. The staff of the restaurant is nice and the eggplant is not bad. Apart from that, very uninspired food, lack of atmosphere and too expensive. I am a staunch vegetarian and was sorely dissapointed with the veggie options on the menu. Will be the last time I visit, I recommend others to avoid"

# Tokenize the review and keep only adjectives (JJ*) and adverbs (RB*)
ADJ_ADV_TAGS = {'JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS'}
tokens = word_tokenize(para)
word_features = [word for word, tag in nltk.pos_tag(tokens) if tag in ADJ_ADV_TAGS]

# Read the word,polarity file once instead of re-opening it for every word
lexicon = {}
with open('words.txt', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        if len(row) >= 2:
            lexicon[row[0]] = row[1]

# +1 for each positive word, -1 for each negative word
rating = 0
for word in word_features:
    polarity = lexicon.get(word)
    if polarity is None:
        continue  # word is not in the list
    print(word, polarity)
    if polarity == 'pos':
        rating += 1
    elif polarity == 'neg':
        rating -= 1
print(rating)
```
You also need an external CSV file (named `words.txt` in the script) that lists positive and negative words, one `word,polarity` pair per line, for example:
```
wrinkle,neg
wrinkled,neg
wrinkles,neg
masterfully,pos
masterpiece,pos
masterpieces,pos
```
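Because the script looks up every extracted word, it helps to read this file into a dictionary once. A minimal sketch, assuming each line is `word,polarity` with polarity `pos` or `neg`; the function name `load_lexicon` is hypothetical:

```python
import csv


def load_lexicon(lines):
    """Build a {word: polarity} dict from 'word,polarity' lines.

    `lines` can be an open file or any iterable of strings. Blank or
    malformed rows are skipped; words and polarities are stripped and
    lower-cased so lookups are case-insensitive.
    """
    lexicon = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # blank line or missing polarity column
        word, polarity = row[0].strip().lower(), row[1].strip().lower()
        if polarity in ('pos', 'neg'):
            lexicon[word] = polarity
    return lexicon


# Usage with a real file (hypothetical path):
# with open('words.txt', newline='') as f:
#     lexicon = load_lexicon(f)
```

Lower-casing here means the lookup side should lower-case words too; drop it if your list is case-sensitive on purpose.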
The script works as follows:

1. Read the review text.
2. Extract the adverbs and adjectives.
3. Look each of them up in the CSV file of positive and negative words.
4. Add 1 for each positive word and subtract 1 for each negative word to get the review's rating.
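The four steps can be wrapped in one function. This sketch takes already-tagged `(word, tag)` pairs (for example, the output of `nltk.pos_tag`) and a `{word: 'pos'/'neg'}` dictionary, so it runs without NLTK; `rate_review` and the sample tags below are illustrative, not taken from a real tagger run:

```python
# Penn Treebank tags for adjectives and adverbs
ADJ_ADV_TAGS = {'JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS'}


def rate_review(tagged, lexicon):
    """Return (matches, rating) for a list of (word, tag) pairs.

    matches lists the (word, polarity) pairs found in the lexicon, in order;
    rating is +1 per 'pos' match and -1 per 'neg' match.
    """
    matches = []
    rating = 0
    for word, tag in tagged:
        if tag not in ADJ_ADV_TAGS:  # keep adjectives and adverbs only
            continue
        polarity = lexicon.get(word.lower())  # look the word up
        if polarity == 'pos':
            rating += 1
        elif polarity == 'neg':
            rating -= 1
        else:
            continue  # not in the lexicon
        matches.append((word, polarity))
    return matches, rating


# Hand-tagged sample (assumed tags)
tagged = [('nice', 'JJ'), ('eggplant', 'NN'), ('not', 'RB'),
          ('bad', 'JJ'), ('too', 'RB'), ('expensive', 'JJ')]
lexicon = {'nice': 'pos', 'bad': 'neg', 'expensive': 'neg'}
print(rate_review(tagged, lexicon))
# → ([('nice', 'pos'), ('bad', 'neg'), ('expensive', 'neg')], -1)
```

Keeping tagging separate from scoring makes the scoring easy to test and lets you swap in a different tagger or lexicon.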
The script prints:
```
nice pos
bad neg
expensive neg
sorely neg
-2
```
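In this output "bad" counts as negative even though the review says "not bad", which is the case raised in the question about treating "not good" as one word. One way to handle it (a sketch, not something the script above does) is to merge a negation word with the token after it and flip that token's polarity. NLTK's `MWETokenizer` can also join fixed pairs such as `('not', 'good')`, but every pair must be listed in advance. The negation list and function names below are assumptions:

```python
# Assumed negation words; word_tokenize splits "isn't" into "is" + "n't"
NEGATIONS = {'not', "n't", 'no', 'never'}


def merge_negations(tokens):
    """Join each negation word with the next token, e.g. ['not', 'bad'] -> ['not_bad']."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in NEGATIONS and i + 1 < len(tokens):
            merged.append(tokens[i] + '_' + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def polarity_score(token, lexicon):
    """+1 / -1 / 0 for a token; a merged 'not_x' token gets the opposite of x."""
    neg, sep, word = token.partition('_')
    if sep and neg.lower() in NEGATIONS:
        return -polarity_score(word, lexicon)
    return {'pos': 1, 'neg': -1}.get(lexicon.get(token.lower()), 0)


tokens = ['the', 'eggplant', 'is', 'not', 'bad', '.']
lexicon = {'bad': 'neg', 'nice': 'pos'}
merged = merge_negations(tokens)  # ['the', 'eggplant', 'is', 'not_bad', '.']
print(sum(polarity_score(t, lexicon) for t in merged))  # 1
```

Merging only looks one token ahead, so phrases like "not very good" still need a wider window or a proper negation-scope rule.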
Adjust the scoring to your needs.
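The answer above uses a hand-made word list, but the question asked about SentiWordNet, which NLTK exposes as `nltk.corpus.sentiwordnet`. Each sense (`SentiSynset`) has `pos_score()` and `neg_score()` methods, so a rough per-word score is positive minus negative for the word's first sense. This sketch assumes NLTK with the `sentiwordnet` and `wordnet` corpora downloaded; using only the first sense and the tag mapping are simplifications, and the function names are hypothetical:

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to the WordNet POS letter SentiWordNet uses, or None."""
    if tag.startswith('JJ'):
        return 'a'  # adjective
    if tag.startswith('RB'):
        return 'r'  # adverb
    if tag.startswith('NN'):
        return 'n'  # noun
    if tag.startswith('VB'):
        return 'v'  # verb
    return None


def sentiwordnet_score(word, tag):
    """Positive minus negative score of the word's first SentiWordNet sense, or 0.0."""
    # Imported here so the other helpers work without NLTK installed;
    # needs nltk.download('sentiwordnet') and nltk.download('wordnet').
    from nltk.corpus import sentiwordnet as swn
    wn_pos = penn_to_wordnet(tag)
    if wn_pos is None:
        return 0.0
    senses = list(swn.senti_synsets(word, wn_pos))
    if not senses:
        return 0.0  # word not in SentiWordNet for this POS
    first = senses[0]
    return first.pos_score() - first.neg_score()


def total_score(tagged, scorer=sentiwordnet_score):
    """Sum scorer(word, tag) over a list of (word, tag) pairs."""
    return sum(scorer(word, tag) for word, tag in tagged)
```

With NLTK installed, `total_score(nltk.pos_tag(nltk.word_tokenize(review)))` would give a summed score for one review; the exact values depend on the SentiWordNet version, so none are shown here. The `scorer` parameter lets you plug in the CSV lookup instead.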