I'm trying to find out which nouns exist in a sentence. I'm using pos_tag from nltk, but it's not working very well. Here is my function:
from nltk import pos_tag, word_tokenize

def Noun(sentence):
    lista = []
    words = word_tokenize(sentence)
    pos = pos_tag(words)
    # Keep every token whose tag starts with 'N' (NN, NNS, NNP, NNPS)
    for word, tag in pos:
        if tag.startswith('N'):
            lista.append(word)
    return pos, lista
For example:
tweet = "let's talk to Thomas and check if he will come to the party"
Noun(tweet)
Expected output:
['Thomas', 'party']
What I got:
['let', 'talk', 'Thomas', 'party'])
There is no problem with your code. The tagging model that pos_tag uses is the reason for the unexpected output. It tags these four words as nouns ('let' and 'talk' are mistagged as NN):
[('let', 'NN'), ("'s", 'POS'), ('talk', 'NN'), ('to', 'TO'), ('Thomas', 'NNP'), ('and', 'CC'), ('check', 'VB'), ('if', 'IN'), ('he', 'PRP'), ('will', 'MD'), ('come', 'VB'), ('to', 'TO'), ('the', 'DT'), ('party', 'NN'), ('.', '.')]
You can try a different tagger, such as a unigram tagger or an n-gram tagger. Chapter 5 of the NLTK book covers these in detail: https://www.nltk.org/book/ch05.html
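As a rough illustration of the unigram approach, here is a minimal sketch using nltk's UnigramTagger. The training data below is a tiny hand-tagged toy set that I made up for this example, and the names train_sents and extract_nouns are hypothetical. In practice you would train on a real tagged corpus and probably chain a backoff tagger, so treat this as a sketch of the idea rather than a drop-in fix.

```python
from nltk.tag import UnigramTagger

# Toy, hand-tagged training sentences (an assumption for illustration only;
# a real tagger should be trained on a proper tagged corpus).
train_sents = [
    [("let", "VB"), ("'s", "PRP"), ("talk", "VB"), ("to", "TO"), ("Thomas", "NNP")],
    [("check", "VB"), ("if", "IN"), ("he", "PRP"), ("will", "MD"),
     ("come", "VB"), ("to", "TO"), ("the", "DT"), ("party", "NN")],
]

# A unigram tagger assigns each word the tag it was most often seen with.
tagger = UnigramTagger(train_sents)

def extract_nouns(tokens):
    """Return the tokens whose predicted tag starts with 'N'."""
    tagged = tagger.tag(tokens)
    # Words never seen in training get the tag None (no backoff tagger here).
    return [word for word, tag in tagged if tag is not None and tag.startswith("N")]

# Tokens are pre-split here so the sketch needs no tokenizer data download.
tokens = ["let", "'s", "talk", "to", "Thomas", "and", "check", "if",
          "he", "will", "come", "to", "the", "party"]
nouns = extract_nouns(tokens)
print(nouns)
```

Because the tagger only knows words it saw during training, the quality of the result depends entirely on the training corpus. Words it has not seen (like "and" above) come back with the tag None unless you pass a backoff tagger.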