python nltk tokenize keyword pos-tagger

How to get only the nouns from a sentence


I'm trying to find out which nouns occur in a sentence. I'm using pos_tag from NLTK, but it's not working very well. Here is my function:

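from nltk import pos_tag, word_tokenize

def Noun(sentence):
    # Split the sentence into tokens, then tag each token with its part of speech.
    words = word_tokenize(sentence)
    pos = pos_tag(words)
    # Keep tokens whose Penn Treebank tag starts with 'N' (NN, NNS, NNP, NNPS).
    lista = [word for word, tag in pos if tag.startswith('N')]
    # Return both the full tagging and the list of nouns.
    return pos, lista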


For example:

tweet = "let's talk to Thomas and check if he will come to the party"
Noun(tweet)

Expected output:

['Thomas', 'party']

What I got:

['let', 'talk', 'Thomas', 'party']

Solution

  • There is nothing wrong with your code. The wrong output comes from the tagger behind pos_tag, which labels all four of those words as nouns:

    [('let', 'NN'), ("'s", 'POS'), ('talk', 'NN'), ('to', 'TO'), ('Thomas', 'NNP'), ('and', 'CC'), ('check', 'VB'), ('if', 'IN'), ('he', 'PRP'), ('will', 'MD'), ('come', 'VB'), ('to', 'TO'), ('the', 'DT'), ('party', 'NN'), ('.', '.')]
    

    You can try a different tagger, such as a unigram or n-gram tagger. See the NLTK book for details: https://www.nltk.org/book/ch05.html
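
As a rough illustration of the unigram idea (not NLTK's own implementation): a unigram tagger gives each word the tag it carries most often in tagged training data. The training sentences and the names train_unigram, tag and nouns below are all made up for this sketch. NLTK ships its own nltk.UnigramTagger, which is trained on tagged sentences in the same list-of-(word, tag) shape.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training sentences (Penn Treebank tags).
# This is an assumption for the sketch; a real setup would use a tagged corpus.
TRAIN_SENTS = [
    [("let", "VB"), ("'s", "PRP"), ("go", "VB"), ("home", "NN")],
    [("we", "PRP"), ("will", "MD"), ("talk", "VB"), ("to", "TO"), ("Thomas", "NNP")],
    [("let", "VB"), ("him", "PRP"), ("talk", "VB")],
    [("the", "DT"), ("party", "NN"), ("will", "MD"), ("start", "VB")],
]


def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, word_tag in sent:
            counts[word][word_tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default=None):
    """Tag tokens by dictionary lookup; unseen words get `default`."""
    return [(tok, model.get(tok, default)) for tok in tokens]


def nouns(tagged):
    """Keep tokens whose tag starts with 'N'; skip untagged (None) tokens."""
    return [word for word, word_tag in tagged if word_tag and word_tag.startswith("N")]


model = train_unigram(TRAIN_SENTS)
tokens = ["let", "'s", "talk", "to", "Thomas", "and", "check", "if",
          "he", "will", "come", "to", "the", "party"]
print(nouns(tag(tokens, model)))  # ['Thomas', 'party']
```

With this little training data most words stay untagged, so the result only looks right because "let" and "talk" happen to appear as verbs in the toy sentences. In practice you would train on a larger tagged corpus and fall back to another tagger for unseen words.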