python nltk counter part-of-speech

Counter to return zero if a part-of-speech tag is not present


I am trying to count how often each part of speech occurs in a given online review. I can retrieve the tag for each word and count the occurrences, but I have trouble also capturing the zero counts (a tag that is not present should count as 0). Ideally, I would have a list of all the tags, each with its actual count in the review, or 0 if it does not occur. I use NLTK's POS tagger.

The following code gets me the tag counts per review, but only for tags that actually occur among the review's tokens:

import nltk
from collections import Counter

postag = []  # one Counter per review
for line in lines:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    counts = Counter(tag for word, tag in tagged)
    postag.append(counts)

I tried making a separate list of specific tags (the goal was to capture all the verbs and nouns), but it still returns only the tags with actual counts (1 or more) and not those with 0 (not present in the text). I could put all the available tags in that list, but it would still return only the tags that occur. For instance:

for line in lines:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    selective_tagged = ['NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBN', 'VBP', 'VBZ']
    selective_tagged_words = []
    for word, tag in tagged:
        if tag in selective_tagged:
            selective_tagged_words.append((word, tag))
    counts = Counter(tag for word, tag in selective_tagged_words)
    postag.append(counts)

For the example above, the output is:

Counter({'NNS': 3, 'VBP': 3, 'VBN': 1, 'NN': 5, 'VBZ': 1, 'VB': 4, 'NNP': 1})

But I want:

Counter({'NNS': 3, 'VBP': 3, 'VBN': 1, 'NN': 5, 'VBZ': 1, 'VB': 4, 'NNP': 1, 'NNPS': 0, 'VBD': 0})

Thanks for the help!

Edit 2: Code that worked in the end (thanks to manoj yadav):

for line in lines:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    selective_tagged = ['NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBN', 'VBP', 'VBZ']
    selective_tagged_words = []
    for word, tag in tagged:
        if tag in selective_tagged:
            selective_tagged_words.append((word, tag))
    counts = Counter(tag for word, tag in selective_tagged_words)
    other_tags = set(selective_tagged) - set(counts)
    for i in other_tags:
        counts[i] = 0
    postag.append(counts)
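
For anyone who wants to check the zero-filling step on its own, here is a minimal sketch that runs without NLTK. The `tagged` list is hand-written and only stands in for real `nltk.pos_tag` output; the sentence and its tags are made up for illustration.

```python
from collections import Counter

selective_tagged = ['NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBN', 'VBP', 'VBZ']

# Hypothetical tagger output; stands in for nltk.pos_tag(tokens).
tagged = [('The', 'DT'), ('rooms', 'NNS'), ('are', 'VBP'), ('clean', 'JJ'),
          ('and', 'CC'), ('the', 'DT'), ('staff', 'NN'), ('helps', 'VBZ')]

# Count only the tags we care about.
counts = Counter(tag for word, tag in tagged if tag in selective_tagged)

# Tags from the list that never occurred get an explicit 0 entry.
for tag in set(selective_tagged) - set(counts):
    counts[tag] = 0

print(dict(counts))
```

After the loop, `counts` has one key for every tag in `selective_tagged`, so each review produces an entry for the same set of tags.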

Solution

    for line in lines:
        tokens = nltk.word_tokenize(line)
        tagged = nltk.pos_tag(tokens)
        selective_tagged = ['NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBN', 'VBP', 'VBZ']
        selective_tagged_words = []
        for word, tag in tagged:
            if tag in selective_tagged:
                selective_tagged_words.append((word, tag))
        count = Counter(tag for word, tag in selective_tagged_words)

        # Tags from the list that did not occur in this review get a count of 0.
        other_tags = set(selective_tagged) - set(count)
        for i in other_tags:
            count[i] = 0
        postag.append(count)
    print(postag)

Try this and see if it works.
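
A possible alternative to the set difference, as a sketch rather than a tested drop-in: start the `Counter` with every wanted tag at 0 and then update it with the tags that occur. The helper name `count_selected_tags` and the sample `tagged` list are made up for illustration; in the real code `tagged` would come from `nltk.pos_tag(tokens)`.

```python
from collections import Counter

SELECTED_TAGS = ['NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBN', 'VBP', 'VBZ']

def count_selected_tags(tagged, tags=SELECTED_TAGS):
    """Count only the tags in `tags`, keeping a 0 entry for absent ones."""
    counts = Counter(dict.fromkeys(tags, 0))  # every wanted tag starts at 0
    # `tag in counts` is True exactly for the wanted tags seeded above.
    counts.update(tag for _, tag in tagged if tag in counts)
    return counts

# Hypothetical tagger output standing in for nltk.pos_tag(tokens).
tagged = [('rooms', 'NNS'), ('are', 'VBP'), ('clean', 'JJ')]
counts = count_selected_tags(tagged)
print(counts['NNS'], counts['VBD'])  # 1 0
```

Note that a plain `Counter` already returns 0 when you look up a missing key (`Counter(['NN'])['VBD']` is 0) without storing it. Explicit zero entries only matter when you iterate over the counter, print it, or export it, for example to build one row per review with the same columns.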